path: root/sys/kern/vfs_subr.c
Commit message | Author | Age | Files | Lines
* Change vfs_busy to wait until an outcome of pending unmount | kan | 2009-03-02 | 1 | -5/+13
  operation is known and to retry or fail according to that outcome. This fixes the problem with namespace-traversing programs failing with random ENOENT errors if someone just happened to try to unmount that same filesystem at the same time.
  Reported by: dhw
  Reviewed by: kib, attilio
  Sponsored by: Juniper Networks, Inc.
* Tweak the output of VOP_PRINT/vn_printf() some. | jhb | 2009-02-06 | 1 | -1/+0
  - Align the fifo output in fifo_print() with other vn_printf() output.
  - Remove the leading space from lockmgr_printinfo() so its output lines up in vn_printf().
  - lockmgr_printinfo() now ends with a newline, so remove an extra newline from vn_printf().
* Add KASSERTs to make it easier to debug problems like the one fixed | trasz | 2009-02-06 | 1 | -0/+1
  in r188141.
  Reviewed by: kib, attilio
  Approved by: rwatson (mentor)
  Tested by: pho
  Sponsored by: FreeBSD Foundation
* Add more KTR_VFS logging points in order to have more effective tracing. | attilio | 2009-02-05 | 1 | -21/+64
  Reviewed by: brueffer, kib
  Tested by: Gianni Trematerra <giovanni D trematerra A gmail D com>
* Tweak the wording for vfs_mark_atime() since the I/O it is avoiding by not | jhb | 2009-01-23 | 1 | -3/+3
  updating va_atime via VOP_SETATTR() isn't always synchronous. For some filesystems it is asynchronous.
  Suggested by: bde
* Push down Giant in the vnlru kproc main loop so that it is only acquired | jhb | 2009-01-23 | 1 | -11/+3
  around calls to vlrureclaim() on non-MPSAFE filesystems. Specifically, vnlru no longer needs Giant for the common case of waking up and deciding there is nothing for it to do.
  MFC after: 2 weeks
* Fix a few style bogons. | jhb | 2009-01-21 | 1 | -2/+1
  Submitted by: bde
* Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP: | jhb | 2009-01-21 | 1 | -5/+2
  VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME can be performed while holding a shared vnode lock (the same functionality is done internally by VOP_READ which can run with a shared vnode lock). Add missing locking of the vnode interlock to the ufs implementation and remove a special note and test from the NFS client about not supporting the feature.
  Inspired by: ups
  Tested by: pho
* FFS puts the extended attributes blocks at the negative blocks for the | kib | 2009-01-20 | 1 | -1/+1
  vnode, from -1 down. When vinvalbuf(vp, V_ALT) is done for the vnode, it incorrectly does vm_object_page_remove(0, 0), removing all pages from the underlying vm object, not only the pages that back the extended attributes data.
  Change vinvalbuf() to not remove any pages from the object when V_NORMAL or V_ALT are specified. Instead, the only in-tree caller that specifies V_ALT, ffs_inode.c:ffs_truncate(), explicitly removes the corresponding page range. The V_NORMAL caller already does vnode_pager_setsize(vp, 0) immediately after the call to vinvalbuf(V_NORMAL).
  Reported by: csjp
  Reviewed by: ups
  MFC after: 3 weeks
* 1) Fix a deadlock in the VFS: | attilio | 2008-12-16 | 1 | -1/+1
  - threadA runs vfs_ref(mp1)
  - threadB unmounts the mp1 fs, sets MNTK_UNMOUNT and drops MNT_ILOCK()
  - threadA runs vfs_busy(mp1) and, as long as MNTK_UNMOUNT is set, sleeps waiting for threadB to complete the unmount
  - threadB, in vfs_mount_destroy(), finds mnt_ref > 0 and sleeps waiting for the refcount to expire.
  Fix the deadlock by adding a flag called MNTK_REFEXPIRE which signals that the unmounter is waiting for mnt_ref to expire. The vfs_busy contenders are awakened and fail, and if they retry, MNTK_REFEXPIRE won't allow them to sleep again.
  2) Significantly simplify the code of vfs_mount_destroy() by trimming unnecessary code:
  - once every reference has gone away, it is no longer possible to have write ops (primary and secondary) in progress.
  - there is no need to drop and reacquire the mount lock.
  - filling the structures with dummy values is useless since they are going to be freed.
  Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
  Discussed with: kib
* In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer | kib | 2008-11-29 | 1 | -0/+26
  to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory.
  Introduce a vfs_busymp() function that looks up and busies the found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls.
  Two other uses of vfs_getvfs() in vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid, seem to be OK. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, which prevents the Giant-locked unmount code from interfering with it.
  Noted by: tegge
  Reviewed by: dfr
  Tested by: pho
  MFC after: 1 month
* Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. | pjd | 2008-11-17 | 1 | -4/+10
  This brings a huge amount of changes; I'll enumerate only the user-visible ones:
  - Delegated Administration
    Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc.
  - L2ARC
    Level 2 cache for ZFS: allows additional disks to be used for cache. Huge performance improvements, mostly for random reads of mostly static content.
  - slog
    Allows additional disks to be used for the ZFS Intent Log to speed up operations like fsync(2).
  - vfs.zfs.super_owner
    Allows regular users to perform privileged operations on files stored on ZFS file systems owned by them. Be very careful with this one.
  - chflags(2)
    Not all the flags are supported. This still needs work.
  - ZFSBoot
    Support for booting off of a ZFS pool. Not finished, AFAIK.
    Submitted by: dfr
  - Snapshot properties
  - New failure modes
    Before, if a write request failed, the system panicked. Now one can select one of three failure modes:
    - panic: panic on write error
    - wait: wait for the disk to reappear
    - continue: serve read requests if possible, block write requests
  - Refquota, refreservation properties
    Just like the quota and reservation properties, but they don't count space consumed by children file systems, clones and snapshots.
  - Sparse volumes
    ZVOLs that don't reserve space in the pool.
  - External attributes
    Compatible with extattr(2).
  - NFSv4-ACLs
    Not sure about the status, might not be complete yet.
    Submitted by: trasz
  - Creation-time properties
  - Regression tests for the zpool(8) command.
  Obtained from: OpenSolaris
* Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless. | attilio | 2008-11-03 | 1 | -2/+0
  Really, the concept of holdcnt in the struct mount is represented by the mnt_ref (which prevents the type-stable structure from being "recycled") handled through vfs_ref() and vfs_rel(). From this point of view, switch the holdcnt acquisition into an emulated vfs_ref() (and the subsequent release into vfs_rel()).
  Discussed with: kib
  Tested by: pho
* Improve VFS locking: | attilio | 2008-11-02 | 1 | -25/+25
  - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters.
  - Due to the change above, remove the mnt_lock lockmgr because it is now useless.
  - Due to the change above, vfs_busy() is no longer linked to a lockmgr. Change its KPI accordingly by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT, which basically replaces the LK_NOWAIT of the old version (which was already unlinked from the lockmgr), and MBF_MNTLSTLOCK, which provides the ability to drop the mountlist_mtx once the mnt interlock is held (an ability still desired by most consumers).
  - The stub used in vfs_mount_destroy(), which allows overriding the mnt_ref if running for more than 3 seconds, makes it totally useless. Remove it, as it was meant to work in older versions. If a problem of "refcount held never going away" should appear, we will need to fix it properly rather than trust such a hackish solution.
  - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if the waiters were actually woken up. Just one place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter.
  - Remove the markercnt refcount as it is useless.
  This patch modifies the VFS ABI and breaks the KPI for vfs_busy(), so manpages and __FreeBSD_version will be modified accordingly.
  Discussed with: kib
  Tested by: pho
* Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary | trasz | 2008-10-28 | 1 | -15/+15
  to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which are 16 bits wide.
  Approved by: rwatson (mentor)
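The width problem described in the accmode_t commit can be illustrated with a small userland sketch. It is not the kernel code: the `demo_` types, the `DEMO_VREAD` value, and the `DEMO_VNEWBIT` constant are hypothetical stand-ins chosen only to show what happens when a flag above bit 15 passes through a 16-bit variable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
typedef uint16_t demo_mode_t;    /* 16 bits wide, as the commit says of mode_t */
typedef uint32_t demo_accmode_t; /* assumed wider type for access-mode bits */

#define DEMO_VREAD   0x00000100u /* fits in the low 16 bits */
#define DEMO_VNEWBIT 0x00010000u /* made-up constant above bit 15 */

/* Round-trip the flags through the 16-bit type; the high bits are lost. */
static demo_accmode_t
demo_via_mode(demo_accmode_t flags)
{
	demo_mode_t narrow = (demo_mode_t)flags;

	return (narrow);
}

/* Keep the flags in the wider type; nothing is lost. */
static demo_accmode_t
demo_via_accmode(demo_accmode_t flags)
{
	demo_accmode_t wide = flags;

	return (wide);
}
```

Passing `DEMO_VREAD | DEMO_VNEWBIT` through `demo_via_mode()` keeps only `DEMO_VREAD`, while `demo_via_accmode()` preserves both bits.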
* Style return statements in vn_pollrecord(). | kib | 2008-10-28 | 1 | -2/+2
* Protect check for v_pollinfo == NULL and assignment of the newly allocated | kib | 2008-10-28 | 1 | -14/+22
  vpollinfo with vnode interlock. Fully initialize vpollinfo before putting pointer to it into vp->v_pollinfo.
  Discussed with: dwhite
  Tested by: pho
  MFC after: 1 week
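The "initialize fully, then publish under the lock" pattern from the v_pollinfo commit can be sketched in userland C with a pthread mutex standing in for the vnode interlock. This is an illustration under assumptions, not the code from vfs_subr.c: `struct demo_vnode`, `struct demo_pollinfo`, and `demo_addpollinfo()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the vnode and its poll state. */
struct demo_pollinfo {
	int events;
	int initialized;
};

struct demo_vnode {
	pthread_mutex_t interlock;      /* plays the role of the vnode interlock */
	struct demo_pollinfo *pollinfo; /* NULL until first needed */
};

static void
demo_addpollinfo(struct demo_vnode *vp)
{
	struct demo_pollinfo *vi;

	/* Allocate and fully initialize before anyone else can see it. */
	vi = calloc(1, sizeof(*vi));
	if (vi == NULL)
		return;
	vi->initialized = 1;

	/* Check and publish under the lock so two callers cannot both win. */
	pthread_mutex_lock(&vp->interlock);
	if (vp->pollinfo != NULL) {
		/* Another caller published first; discard our copy. */
		pthread_mutex_unlock(&vp->interlock);
		free(vi);
		return;
	}
	vp->pollinfo = vi;
	pthread_mutex_unlock(&vp->interlock);
}
```

Because the pointer is stored only after initialization and only while the lock is held, a reader that takes the same lock never observes a half-built structure, and a second call leaves the first published pointer in place.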
* In vfs_busy(), lockmgr() cannot legitimately sleep, because code checked | kib | 2008-10-20 | 1 | -1/+1
  MNTK_UNMOUNT before, and mnt_mtx is used as interlock. vfs_busy() always tries to obtain a shared lock on mnt_lock, the other user is unmount who tries to drain it, setting MNTK_UNMOUNT before.
  Reviewed by: tegge, attilio
  Tested by: pho
  MFC after: 2 weeks
* Remove the useless struct thread argument from the bufobj interface. | attilio | 2008-10-10 | 1 | -8/+6
  In particular, the KPI of the following functions is modified:
  - bufobj_invalbuf()
  - bufsync()
  and the BO_SYNC() "virtual method" of the buffer objects set. The main consumers of the bufobj functions are affected by this change too; in particular, the functions whose KPI changed are:
  - vinvalbuf()
  - g_vfs_close()
  Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit.
  As a side note, please consider the 'curthread' argument passed to VOP_SYNC() (in bufsync()) as temporary, since it will be axed ASAP.
  Reviewed by: kib
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions. | attilio | 2008-08-31 | 1 | -16/+10
  Manpages are updated accordingly.
  Tested by: Diego Sardina <siarodx at gmail dot com>
* Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread | attilio | 2008-08-28 | 1 | -4/+4
  was always curthread and totally useless.
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* Introduce the VV_FORCEINSMQ vnode flag. It instructs the insmntque() function | kib | 2008-08-28 | 1 | -5/+20
  to ignore the unmounting and forces insertion of the vnode into the mount vnode list. Change insmntque() to fail when forced unmount is in progress and VV_FORCEINSMQ is not specified.
  Add an assertion to insmntque(), requiring the vnode to be exclusively locked for mp-safe filesystems.
  Use VV_FORCEINSMQ for the creation of the syncvnode.
  Tested by: pho
  Reviewed by: tegge
  MFC after: 1 month
* Remove worrying printf warning on bootup when processing vnodes which | csjp | 2008-08-24 | 1 | -1/+1
  have NULL mount-points. This is the case for special vnodes, such as the one used in nameiinit() which is used for crossing mount points in lookup() to avoid lock ordering issues.
  MFC after: 2 weeks
  Discussed with: rwatson, kib
* Remove the use of lbolt from the VFS syncer. | ed | 2008-07-30 | 1 | -9/+7
  It seems we only use `lbolt' inside the VFS syncer and the TTY layer now. Because I'm planning to replace the TTY layer next month, there's no reason to keep `lbolt' if it's only used in a single thread inside the kernel.
  Because the syncer code wanted to wake up the syncer thread before the timeout, it called sleepq_remove(). Because we now just use a condvar(9) with a timeout value of `hz', we can wake it up using cv_broadcast() without waking up any unrelated threads.
  Reviewed by: phk
* Assert for exclusive vnode lock in vinactive(), vrecycle() and vgonel() | pjd | 2008-07-27 | 1 | -3/+3
  functions.
  Reviewed by: kib
* - Move vp test for being NULL under IGNORE_LOCK(). | pjd | 2008-07-27 | 1 | -10/+7
  - Check if panicstr isn't set; if it is, ignore the lock. This helps to avoid confusion, because lockmgr is a no-op when panicstr isn't NULL, so asserting anything at this point doesn't make sense and can just race with other panic.
  Discussed with: kib
* - Disallow XFS mounting in write mode. The write support never really worked | attilio | 2008-07-21 | 1 | -0/+2
  and there is no need to maintain it.
  - Fix vn_get() in order to let it call vget(9) with a valid locking request. vget(9) returns the vnode locked in order to prevent recycling, but in this case internal XFS locks already prevent it from happening, so it is safe to drop the vnode lock before returning from vn_get().
  - Add a VNASSERT() in vget(9) in order to catch malformed locking requests.
  Discussed with: kan, kib
  Tested by: Lothar Braun <lothar at lobraun dot de>
* Be more friendly for DDB pager. | pjd | 2008-05-18 | 1 | -1/+6
  Educated by: jhb's BSDCan presentation
* sync_vnode() has some messy code about locking in order to deal with | attilio | 2008-05-04 | 1 | -39/+37
  mount fs needing Giant to be held when processing bufobjs. Use a different subqueue for pending workitems on filesystems requiring Giant. This simplifies the code notably and also reduces the number of Giant acquisitions (and the whole processing cost).
  Suggested by: jeff
  Reviewed by: kib
  Tested by: pho
* Implement 'show mount' command in DDB. Without argument, it prints short | pjd | 2008-04-26 | 1 | -0/+152
  info about all currently mounted file systems. When an address is given as an argument, prints detailed info about the given mount point.
  MFC after: 2 weeks
* Allow the vnode zone to return the unused memory. The vnode reference | kib | 2008-04-24 | 1 | -2/+2
  count is/shall be properly maintained for the long time, and VFS shall be safe against the vnode memory reclamation.
  Proposed by: jeff
  Tested by: pho
* Move the head of byte-level advisory lock list from the | kib | 2008-04-16 | 1 | -0/+5
  filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock.
  Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode.
  The implementation of the lf_purgelocks() is submitted by dfr.
  Reported by: kris
  Tested by: kris, pho
  Discussed with: jeff, dfr
  MFC after: 2 weeks
* - Destroy the bo mtx when the vnode is destroyed. | jeff | 2008-04-02 | 1 | -0/+1
* b_waiters cannot be adequately protected by the interlock because it is | attilio | 2008-03-28 | 1 | -5/+1
  dropped after the call to lockmgr(), so just revert this approach using something similar to the previous one: BUF_LOCKWAITERS() just checks if there are waiters (not the actual number of them) and it is based on the newly introduced lockmgr_waiters(), which returns whether the lockmgr has waiters or not. The name has been chosen to differ from the old lockwaiters() in order not to confuse them.
  The KPI is enriched by this commit, so a __FreeBSD_version bump and manpage update will be happening soon. 'struct buf' also changes, so the kernel ABI is disturbed.
  Bug found by: jeff
  Approved by: jeff, kib
* - Greatly simplify vget() by removing the guarantee that any new | jeff | 2008-03-24 | 1 | -32/+18
  references to a vnode with VI_OWEINACT set will force the vinactive() call. The kernel makes no guarantees about which reference was the last to close a file or when the actual inactive processing will happen. The previous code was designed to preserve existing semantics in the face of shared locks, however, this was unnecessary.
  Discussed with: mckusick
* - Only return 1 from sync_vnode() in cases where the vnode is still | jeff | 2008-03-23 | 1 | -1/+1
  at the head of the sync list. This prevents sched_sync() from re-queueing a vnode which may have been freed already.
  Discussed with: kib
* - Pass BO_MTX(bo) to lockmgr in vtruncbuf, we don't own the vnode | jeff | 2008-03-23 | 1 | -1/+1
  interlock here anymore.
  Reported by: kris
* - Complete part of the unfinished bufobj work by consistently using | jeff | 2008-03-22 | 1 | -25/+29
  BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
  - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock.
  - Exploit this new lock order to simplify softdep_check_suspend().
  - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find.
  Reviewed by: kib (earlier diff)
  Tested by: kris, pho (earlier diff)
* In keeping with style(9)'s recommendations on macros, use a ';' | rwatson | 2008-03-16 | 1 | -3/+4
  after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
  MFC after: 1 month
  Discussed with: imp, rink
* - Handle buffer lock waiters count directly in the buffer cache instead | attilio | 2008-03-01 | 1 | -1/+5
  of relying on the lockmgr support [1]:
    * bump the waiters only if the interlock is held
    * let brelvp() return the waiters count
    * rely on brelvp() instead of BUF_LOCKWAITERS() in order to check for the number of waiters
  - Remove a namespace pollution introduced recently with lockmgr.h including lock.h by including lock.h directly in the consumers and making it mandatory for using lockmgr.
  - Modify flags accepted by lockinit():
    * introduce LK_NOPROFILE which disables lock profiling for the specified lockmgr
    * introduce LK_QUIET which disables ktr tracing for the specified lockmgr [2]
    * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that they can only be used on a per-instance basis
  - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer used
  This patch breaks the KPI, so __FreeBSD_version will be bumped and manpages updated by further commits. Additionally, the 'struct buf' changes result in a disturbed ABI as well.
  [2] Really, currently there is no ktr tracing in the lockmgr, but it will be added soon.
  [1] Submitted by: kib
  Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
* Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is | attilio | 2008-02-25 | 1 | -13/+11
  always curthread.
  As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits.
  Tested by: Andrea Barberio <insomniac at slackware dot it>
* Convert all explicit instances of VOP_ISLOCKED(arg, NULL) into | attilio | 2008-02-08 | 1 | -5/+6
  VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should only take curthread as argument; this will lead to axing the additional argument from both functions, making the code cleaner.
  Reviewed by: jeff, kib
* Cleanup lockmgr interface and exported KPI: | attilio | 2008-01-24 | 1 | -2/+2
  - Remove the "thread" argument from the lockmgr() function as it is always curthread now
  - Axe lockcount() function as it is no longer used
  - Axe LOCKMGR_ASSERT() as it is really bogus and not currently used. Hopefully this will soon be replaced by something suitable for it.
  - Remove the prototype for dumplockinfo() as the function is no longer present
  Additionally:
  - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL, as only those should be passed
  - Do a little bit of style(9) cleanup on lockmgr.h
  The KPI is heavily broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits.
  Tested by: matteo
* - Introduce the function lockmgr_recursed() which returns true if the | attilio | 2008-01-19 | 1 | -1/+1
  lockmgr lkp, when held in exclusive mode, is recursed
  - Introduce the function BUF_RECURSED() which does the same for bufobj locks based on the top of lockmgr_recursed()
  - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj
  BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicative and SMP-compliant way. This allows us to axe out BUF_REFCNT() and leaving the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup.
  KPI results, obviously, broken so further commits will update manpages and freebsd version.
  Tested by: kris (on UFS and NFS)
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in | attilio | 2008-01-13 | 1 | -20/+19
  conjunction with 'thread' argument passing, which is always curthread. Remove the useless extra argument and pass curthread explicitly to lower layer functions, when necessary.
  The KPI is broken by this change, which should affect several ports, so a version bump and manpage update will be committed later.
  Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* vn_lock() is currently only used with the 'curthread' passed as argument. | attilio | 2008-01-10 | 1 | -6/+6
  Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This modification makes the code cleaner and in particular removes an annoying dependence, helping the next lockmgr() cleanup. The KPI, obviously, changes.
  Manpage and FreeBSD_version will be updated through further commits.
  As a side note, it is worth saying that the next commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock.
  Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
* In "show lockedvnods" DDB command, use db_printf() rather than printf() | rwatson | 2007-12-28 | 1 | -1/+1
  so that the results end up in the DDB output stream rather than the console output stream. This should likely also be done for the vprint() function it calls.
  MFC after: 3 months
* As LK_EXCLUPGRADE is used in conjunction with LK_NOWAIT, LK_UPGRADE becomes | attilio | 2007-12-27 | 1 | -1/+1
  equivalent to it, so make the switch. That call is the only remaining LK_EXCLUPGRADE consumer, and removing it will prepare the ground for axing LK_EXCLUPGRADE and for further lockmgr improvements.
  Discussed with: jeff, ups
* Add a new 'why' argument to kdb_enter(), and a set of constants to use | rwatson | 2007-12-25 | 1 | -2/+2
  for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run.
  Assign approximate why values to all current consumers of the kdb_enter() interface.
* Use curthread instead of the FIRST_THREAD_IN_PROC for vnlru and syncer, | kib | 2007-12-05 | 1 | -15/+42
  when applicable. Acquire Giant slightly later for vnlru.
  In the syncer, acquire Giant only when a vnode belongs to a non-MPsafe fs.
  In both speedup_syncer() and syncer_shutdown(), remove the syncer thread from the lbolt sleep queue after the syncer state is modified, not before.
  Herded by: attilio
  Tested by: Peter Holm
  Reviewed by: ups
  MFC after: 1 week