summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_subr.c
Commit message (Collapse)AuthorAgeFilesLines
* Add a knob to allow reclaim of the directory vnodes that are source ofkib2009-12-281-2/+10
| | | | | | | | | | the namecache records. The reclamation is not enabled by default because for typical workload it would make namecache unusable, but large nested directory tree easily puts any process that accesses filesystem into 1 second wait for vlru. Reported by: yar (long time ago) MFC after: 3 days
* Now that all the callers seem to be fixed, add KASSERTs to make sure VAPPENDtrasz2009-12-261-0/+2
| | | | is not being used improperly.
* VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm objectkib2009-12-211-4/+3
| | | | | | | | | | | | | flag. Besides providing the redundand information, need to update both vnode and object flags causes more acquisition of vnode interlock. OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects. Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects. Suggested and reviewed by: alc Tested by: pho MFC after: 3 weeks
* Extend ddb(4) "show mount" command to print active string mount options.jh2009-11-191-0/+13
| | | | | | | | Note that only option names are printed, not values. Reviewed by: pjd Approved by: trasz (mentor) MFC after: 2 weeks
* Provide default implementation for VOP_ACCESS(9), so that filesystems whichtrasz2009-10-011-0/+3
| | | | | | | want to provide VOP_ACCESSX(9) don't have to implement both. Note that this commit makes implementation of either of these two mandatory. Reviewed by: kib
* Use C99 initialization for struct filterops.rwatson2009-09-121-8/+21
| | | | | | Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks
* In vfs_mark_atime(9), be resistent against reclaimed vnodes.kib2009-09-091-1/+5
| | | | | | | Assert that neccessary locks are taken, since vop might not be called. Tested by: pho MFC after: 3 days
* Call prison_check from vfs_suser rather than re-implementing it.jamie2009-07-021-2/+1
| | | | Approved by: re (kib), bz (mentor)
* Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Usekib2009-06-101-8/+39
| | | | | | | | | | | | | | | | | | | | | | | | | vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICrwatson2009-06-051-1/+0
| | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
* Remove the now invalid (and possibly unused) debug.mpsafevfsattilio2009-05-301-9/+0
| | | | | | | sysctl/tunable. Reviewed by: emaste Sponsored by: Sandvine Incorporated
* Add VOP_ACCESSX, which can be used to query for newly added V*trasz2009-05-301-0/+47
| | | | | | | | permissions, such as VWRITE_ACL. For a filsystems that don't implement it, there is a default implementation, which works as a wrapper around VOP_ACCESS. Reviewed by: rwatson@
* Add hierarchical jails. A jail may further virtualize its environmentjamie2009-05-271-13/+5
| | | | | | | | | | | | | | | | | | | | | | by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
* Remove the thread argument from the FSD (File-System Dependent) parts ofattilio2009-05-111-2/+2
| | | | | | | | | | | | | | | | | the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
* Replace v_dd vnode pointer with v_cache_dd pointer to struct namecachekan2009-03-291-0/+1
| | | | | | | | | | | | in directory vnodes. Allow namecache dotdot entry to be created pointing from child vnode to parent vnode if no existing links in opposite direction exist. Use direct link from parent to child for dotdot lookups otherwise. This restores more efficient dotdot caching in NFS filesystems which was lost when vnodes stoppped being type stable. Reviewed by: kib
* Change vfs_busy to wait until an outcome of pending unmountkan2009-03-021-5/+13
| | | | | | | | | | | | operation is known and to retry or fail accordingly to that outcome. This fixes the problem with namespace traversing programs failing with random ENOENT errors if someone just happened to try to unmount that same filesystem at the same time. Reported by: dhw Reviewed by: kib, attilio Sponsored by: Juniper Networks, Inc.
* Tweak the output of VOP_PRINT/vn_printf() some.jhb2009-02-061-1/+0
| | | | | | | | - Align the fifo output in fifo_print() with other vn_printf() output. - Remove the leading space from lockmgr_printinfo() so its output lines up in vn_printf(). - lockmgr_printinfo() now ends with a newline, so remove an extra newline from vn_printf().
* Add KASSERTs to make it easier to debug problems like the one fixedtrasz2009-02-061-0/+1
| | | | | | | | | in r188141. Reviewed by: kib,attilio Approved by: rwatson (mentor) Tested by: pho Sponsored by: FreeBSD Foundation
* Add more KTR_VFS logging point in order to have a more effective tracing.attilio2009-02-051-21/+64
| | | | | Reviewed by: brueffer, kib Tested by: Gianni Trematerra <giovanni D trematerra A gmail D com>
* Tweak the wording for vfs_mark_atime() since the I/O it is avoiding by notjhb2009-01-231-3/+3
| | | | | | | updating va_atime via VOP_SETATTR() isn't always synchronous. For some filesystems it is asynchronous. Suggested by: bde
* Push down Giant in the vlnru kproc main loop so that it is only acquiredjhb2009-01-231-11/+3
| | | | | | | | around calls to vlrureclaim() on non-MPSAFE filesystems. Specifically, vnlru no longer needs Giant for the common case of waking up and deciding there is nothing for it to do. MFC after: 2 weeks
* Fix a few style bogons.jhb2009-01-211-2/+1
| | | | Submitted by: bde
* Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP:jhb2009-01-211-5/+2
| | | | | | | | | | | | VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME can be performed while holding a shared vnode lock (the same functionality is done internally by VOP_READ which can run with a shared vnode lock). Add missing locking of the vnode interlock to the ufs implementation and remove a special note and test from the NFS client about not supporting the feature. Inspired by: ups Tested by: pho
* FFS puts the extended attributes blocks at the negative blocks for thekib2009-01-201-1/+1
| | | | | | | | | | | | | | | | | | vnode, from -1 down. When vinvalbuf(vp, V_ALT) is done for the vnode, it incorrectly does vm_object_page_remove(0, 0), removing all pages from the underlying vm object, not only the pages that back the extended attributes data. Change vinvalbuf() to not remove any pages from the object when V_NORMAL or V_ALT are specified. Instead, the only in-tree caller in ffs_inode.c:ffs_truncate() that specifies V_ALT explicitely removes the corresponding page range. The V_NORMAL caller does vnode_pager_setsize(vp, 0) immediately after the call to vinvalbuf(V_NORMAL) already. Reported by: csjp Reviewed by: ups MFC after: 3 weeks
* 1) Fix a deadlock in the VFS:attilio2008-12-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | - threadA runs vfs_rel(mp1) - threadB does unmount the mp1 fs, sets MNTK_UNMOUNT and drop MNT_ILOCK() - threadA runs vfs_busy(mp1) and, as long as, MNTK_UNMOUNT is set, sleeps waiting for threadB to complete the unmount - threadB, in vfs_mount_destroy(), finds mnt_lock > 0 and sleeps waiting for the refcount to expire. Fix the deadlock by adding a flag called MNTK_REFEXPIRE which signals the unmounter is waiting for mnt_ref to expire. The vfs_busy contenders got awake, fails, and if they retry the MNTK_REFEXPIRE won't allow them to sleep again. 2) Simplify significantly the code of vfs_mount_destroy() trimming unnecessary codes: - as long as any reference exited, it is no-more possible to have write-op (primarty and secondary) in progress. - it is no needed to drop and reacquire the mount lock. - filling the structures with dummy values is unuseful as long as it is going to be freed. Tested by: pho, Andrea Barberio <insomniac at slackware dot it> Discussed with: kib
* In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointerkib2008-11-291-0/+26
| | | | | | | | | | | | | | | | | | | to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory. Introduce a vfs_busymp() function that looks up and busies found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls. Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, that prevents Giant-locked unmount code to interfere with it. Noted by: tegge Reviewed by: dfr Tested by: pho MFC after: 1 month
* Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.pjd2008-11-171-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This bring huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system paniced. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris
* Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless.attilio2008-11-031-2/+0
| | | | | | | | | | | Really, the concept of holdcnt in the struct mount is rappresented by the mnt_ref (which prevents the type-stable structure from being "recycled) handled through vfs_ref() and vfs_rel(). On this optic, switch the holdcnt acquisition into an emulated vfs_ref() (and subsequent release into vfs_rel()). Discussed with: kib Tested by: pho
* Improve VFS locking:attilio2008-11-021-25/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho
* Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessarytrasz2008-10-281-15/+15
| | | | | | | to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)
* Style return statements in vn_pollrecord().kib2008-10-281-2/+2
|
* Protect check for v_pollinfo == NULL and assignment of the newly allocatedkib2008-10-281-14/+22
| | | | | | | | | vpollinfo with vnode interlock. Fully initialize vpollinfo before putting pointer to it into vp->v_pollinfo. Discussed with: dwhite Tested by: pho MFC after: 1 week
* In vfs_busy(), lockmgr() cannot legitimately sleep, because code checkedkib2008-10-201-1/+1
| | | | | | | | | | MNTK_UNMOUNT before, and mnt_mtx is used as interlock. vfs_busy() always tries to obtain a shared lock on mnt_lock, the other user is unmount who tries to drain it, setting MNTK_UNMOUNT before. Reviewed by: tegge, attilio Tested by: pho MFC after: 2 weeks
* Remove the struct thread unuseful argument from bufobj interface.attilio2008-10-101-8/+6
| | | | | | | | | | | | | | | | | | | | | In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.attilio2008-08-311-16/+10
| | | | | | Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>
* Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed threadattilio2008-08-281-4/+4
| | | | | | was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* Introduce the VV_FORCEINSMQ vnode flag. It instructs the insmnque() functionkib2008-08-281-5/+20
| | | | | | | | | | | | | | | | | to ignore the unmounting and forces insertion of the vnode into the mount vnode list. Change insmntque() to fail when forced unmount is in progress and VV_FORCEINSMQ is not specified. Add an assertion to the insmntque(), requiring the vnode to be exclusively locked for mp-safe filesystems. Use the VV_FORCEINSMQ for the creation of the syncvnode. Tested by: pho Reviewed by: tegge MFC after: 1 month
* Remove worrying printf warning on bootup when processing vnodes whichcsjp2008-08-241-1/+1
| | | | | | | | | have NULL mount-points. This is the case for special vnodes, such as the one used in nameiinit() which is used for crossing mount points in lookup() to avoid lock ordering issues. MFC after: 2 weeks Discussed with: rwatson, kib
* Remove the use of lbolt from the VFS syncer.ed2008-07-301-9/+7
| | | | | | | | | | | | | | It seems we only use `lbolt' inside the VFS syncer and the TTY layer now. Because I'm planning to replace the TTY layer next month, there's no reason to keep `lbolt' if it's only used in a single thread inside the kernel. Because the syncer code wanted to wake up the syncer thread before the timeout, it called sleepq_remove(). Because we now just use a condvar(9) with a timeout value of `hz', we can wake it up using cv_broadcast() without waking up any unrelated threads. Reviewed by: phk
* Assert for exclusive vnode lock in vinactive(), vrecycle() and vgonel()pjd2008-07-271-3/+3
| | | | | | functions. Reviewed by: kib
* - Move vp test for beeing NULL under IGNORE_LOCK().pjd2008-07-271-10/+7
| | | | | | | | | - Check if panicstr isn't set, if it is ignore the lock. This helps to avoid confusion, because lockmgr is a no-op when panicstr isn't NULL, so asserting anything at this point doesn't make sense and can just race with other panic. Discussed with: kib
* - Disallow XFS mounting in write mode. The write support never worked reallyattilio2008-07-211-0/+2
| | | | | | | | | | | | and there is no need to maintain it. - Fix vn_get() in order to let it call vget(9) with a valid locking request. vget(9) returns the vnode locked in order to prevent recycling, but in this case internal XFS locks alredy prevent it from happening, so it is safe to drop the vnode lock before to return by vn_get(). - Add a VNASSERT() in vget(9) in order to catch malformed locking requests. Discussed with: kan, kib Tested by: Lothar Braun <lothar at lobraun dot de>
* Be more friendly for DDB pager.pjd2008-05-181-1/+6
| | | | Educated by: jhb's BSDCan presentation
* sync_vnode() has some messy code about locking in order to deal withattilio2008-05-041-39/+37
| | | | | | | | | | | mount fs needing Giant to be held when processing bufobjs. Use a different subqueue for pending workitems on filesystems requiring Giant. This simplifies the code notably and also reduces the number of Giant acquisitions (and the whole processing cost). Suggested by: jeff Reviewed by: kib Tested by: pho
* Implement 'show mount' command in DDB. Without argument, it prints shortpjd2008-04-261-0/+152
| | | | | | | info about all currently mounted file systems. When an address is given as an argument, prints detailed info about the given mount point. MFC after: 2 weeks
* Allow the vnode zone to return the unused memory. The vnode referencekib2008-04-241-2/+2
| | | | | | | | count is/shall be properly maintained for the long time, and VFS shall be safe against the vnode memory reclamation. Proposed by: jeff Tested by: pho
* Move the head of byte-level advisory lock list from thekib2008-04-161-0/+5
| | | | | | | | | | | | | | | | | | | | | | filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
* - Destroy the bo mtx when the vnode is destroyed.jeff2008-04-021-0/+1
|
* b_waiters cannot be adequately protected by the interlock because it isattilio2008-03-281-5/+1
| | | | | | | | | | | | | | | | dropped after the call to lockmgr() so just revert this approach using something similar to the precedent one: BUF_LOCKWAITERS() just checks if there are waiters (not the actual number of them) and it is based on newly introduced lockmgr_waiters() which returns if the lockmgr has waiters or not. The name has been choosen differently by old lockwaiters() in order to not confuse them. KPI results enriched by this commit so __FreeBSD_version bumping and manpage update will be happening soon. 'struct buf' also changes, so kernel ABI is disturbed. Bug found by: jeff Approved by: jeff, kib
* - Greatly simplify vget() by removing the guarantee that any newjeff2008-03-241-32/+18
| | | | | | | | | | references to a vnode with VI_OWEINACT set will force the vinactive() call. The kernel makes no guarantees about which reference was the last to close a file or when the actual inactive processing will happen. The previous code was designed to preserve existing semantics in the face of shared locks, however, this was unnecessary. Discussed with: mckusick
OpenPOWER on IntegriCloud