summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_subr.c
Commit message (Collapse)AuthorAgeFilesLines
* Do not flush buffers when the v_object of the passed vnode does notkib2013-10-091-0/+2
| | | | | | | | | | | | | | | | | | really belong to it. Such vnodes, with the pointers to other vnodes v_objects, are typically instantiated by the bypass filesystems. Invalidating mappings of other vnode pages and the pages is wrong, since reclamation of the upper vnode does not imply that lower vnode is reclaimed too. One of the consequences of the improper reclamation was destruction of the wired mappings of the lower vnode pages, triggering miscellaneous assertions in the VM system. Reported by: John Marshall <john.marshall@riverwillow.com.au> Tested by: John Marshall <john.marshall@riverwillow.com.au>, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)
* When printing the vnode information from ddb, print the lengths of thekib2013-10-011-2/+5
| | | | | | | | dirty and clean buffer queues. Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb)
* For vunref(), try to upgrade the vnode lock if the function was calledkib2013-09-291-2/+4
| | | | | | | | | | with the vnode shared-locked. If upgrade succeeded, the inactivation can be done immediately, instead of being postponed. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (glebius)
* Acquire a hold reference on the vnode when a knote is instantiated.kib2013-09-261-0/+2
| | | | | | | | | | | | Otherwise, knote keeps a pointer to a vnode which could become invalid any time. Reported by: many Tested by: Patrick Lamaiziere <patfbsd@davenulle.org> Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (marius)
* In r114945 the line 'nmp = TAILQ_NEXT(mp, mnt_list);' was duplicated.pjd2013-08-171-6/+3
| | | | Instead of just removing the duplicate, convert the loop to TAILQ_FOREACH().
* When creation of the v_pollinfo raced and our instance of vpollinfokib2013-07-281-4/+11
| | | | | | | | | | | | | | must be destroyed, knlist_clear() and seldrain() calls could be avoided, since vpollinfo was not used. More, the knlist_clear() calling protocol requires the knlist locked, which is not true at the call site. Split the destruction into the helper destroy_vpollinfo_free(), and call it when raced, instead of destroy_vpollinfo(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
* Clear the vnode knotes before destroying vpollinfo.kib2013-07-171-0/+2
| | | | | | Reported and tested by: Patrick Lamaiziere <patfbsd@davenulle.org> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Be more generous when donating the current thread time to the owner ofkib2013-06-031-1/+1
| | | | | | | | | | | the vnode lock while iterating over the free vnode list. Instead of yielding, pause for 1 tick. The change is reported to help in some virtualized environments. Submitted by: Roger Pau Monn? <roger.pau@citrix.com> Discussed with: jilles Tested by: pho MFC after: 2 weeks
* - Convert the bufobj lock to rwlock.jeff2013-05-311-21/+11
| | | | | | | | | | - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
* - Add a new general purpose path-compressed radix trie which can be usedjeff2013-05-121-112/+55
| | | | | | | | | | | with any structure containing a uint64_t index. The tree code auto-generates type safe wrappers. - Eliminate the buf splay and replace it with pctrie. This is not only significantly faster with large files but also allows for the possibility of shared locking. Reviewed by: alc, attilio Sponsored by: EMC / Isilon Storage Division
* - Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). Thekib2013-05-111-7/+18
| | | | | | | | | | | | | | | | | | | | | | | null_hashget() obtains the reference on the nullfs vnode, which must be dropped. - Fix a wart which existed from the introduction of the nullfs caching, do not unlock lower vnode in the nullfs_reclaim_lowervp(). It should be innocent, but now it is also formally safe. Inform the nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on nullfs inode. - Add a callback to the upper filesystems for the lower vnode unlinking. When inactivating a nullfs vnode, check if the lower vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC on the lower vnode, and reclaim upper vnode if so. This allows nullfs to purge cached vnodes for the unlinked lower vnode, avoiding excessive caching. Reported by: G??ran L??wkrantz <goran.lowkrantz@ismobile.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Add option WITNESS_NO_VNODE to suppress printing LORs between VNODEmarcel2013-05-091-1/+1
| | | | | | | | | locks. To support this, VNODE locks are created with the LK_IS_VNODE flag. This flag is propagated down using the LO_IS_VNODE flag. Note that WITNESS still records the LOR. Only the printing and the optional entering into the kernel debugger is bypassed with the WITNESS_NO_VNODE option.
* Add missing vdrop() in error case.mdf2013-05-041-0/+1
| | | | | Submitted by: Fahad (mohd.fahadullah@isilon.com) MFC after: 1 week
* Allow the vnode to be unlocked for the weird case ofrmacklem2013-04-161-1/+1
| | | | | | | | | LK_EXCLOTHER. LK_EXCLOTHER is only used to acquire a usecount on a vnode during NFSv4 recovery from an expired lease. Reported and tested by: pho MFC after: 2 weeks
* Prepare to replace the buf splay with a trie:jeff2013-04-061-19/+9
| | | | | | | | | | | | | | | | - Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists. No consumers need to find them there and it complicates the tree. These flags are all FFS specific and could be moved out of the buf cache. - Use pbgetvp() and pbrelvp() to associate the background and journal bufs with the vp. Not only is this much cheaper it makes more sense for these transient bufs. - Fix the assertions in pbget* and pbrel*. It's not safe to check list pointers which were never initialized. Use the BX flags instead. We also check B_PAGING in reassignbuf() so this should cover all cases. Discussed with: kib, mckusick, attilio Sponsored by: EMC / Isilon Storage Division
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() toattilio2013-02-201-10/+10
| | | | | | their "write" versions. Sponsored by: EMC / Isilon storage division
* Switch vm_object lock to be a rwlock.attilio2013-02-201-0/+1
| | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* Add a trivial comment to record the proper commit log for r245407:kib2013-01-141-0/+1
| | | | | | | | | | | | Set the v_hash for a new vnode in the getnewvnode() to the value calculated based on the vnode structure address. Filesystems using vfs_hash_insert() override the v_hash using the standard formula of (inode_number + mnt_hashseed). For other filesystems, the initialization allows the vfs_hash_index() to provide useful hash too. Suggested, reviewed and tested by: peter Sponsored by: The FreeBSD Foundation MFC after: 5 days
* diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.ckib2013-01-141-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | index 7c243b6..0bdaf36 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -279,6 +279,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW, #define VSHOULDFREE(vp) (!((vp)->v_iflag & VI_FREE) && !(vp)->v_holdcnt) #define VSHOULDBUSY(vp) (((vp)->v_iflag & VI_FREE) && (vp)->v_holdcnt) +static int vnsz2log; /* * Initialize the vnode management data structures. @@ -293,6 +294,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW, static void vntblinit(void *dummy __unused) { + u_int i; int physvnodes, virtvnodes; /* @@ -332,6 +334,9 @@ vntblinit(void *dummy __unused) syncer_maxdelay = syncer_mask + 1; mtx_init(&sync_mtx, "Syncer mtx", NULL, MTX_DEF); cv_init(&sync_wakeup, "syncer"); + for (i = 1; i <= sizeof(struct vnode); i <<= 1) + vnsz2log++; + vnsz2log--; } SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_FIRST, vntblinit, NULL); @@ -1067,6 +1072,14 @@ alloc: } rangelock_init(&vp->v_rl); + /* + * For the filesystems which do not use vfs_hash_insert(), + * still initialize v_hash to have vfs_hash_index() useful. + * E.g., nullfs uses vfs_hash_index() on the lower vnode for + * its own hashing. + */ + vp->v_hash = (uintptr_t)vp >> vnsz2log; + *vpp = vp; return (0); }
* Fixup r244240: mp_ncpus will be 1 also in the !SMP and smp_disabled=1attilio2012-12-261-8/+1
| | | | | | | | | | | case. There is no point in optimizing further the code and use a TRUE litteral for a path that does heavyweight stuff anyway (like lock acq), at the price of obfuscated code. Use the appropriate check where necessary and remove a macro. Sponsored by: EMC / Isilon storage division MFC after: 3 days
* Fixup r218424: uio_yield() was scaling directly to userland priority.attilio2012-12-211-4/+4
| | | | | | | | | | | | | | | When kern_yield() was introduced with the possibility to specify a new priority, the behaviour changed by not lowering priority at all in the consumers, making the yielding mechanism highly ineffective for high priority kthreads like bufdaemon, syncer, vlrudaemon, etc. There are no evidences that consumers could bear with such change in semantic and this situation could finally lead to bugs similar to the ones fixed in r244240. Re-specify userland pri for kthreads involved. Tested by: pho Reviewed by: kib, mdf MFC after: 1 week
* When mnt_vnode_next_active iterator cannot lock the next vnode andkib2012-12-151-55/+51
| | | | | | | | | | | | | | | | | | | yields, specify the user priority for the yield. Otherwise, a higher-priority (kernel) thread could fall into the priority-inversion with the thread owning the mutex lock. On single-processor machines or UP kernels, do not loop adaptively when the next vnode cannot be locked, instead yield unconditionally. Restructure the iteration initializer and the iterator to remove code duplication. Put the code to fetch and lock a vnode next to the current marker, into the mnt_vnode_next_active() function, and use it instead of repeating the loop. Reported by: hrs, rmacklem Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
* Do not yield while owning a mutex. The Giant reacquire in thekib2012-12-101-16/+18
| | | | | | | | | | | | | | | | | kern_yield() is problematic than. The owned mutex is the mount interlock, and it is in fact not needed to guarantee the stability of the mount list of active vnodes, so fix the the issue by only taking the mount interlock for MNT_REF and MNT_REL operations. While there, augment the unconditional yield by some amount of spinning [1]. Reported and tested by: pho Reviewed by: attilio Submitted by: attilio [1] MFC after: 3 days
* The vnode_free_list_mtx is required unconditionally when iteratingkib2012-12-031-4/+28
| | | | | | | | | | | | | | | | | | | | | over the active list. The mount interlock is not enough to guarantee the validity of the tailq link pointers. The __mnt_vnode_next_active() and __mnt_vnode_first_active() active lists iterators helper functions did not provided the neccessary stability for the list, allowing the iterators to pick garbage. This was uncovered after the r243599 made the active list iterators non-nop. Since a vnode interlock is before the vnode_free_list_mtx, obtain the vnode ilock in the non-blocking manner when under vnode_free_list_mtx, and restart iteration after the yield if the lock attempt failed. Assert that a vnode found on the list is active, and assert that the helpers return the vnode with interlock owned. Reported and tested by: pho MFC after: 1 week
* Take first active vnode correctly.davidxu2012-11-271-1/+1
| | | | | Reviewed by: kib MFC after: 3 days
* assert_vop_locked: make the assertion race-free and more efficientavg2012-11-241-3/+6
| | | | | | this is really a minor improvement for the sake of correctness MFC after: 6 days
* remove vop_lookup_pre and vop_lookup_postavg2012-11-221-10/+0
| | | | | Suggested by: kib MFC after: 5 days
* insmntque() is always called with the lock held in exclusive mode,attilio2012-11-191-16/+8
| | | | | | | | | | then: - assume the lock is held in exclusive mode and remove a moot check about the lock acquisition. - in the destructor remove !MPSAFE specific chunk. Reviewed by: kib MFC after: 2 weeks
* assert_vop_locked should treat LK_EXCLOTHER as the not locked caseavg2012-11-191-1/+2
| | | | | | | | ... from a perspective of the current thread. Spotted by: mjg Discussed with: kib MFC after: 18 days
* vnode_if: fix locking protocol description for lookup and cachedlookupavg2012-11-191-24/+0
| | | | | | | | | Also remove the checks from vop_lookup_pre and vop_lookup_post, which are now completely redundant (before this change they were partially redundant). Discussed with: kib MFC after: 10 days
* Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.attilio2012-11-091-1/+0
| | | | | Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.
* A clarification to the behaviour of the active vnode list managementkib2012-11-051-0/+3
| | | | | | | regarding the vnode page cleaning. In collaboration with: pho MFC after: 1 week
* Add decoding of the missed MNT_KERN_ flags to ddb "show mount" command.kib2012-11-041-0/+5
| | | | MFC after: 3 weeks
* Add decoding of the missed VI_ and VV_ flags to ddb "show vnode" command.kib2012-11-041-3/+9
| | | | MFC after: 3 days
* Order the enumeration of the MNT_ flags to be the same as the order ofkib2012-11-041-2/+2
| | | | | | their definitions. MFC after: 3 days
* Remove the support for using non-mpsafe filesystem modules.kib2012-10-221-51/+11
| | | | | | | | | | | | In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
* Add a KPI to allow to reserve some amount of space in the numvnodeskib2012-10-141-24/+72
| | | | | | | | | | | | | counter, without actually allocating the vnodes. The supposed use of the getnewvnode_reserve(9) is to reclaim enough free vnodes while the code still does not hold any resources that might be needed during the reclamation, and to consume the slack later for getnewvnode() calls made from the innards. After the critical block is finished, the caller shall free any reserve left, by getnewvnode_drop_reserve(9). Reviewed by: avg Tested by: pho MFC after: 1 week
* Remove all the checks on curthread != NULL with the exception of some MDattilio2012-09-131-1/+0
| | | | | | | | | | | trap checks (eg. printtrap()). Generally this check is not needed anymore, as there is not a legitimate case where curthread != NULL, after pcpu 0 area has been properly initialized. Reviewed by: bde, jhb MFC after: 1 week
* Add a facility for vgone() to inform the set of subscribed mountskib2012-09-091-0/+55
| | | | | | | | | | | | | | | | | about vnode reclamation. Typical use is for the bypass mounts like nullfs to get a notification about lower vnode going away. Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument lowervp which is reclaimed. It is possible to register several reclamation event listeners, to correctly handle the case of several nullfs mounts over the same directory. For the filesystem not having nullfs mounts over it, the overhead added is a single mount interlock lock/unlock in the vnode reclamation path. In collaboration with: pho MFC after: 3 weeks
* Provide some compat32 shims for sysctl vfs.conflist. It is requiredkib2012-08-221-16/+49
| | | | | | | | | for getvfsbyname(3) operation when called from 32bit process, and getvfsbyname(3) is used by recent bsdtar import. Reported by: many Tested by: David Naylor <naylor.b.david@gmail.com> MFC after: 5 days
* free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOGavg2012-06-031-4/+2
| | | | | | Those calls are useful with hardware watchdog drivers too. MFC after: 3 weeks
* Add a rangelock implementation, intended to be used to range-lockingkib2012-05-301-0/+2
| | | | | | | | | the i/o regions of the vnode data space. The implementation is quite simple-minded, it uses the list of the lock requests, ordered by arrival time. Each request may be for read or for write. The implementation is fair FIFO. MFC after: 2 month
* Remove unused thread argument to vrecycle().trasz2012-04-231-1/+1
| | | | Reviewed by: kib
* Remove unused thread argument from vtruncbuf().trasz2012-04-231-2/+1
| | | | Reviewed by: kib
* This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loopsmckusick2012-04-201-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | over just the active vnodes associated with a mount point to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines. The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped. The ffs_lazy_sync and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file. In a system configured with 250,000 vnodes, less than 1000 are typically active at any point in time. Prior to this change all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
* This change creates a new list of active vnodes associated withmckusick2012-04-201-10/+173
| | | | | | | | | | | | | | | | | | | | a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
* Delete a no longer useful VNASSERT missed during changes in 234400.mckusick2012-04-181-2/+0
| | | | Suggested by: kib
* Fix a memory leak of M_VNODE_MARKER introduced in 234386.mckusick2012-04-181-1/+1
| | | | Found by: Peter Holm
* Drop export of vdestroy() function from kern/vfs_subr.c as it ismckusick2012-04-171-102/+85
| | | | | | | | | | | | | | | used only as a helper function in that file. Replace sole call to vbusy() with inline code in vholdl(). Replace sole calls to vfree() and vdestroy() with inline code in vdropl(). The Clang compiler already inlines these functions, so they do not show up in a kernel backtrace which is confusing. Also you cannot set their frame in kgdb which means that it is impossible to view their local variables. So, while the produced code is unchanged, the debugging should be easier. Discussed with: kib MFC after: 2 weeks
* Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.mckusick2012-04-171-18/+91
| | | | | | | | | | | | | | | | | | | | | The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
OpenPOWER on IntegriCloud