| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
done as part of r292895 on stable/10, as that change causes hangs with
ZFS and the cause, at least on amd64, is not yet understood.
Discussed with: kib
For further information see:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-February/084045.html
PR: 207281
Approved by: re (gjb)
|
|
|
|
|
|
|
|
|
|
|
| |
introduced with r293742, just like it was hidden before that commit.
This is a direct commit to 10-STABLE; this special case is not needed
in 11-CURRENT, because devfs supports forced unmounts there. The forced
unmount could be MFC-ed, but there are some LORs at shutdown, and I have
a weird feeling about it.
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
|
|
|
| |
Make vfs_unmountall() unmount /dev after /, not before. The only
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3467
|
| |
This MFC includes changes to better manage the vnode freelist
and to streamline the allocation and freeing of vnodes.
Note that to maintain the KPI the VI_AGE flag is left defined
in sys/vnode.h though its use is dropped as described in 291380.
To maintain KBI the vfs.vlru_alloc_cache_src sysctl variable
remains though it no longer has any effect as described in 291244.
MFC of 291244:
Move the comment about resident pages preventing vnode from leaving
active list, into the header comment for vdrop(), which is the
function that decides whether to leave the vnode on the list. Note
that dirty page write-out in vinactive() is asynchronous.
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC of 291380:
Remove VI_AGE vnode iflag, it is unused.
Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC of 291459:
For performance reasons, it is useful to have a single string used as
the name of a filesystem when setting it as the first parameter to the
getnewvnode() function. Most filesystems call getnewvnode from just one
place so can use a literal string as the first parameter. However, NFS
calls getnewvnode from two places, so we create a global constant string
that can be used by the two instances. This change also collapses two
instances of getnewvnode() in the UFS filesystem to a single call.
Reviewed by: kib
Tested by: Peter Holm
MFC of 291460:
As the kernel allocates and frees vnodes, it fully initializes them
on every allocation and fully releases them on every free. These
are not trivial costs: it starts by zeroing a large structure then
initializes a mutex, a lock manager lock, an rw lock, four lists,
and six pointers. And looking at vfs.vnodes_created, these operations
are being done millions of times an hour on a busy machine.
As a performance optimization, this code update uses the uma_init
and uma_fini routines to do these initializations and cleanups only
as the vnodes enter and leave the vnode_zone. With this change the
initializations are only done kern.maxvnodes times at system startup
and then only rarely again. The frees are done only if the vnode_zone
shrinks which never happens in practice. For those curious about the
avoided work, look at the vnode_init() and vnode_fini() functions in
kern/vfs_subr.c to see the code that has been removed from the main
vnode allocation/free path.
Reviewed by: kib
Tested by: Peter Holm
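The saving described in 291460 can be illustrated outside the kernel. The following is a minimal user-space sketch, not the vfs_subr.c code: struct obj, obj_init(), obj_alloc(), obj_free() and the heavy_inits counter are all hypothetical names. It assumes a simple LIFO free list in place of a UMA zone, and shows the costly setup running only when an object first enters the cache, while reuse pays only for cheap per-allocation work.

```c
#include <assert.h>
#include <stdlib.h>

struct obj {			/* hypothetical stand-in for struct vnode */
	struct obj *free_next;	/* free-list linkage while cached */
	int lock_ready;		/* pretend this is an expensive lock init */
	int in_use;
};

static struct obj *free_list;
static unsigned long heavy_inits;	/* how often the costly path ran */

/* Costly one-time setup, analogous in spirit to a zone init callback. */
static void
obj_init(struct obj *o)
{
	o->lock_ready = 1;
	heavy_inits++;
}

static struct obj *
obj_alloc(void)
{
	struct obj *o = free_list;

	if (o != NULL) {
		free_list = o->free_next;	/* reuse: no heavy init */
	} else {
		o = calloc(1, sizeof(*o));
		if (o == NULL)
			return (NULL);
		obj_init(o);			/* first entry into the cache */
	}
	o->in_use = 1;				/* cheap per-allocation setup */
	return (o);
}

static void
obj_free(struct obj *o)
{
	assert(o->in_use && o->lock_ready);
	o->in_use = 0;				/* the lock stays initialized */
	o->free_next = free_list;
	free_list = o;
}
```

Allocating, freeing and allocating again returns the same object with heavy_inits still at 1. The follow-up fixes 291671 and 291743 correspond to the rule this sketch depends on: any field that the per-allocation path does not reset must be left in a clean state by the free path.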
MFC of 291671:
We need to zero out the union of pointers in a freed vnode structure.
Fix from: Mateusz Guzik
Tested by: Jason Unovitch
MFC of 291743:
We need to zero out the clustering variables in a freed vnode structure.
For completeness add a VNASSERT that there are no threads waiting on a
range lock (this was previously checked on every vnode free).
Reported by: Rick Macklem
Fix from: Mateusz Guzik
|
|
|
|
|
| |
Move the comment about resident pages preventing vnode from leaving
active list, into the header comment for vdrop().
|
|
|
|
|
|
| |
Don't take devmtx unnecessarily in vn_isdisk.
Sponsored by: Multiplay
|
|
|
|
|
|
|
| |
After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
| |
Mark vgonel() as static.
Sponsored by: The FreeBSD Foundation
|
|
|
|
| |
Fix argument ordering in vn_printf().
|
| |
More accurately collect name-cache statistics in sysctl functions
sysctl_debug_hashstat_nchash() and sysctl_debug_hashstat_rawnchash().
These changes are in preparation for allowing changes in the size
of the vnode hash tables driven by increases and decreases in the
maximum number of vnodes in the system.
Reviewed by: kib@
Phabric: D2265
MFC of 287497:
Track changes to kern.maxvnodes and appropriately increase or decrease
the size of the name cache hash table (mapping file names to vnodes)
and the vnode hash table (mapping mount point and inode number to vnode).
An appropriate locking strategy is the key to changing hash table sizes
while they are in active use.
Reviewed by: kib
Tested by: Peter Holm
Differential Revision: https://reviews.freebsd.org/D2265
|
|
|
|
| |
Do not allow creation of the dirty buffers for the dead buffer objects.
|
|
|
|
|
| |
Keep a vnode which is freed but still owing inactivation, on the active list.
This closes a race where such vnode is not msync-ed until reboot.
|
|
|
|
|
|
|
|
|
| |
Prevent dounmount() from acting on the freed (although type-stable)
memory by changing the interface to require the mount point to be
referenced.
MFC r283629:
Add missed {}.
|
|
|
|
|
|
|
|
|
|
|
| |
File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.
|
|
|
|
|
|
|
|
| |
Add two new counters for vnode life cycle events:
- vfs.recycles counts the number of vnodes forcefully recycled to avoid
exceeding kern.maxvnodes.
- vfs.vnodes_created counts the number of vnodes created by successful
calls to getnewvnode().
|
|
|
|
| |
Change the default VFS timestamp precision from seconds to microseconds.
|
|
|
|
|
|
|
| |
The VNASSERT in vflush() FORCECLOSE case is trying to panic early to
prevent errors from yanking devices out from under filesystems. Only
care about special vnodes on devfs; special nodes on other kinds of
filesystems do not have special properties.
|
|
|
|
|
|
|
|
| |
Add the mnt_lockref field to the ddb(4) 'show mount' command
Differential Revision: https://reviews.freebsd.org/D1688
Submitted by: Conrad Meyer <conrad.meyer@isilon.com>
Sponsored by: EMC / Isilon Storage Division
|
|
|
|
| |
Put the buffer cleanup code after inactivation.
|
|
|
|
|
|
|
| |
Add functions syncer_suspend() and syncer_resume().
MFC r275637:
Remove local variable for real.
|
|
|
|
| |
Remove Giant acquisition from the mount and unmount paths.
|
|
|
|
| |
Remove one-time use macros which check for the vnode lifecycle.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement simple direct-mapped cache for popular filesystem identifiers to
avoid congestion on global mountlist_mtx mutex in vfs_busyfs(), while
traversing through the list of mount points.
This change significantly improves NFS server scalability, since it had
to do this translation for every request, and the global lock becomes quite
congested.
This code is more optimized for relatively small number of mount points.
On systems with hundreds of active mount points this simple cache may have
many collisions. But the original traversal code in that case should also
behave much worse, so we are not losing much.
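A direct-mapped cache of this kind can be sketched in user space as follows. This is an illustration under stated assumptions, not the committed code: struct mnt, mnt_add(), mnt_lookup(), CACHE_SLOTS and the slow_lookups counter are hypothetical, the slot is chosen from the low bits of a 64-bit identifier, and all locking and invalidation on unmount are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 16	/* hypothetical size; must be a power of two */

struct mnt {			/* hypothetical stand-in for a mount point */
	uint64_t id;		/* filesystem identifier */
	struct mnt *next;	/* global list linkage */
};

static struct mnt *mnt_list;			/* slow path: linked list */
static struct mnt *mnt_cache[CACHE_SLOTS];	/* one entry per slot */
static unsigned long slow_lookups;		/* counts list traversals */

static void
mnt_add(struct mnt *m)
{
	m->next = mnt_list;
	mnt_list = m;
}

static struct mnt *
mnt_lookup(uint64_t id)
{
	size_t slot = (size_t)(id & (CACHE_SLOTS - 1));
	struct mnt *m = mnt_cache[slot];

	if (m != NULL && m->id == id)
		return (m);		/* hit: no list traversal */
	slow_lookups++;
	for (m = mnt_list; m != NULL; m = m->next)
		if (m->id == id)
			break;
	if (m != NULL)
		mnt_cache[slot] = m;	/* replace whatever was in the slot */
	return (m);
}
```

Two identifiers that map to the same slot (1 and 17 here) evict each other on every alternating lookup, which is the collision cost the message mentions for systems with many mount points.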
|
|
|
|
|
|
| |
Remove unneeded mountlist_mtx acquisition from sync_fsync().
All struct mount fields accessed by sync_fsync() are protected by MNT_MTX.
|
|
|
|
|
|
|
|
| |
Use atomics to modify numvnodes variable.
This mostly avoids lock usage in getnewvnode_[drop_]reserve(), which
reduces the number of global vnode_free_list_mtx mutex acquisitions
from 4 to 2 per NFS request on ZFS, improving SMP scalability.
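The general idea of adjusting a shared counter without a mutex can be sketched with C11 atomics. This is a user-space illustration only; the kernel uses its own atomic(9) primitives, and nobjs, obj_reserve() and obj_release() are hypothetical names that do not reflect the actual getnewvnode_reserve() logic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_long nobjs;	/* hypothetical counter, like numvnodes */

/* Try to account for one more object without taking a lock. */
static bool
obj_reserve(long limit)
{
	long old = atomic_fetch_add(&nobjs, 1);

	if (old < limit)
		return (true);
	atomic_fetch_sub(&nobjs, 1);	/* over the limit: undo */
	return (false);
}

/* Drop one previously reserved object from the count. */
static void
obj_release(void)
{
	long old = atomic_fetch_sub(&nobjs, 1);

	assert(old > 0);
	(void)old;			/* silence unused warning with NDEBUG */
}
```

In this sketch the counter can briefly overshoot the limit between the add and the undo, so it suits accounting where a momentary excess is harmless; a strict bound would need a compare-and-swap loop instead.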
|
| |
really belong to it. Such vnodes, with the pointers to other vnodes
v_objects, are typically instantiated by the bypass filesystems.
Invalidating mappings of other vnode pages and the pages is wrong,
since reclamation of the upper vnode does not imply that lower vnode
is reclaimed too.
One of the consequences of the improper reclamation was destruction of
the wired mappings of the lower vnode pages, triggering miscellaneous
assertions in the VM system.
Reported by: John Marshall <john.marshall@riverwillow.com.au>
Tested by: John Marshall <john.marshall@riverwillow.com.au>, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)
|
|
|
|
|
|
|
|
| |
dirty and clean buffer queues.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)
|
|
|
|
|
|
|
|
|
|
| |
with the vnode shared-locked. If upgrade succeeded, the inactivation
can be done immediately, instead of being postponed.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (glebius)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, knote keeps a pointer to a vnode which could become invalid
any time.
Reported by: many
Tested by: Patrick Lamaiziere <patfbsd@davenulle.org>
Discussed with: jmg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (marius)
|
|
|
|
| |
Instead of just removing the duplicate, convert the loop to TAILQ_FOREACH().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
must be destroyed, knlist_clear() and seldrain() calls could be
avoided, since vpollinfo was not used. Moreover, the knlist_clear()
calling protocol requires the knlist locked, which is not true at the
call site.
Split the destruction into the helper destroy_vpollinfo_free(), and
call it when raced, instead of destroy_vpollinfo().
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
|
|
|
|
|
|
| |
Reported and tested by: Patrick Lamaiziere <patfbsd@davenulle.org>
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
| |
the vnode lock while iterating over the free vnode list. Instead of
yielding, pause for 1 tick. The change is reported to help in some
virtualized environments.
Submitted by: Roger Pau Monné <roger.pau@citrix.com>
Discussed with: jilles
Tested by: pho
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
| |
- Use a shared bufobj lock in getblk() and inmem().
- Convert softdep's lk to rwlock to match the bufobj lock.
- Move INFREECNT to b_flags and protect it with the buf lock.
- Remove unnecessary locking around bremfree() and BKGRDINPROG.
Sponsored by: EMC / Isilon Storage Division
Discussed with: mckusick, kib, mdf
|
|
|
|
|
|
|
|
|
|
|
| |
with any structure containing a uint64_t index. The tree code
auto-generates type safe wrappers.
- Eliminate the buf splay and replace it with pctrie. This is not only
significantly faster with large files but also allows for the possibility
of shared locking.
Reviewed by: alc, attilio
Sponsored by: EMC / Isilon Storage Division
|
| |
null_hashget() obtains the reference on the nullfs vnode, which must
be dropped.
- Fix a wart which existed from the introduction of the nullfs
caching, do not unlock lower vnode in the nullfs_reclaim_lowervp().
It should be innocent, but now it is also formally safe. Inform the
nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on
nullfs inode.
- Add a callback to the upper filesystems for the lower vnode
unlinking. When inactivating a nullfs vnode, check if the lower
vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC
on the lower vnode, and reclaim upper vnode if so. This allows
nullfs to purge cached vnodes for the unlinked lower vnode, avoiding
excessive caching.
Reported by: Göran Löwkrantz <goran.lowkrantz@ismobile.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
| |
locks. To support this, VNODE locks are created with the LK_IS_VNODE
flag. This flag is propagated down using the LO_IS_VNODE flag.
Note that WITNESS still records the LOR. Only the printing and the
optional entering into the kernel debugger is bypassed with the
WITNESS_NO_VNODE option.
|
|
|
|
|
| |
Submitted by: Fahad (mohd.fahadullah@isilon.com)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
| |
LK_EXCLOTHER. LK_EXCLOTHER is only used to acquire a
usecount on a vnode during NFSv4 recovery from an
expired lease.
Reported and tested by: pho
MFC after: 2 weeks
|
| |
- Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists.
No consumers need to find them there and it complicates the tree.
These flags are all FFS specific and could be moved out of the buf
cache.
- Use pbgetvp() and pbrelvp() to associate the background and journal
bufs with the vp. Not only is this much cheaper it makes more sense
for these transient bufs.
- Fix the assertions in pbget* and pbrel*. It's not safe to check list
pointers which were never initialized. Use the BX flags instead. We
also check B_PAGING in reassignbuf() so this should cover all cases.
Discussed with: kib, mckusick, attilio
Sponsored by: EMC / Isilon Storage Division
|
|
|
|
|
|
| |
their "write" versions.
Sponsored by: EMC / Isilon storage division
|
|
|
|
|
|
|
|
| |
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitive to
get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution, so many
files must include rwlock.h directly
|
|
|
|
|
|
|
|
|
|
|
|
| |
Set the v_hash for a new vnode in the getnewvnode() to the value
calculated based on the vnode structure address. Filesystems using
vfs_hash_insert() override the v_hash using the standard formula of
(inode_number + mnt_hashseed). For other filesystems, the
initialization allows the vfs_hash_index() to provide useful hash too.
Suggested, reviewed and tested by: peter
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
|
| |
index 7c243b6..0bdaf36 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -279,6 +279,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW,
 #define VSHOULDFREE(vp) (!((vp)->v_iflag & VI_FREE) && !(vp)->v_holdcnt)
 #define VSHOULDBUSY(vp) (((vp)->v_iflag & VI_FREE) && (vp)->v_holdcnt)
+static int vnsz2log;
 /*
  * Initialize the vnode management data structures.
@@ -293,6 +294,7 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW,
 static void
 vntblinit(void *dummy __unused)
 {
+	u_int i;
 	int physvnodes, virtvnodes;
 	/*
@@ -332,6 +334,9 @@ vntblinit(void *dummy __unused)
 	syncer_maxdelay = syncer_mask + 1;
 	mtx_init(&sync_mtx, "Syncer mtx", NULL, MTX_DEF);
 	cv_init(&sync_wakeup, "syncer");
+	for (i = 1; i <= sizeof(struct vnode); i <<= 1)
+		vnsz2log++;
+	vnsz2log--;
 }
 SYSINIT(vfs, SI_SUB_VFS, SI_ORDER_FIRST, vntblinit, NULL);
@@ -1067,6 +1072,14 @@ alloc:
 	}
 	rangelock_init(&vp->v_rl);
+	/*
+	 * For the filesystems which do not use vfs_hash_insert(),
+	 * still initialize v_hash to have vfs_hash_index() useful.
+	 * E.g., nullfs uses vfs_hash_index() on the lower vnode for
+	 * its own hashing.
+	 */
+	vp->v_hash = (uintptr_t)vp >> vnsz2log;
+
 	*vpp = vp;
 	return (0);
 }
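The loop added to vntblinit() computes floor(log2(sizeof(struct vnode))), and the new v_hash assignment shifts the vnode address right by that amount. A minimal user-space sketch of the same arithmetic follows; struct obj, size_to_log2() and addr_hash() are hypothetical names, and the 200-byte size is an arbitrary assumption rather than the real size of struct vnode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vnode; only its size matters here. */
struct obj {
	char payload[200];
};

/*
 * Return floor(log2(size)) using the same loop shape as the patch:
 * count the powers of two that are <= size, then subtract one.
 */
static int
size_to_log2(size_t size)
{
	size_t i;
	int log = 0;

	assert(size > 0);
	for (i = 1; i <= size; i <<= 1)
		log++;
	return (log - 1);
}

/* Derive a hash value from an object's address, dropping the low bits. */
static unsigned int
addr_hash(const void *p, int shift)
{
	return ((unsigned int)((uintptr_t)p >> shift));
}
```

For a 200-byte structure the shift is 7, since 128 <= 200 < 256. Dropping the low bits is presumably useful because distinct objects of that size cannot start within a few bytes of each other, so those bits add little to the hash.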
|
|
|
|
|
|
|
|
|
|
|
| |
case. There is no point in optimizing the code further and using a TRUE
literal for a path that does heavyweight stuff anyway (like lock acq),
at the price of obfuscated code.
Use the appropriate check where necessary and remove a macro.
Sponsored by: EMC / Isilon storage division
MFC after: 3 days
|
| |
When kern_yield() was introduced with the possibility to specify
a new priority, the behaviour changed by not lowering priority at all
in the consumers, making the yielding mechanism highly ineffective for
high priority kthreads like bufdaemon, syncer, vlrudaemon, etc.
There is no evidence that consumers could bear with such a change in
semantics, and this situation could finally lead to bugs similar to the
ones fixed in r244240.
Re-specify userland pri for kthreads involved.
Tested by: pho
Reviewed by: kib, mdf
MFC after: 1 week
|
| |
yields, specify the user priority for the yield. Otherwise, a
higher-priority (kernel) thread could fall into the priority-inversion
with the thread owning the mutex lock.
On single-processor machines or UP kernels, do not loop adaptively
when the next vnode cannot be locked, instead yield unconditionally.
Restructure the iteration initializer and the iterator to remove code
duplication. Put the code to fetch and lock a vnode next to the
current marker, into the mnt_vnode_next_active() function, and use it
instead of repeating the loop.
Reported by: hrs, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
|
| |
kern_yield() is problematic then.
The owned mutex is the mount interlock, and it is in fact not needed
to guarantee the stability of the mount list of active vnodes, so fix
the issue by only taking the mount interlock for MNT_REF and
MNT_REL operations.
While there, augment the unconditional yield by some amount of
spinning [1].
Reported and tested by: pho
Reviewed by: attilio
Submitted by: attilio [1]
MFC after: 3 days
|
| |
over the active list. The mount interlock is not enough to guarantee
the validity of the tailq link pointers. The __mnt_vnode_next_active()
and __mnt_vnode_first_active() active list iterator helper functions
did not provide the necessary stability for the list, allowing the
iterators to pick garbage.
This was uncovered after the r243599 made the active list iterators
non-nop.
Since a vnode interlock is before the vnode_free_list_mtx, obtain the
vnode ilock in the non-blocking manner when under vnode_free_list_mtx,
and restart iteration after the yield if the lock attempt failed.
Assert that a vnode found on the list is active, and assert that the
helpers return the vnode with interlock owned.
Reported and tested by: pho
MFC after: 1 week
|
|
|
|
|
| |
Reviewed by: kib
MFC after: 3 days
|