| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This MFC includes changes to better manage the vnode freelist
and to streamline the allocation and freeing of vnodes.
Note that to maintain the KPI the VI_AGE flag is left defined
in sys/vnode.h though its use is dropped as described in 291380.
To maintain KBI the vfs.vlru_alloc_cache_src sysctl variable
remains though it no longer has any effect as described in 291244.
MFC of 291244:
Move the comment about resident pages preventing vnode from leaving
active list, into the header comment for vdrop(), which is the
function that decides whether to leave the vnode on the list. Note
that dirty page write-out in vinactive() is asynchronous.
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC of 291380:
Remove VI_AGE vnode iflag, it is unused.
Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC of 291459:
For performance reasons, it is useful to have a single string used as
the name of a filesystem when setting it as the first parameter to the
getnewvnode() function. Most filesystems call getnewvnode from just one
place so can use a literal string as the first parameter. However, NFS
calls getnewvnode from two places, so we create a global constant string
that can be used by the two instances. This change also collapses two
instances of getnewvnode() in the UFS filesystem to a single call.
Reviewed by: kib
Tested by: Peter Holm
MFC of 291460:
As the kernel allocates and frees vnodes, it fully initializes them
on every allocation and fully releases them on every free. These
are not trivial costs: it starts by zeroing a large structure then
initializes a mutex, a lock manager lock, an rw lock, four lists,
and six pointers. And looking at vfs.vnodes_created, these operations
are being done millions of times an hour on a busy machine.
As a performance optimization, this code update uses the uma_init
and uma_fini routines to do these initializations and cleanups only
as the vnodes enter and leave the vnode_zone. With this change the
initializations are only done kern.maxvnodes times at system startup
and then only rarely again. The frees are done only if the vnode_zone
shrinks which never happens in practice. For those curious about the
avoided work, look at the vnode_init() and vnode_fini() functions in
kern/vfs_subr.c to see the code that has been removed from the main
vnode allocation/free path.
Reviewed by: kib
Tested by: Peter Holm
MFC of 291671:
We need to zero out the union of pointers in a freed vnode structure.
Fix from: Mateusz Guzik
Tested by: Jason Unovitch
MFC of 291743:
We need to zero out the clustering variables in a freed vnode structure.
For completeness add a VNASSERT that there are no threads waiting on a
range lock (this was previously checked on every vnode free).
Reported by; Rick Macklem
Fix from: Mateusz Guzik
|
|
|
|
|
|
|
|
|
| |
Handle errors from background write of the cylinder group blocks.
MFC r284927:
Simplify code.
Approved by: re (gjb)
|
|
|
|
|
| |
Keep a vnode which is freed but still owing inactivation, on the active list.
This closes a race where such vnode is not msync-ed until reboot.
|
|
|
|
| |
Remove several write-only variables.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Limit the number of cylinder groups that will be searched when
trying to build a cluster. The limit is tunable using the sysctl
vfs.ffs.maxclustersearch. The current limit is 10 cylinder groups
per block allocation. It was previously limited to the number of
cylinder groups in the filesystem per block allocation. When there
were no clusters of the needed size left, it repeatedly searched
the whole filesystem for a non-existent cluster on every block
allocation. The result was very slow filesystem allocation with
100% CPU utilization. The old behavior can be had by setting
vfs.ffs.maxclustersearch to a huge number (1,000,000).
This change affects only the layout policy routines so is not able
to interfere with the integrity of the filesystem.
Reported by: Dmitry Sivachenko (demon@)
Tested by: Dmitry Sivachenko (demon@)
|
|
|
|
|
|
|
|
|
|
|
| |
File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.
|
|
|
|
|
|
|
| |
Fix the hand after the immediate reboot after the init binary is unlinked.
MFC r280763:
Fix build (with gcc).
|
|
|
|
| |
Correct the test for condition to suspend UFS filesystem during unmount.
|
|
|
|
|
|
|
| |
Add helper helper vfs_write_suspend_umnt().
Fix the bug in the FFS unmount, when suspension failed, the ufs
extattrs were not reinitialized.
|
|
|
|
|
|
|
|
|
|
| |
ufs: small formatting fixes.
Cleanup some extra space.
Use of tabs vs. spaces.
No functional change.
Reviewed by: mckusick
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This set of changes puts in place the infrastructure to allow soft
updates to be multi-threaded. It introduces no functional changes
from its current operation.
MFC of 256860:
Allow kernels without options SOFTUPDATES to build. This should fix the
embedded tinderboxes.
Reviewed by: emaste
MFC of 256845:
Fix build problem on ARM (which defaults to building without soft updates).
Reported by: Tinderbox
Sponsored by: Netflix
MFC of 256817:
Restructuring of the soft updates code to set it up so that the
single kernel-wide soft update lock can be replaced with a
per-filesystem soft-updates lock. This per-filesystem lock will
allow each filesystem to have its own soft-updates flushing thread
rather than being limited to a single soft-updates flushing thread
for the entire kernel.
Move soft update variables out of the ufsmount structure and into
their own mount_softdeps structure referenced by ufsmount field
um_softdep. Eventually the per-filesystem lock will be in this
structure. For now there is simply a pointer to the kernel-wide
soft updates lock.
Change all instances of ACQUIRE_LOCK and FREE_LOCK to pass the lock
pointer in the mount_softdeps structure instead of a pointer to the
kernel-wide soft-updates lock.
Replace the five hash tables used by soft updates with per-filesystem
copies of these tables allocated in the mount_softdeps structure.
Several functions that flush dependencies when too many are allocated
in the kernel used to operate across all filesystems. They are now
parameterized to flush dependencies from a specified filesystem.
For now, we stick with the round-robin flushing strategy when the
kernel as a whole has too many dependencies allocated.
While there are many lines of changes, there should be no functional
change in the operation of soft updates.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256812:
Fourth of several cleanups to soft dependency implementation.
Add KASSERTS that soft dependency functions only get called
for filesystems running with soft dependencies. Calling these
functions when soft updates are not compiled into the system
become panic's.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256808:
Third of several cleanups to soft dependency implementation.
Ensure that softdep_unmount() and softdep_setup_sbupdate()
only get called for filesystems running with soft dependencies.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256803:
Second of several cleanups to soft dependency implementation.
Delete two unused functions in ffs_sofdep.c.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256801:
First of several cleanups to soft dependency implementation.
Convert three functions exported from ffs_softdep.c to static
functions as they are not used outside of ffs_softdep.c.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vfs_busy(mp);
vfs_write_suspend(mp);
which are problematic if other thread starts unmount between two
calls. The unmount starts a write, while vfs_write_suspend() drain
writers. On the other hand, unmount drains busy references, causing
the deadlock.
Add a flag argument to vfs_write_suspend and require the callers of it
to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the
mount path, i.e. the covered vnode is not locked. The suspension is
not attempted if VS_SKIP_UNMOUNT is specified and unmount is in
progress.
Reported and tested by: Andreas Longwitz <longwitz@incore.de>
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
|
|
|
|
|
|
|
| |
Cleanup the incomplete revert.
Reported by: bde
MFC after: 4 weeks
|
|
|
|
|
|
|
|
|
|
| |
Revert the simplification of the i_gen calculation.
It is still a good idea to avoid zero values and for the case
of old filesystems there is probably no advantage in using
the complete 32 bits anyways.
Discussed with: bde
MFC after: 4 weeks
|
|
|
|
|
|
|
|
|
| |
Further simplify the i_gen calculation for older disks.
Having a zero here is not really a problem and this is more
similar to what is done in newfs_random().
Reported by: Xin Li
MFC after: 4 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In UFS, i_gen is a random generated value and there is not way for
it to be negative. Actually, the value of i_gen is just used to
match bit patterns and it is of not consequence if the values are
signed or not.
Following other filesystems, set it to unsigned and use it as such,
Discussed by: mckusick
Reviewed by: mckusick (previous version)
MFC after: 4 weeks
|
|
|
|
|
|
|
|
|
|
| |
- Use a shared bufobj lock in getblk() and inmem().
- Convert softdep's lk to rwlock to match the bufobj lock.
- Move INFREECNT to b_flags and protect it with the buf lock.
- Remove unnecessary locking around bremfree() and BKGRDINPROG.
Sponsored by: EMC / Isilon Storage Division
Discussed with: mckusick, kib, mdf
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists.
No consumers need to find them there and it complicates the tree.
These flags are all FFS specific and could be moved out of the buf
cache.
- Use pbgetvp() and pbrelvp() to associate the background and journal
bufs with the vp. Not only is this much cheaper it makes more sense
for these transient bufs.
- Fix the assertions in pbget* and pbrel*. It's not safe to check list
pointers which were never initialized. Use the BX flags instead. We
also check B_PAGING in reassignbuf() so this should cover all cases.
Discussed with: kib, mckusick, attilio
Sponsored by: EMC / Isilon Storage Division
|
|
|
|
|
| |
Sponsored by: The FreeBSD Foundation
Tested by: pho, scottl, jhb, bf
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current dqflush() panics when a dquot with with non-zero refcount is
encountered. The situation is possible, because quotas are turned off
before softdep workitem queue if flushed, due to the quota file writes
might create softdep workitems.
Make the encountering an active dquot in dqflush() not fatal, return
the error from quotaoff() instead. Ignore the quotaoff() failures
when ffs_flushfiles() is called in the course of softdep_flushfiles()
loop, until the last iteration. At the last loop, the quotas must be
closed, and because SU workitems should be already flushed, the
references to dquot are gone.
Sponsored by: The FreeBSD Foundation
Reported and tested by: pho
Reviewed by: mckusick
MFC after: 2 weeks
|
|
|
|
|
|
| |
vfs_write_resume_flags().
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
|
|
|
|
| |
received granular locking) but the comment present in UFS has been
copied all over other filesystems code incorrectly for several times.
Removes comments that makes no sense now.
Reviewed by: kib
MFC after: 3 days
|
|
|
|
|
|
|
| |
to modify on-disk metadata for filesystems mounted for write.
Reviewed by: kib, mckusick
Sponsored by: FreeBSD Foundation
|
|
|
|
|
| |
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.
|
| |
|
|
|
|
|
| |
Tested by: pho
MFC after: 2 months
|
|
|
|
|
| |
Reviewed by: mckusick
Tested by: pho
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
over just the active vnodes associated with a mount point to replace
MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync
routines.
The vfs_msync routine is run every 30 seconds for every writably
mounted filesystem. It ensures that any files mmap'ed from the
filesystem with modified pages have those pages queued to be
written back to the file from which they are mapped.
The ffs_lazy_sync and qsync routines are run every 30 seconds for
every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine
ensures that any files that have been accessed in the previous
30 seconds have had their access times queued for updating in the
filesystem. The qsync routine ensures that any files with modified
quotas have those quotas queued to be written back to their
associated quota file.
In a system configured with 250,000 vnodes, less than 1000 are
typically active at any point in time. Prior to this change all
250,000 vnodes would be locked and inspected twice every minute
by the syncer. For UFS/FFS filesystems they would be locked and
inspected six times every minute (twice by each of these three
routines since each of these routines does its own pass over the
vnodes associated with a mount point). With this change the syncer
now locks and inspects only the tiny set of vnodes that are active.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).
To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.
The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
|
|
|
|
|
|
|
| |
gets resized and then reloaded.
Reviewed by: kib, mckusick (earlier version)
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
| |
Clearly it must have been set when the mount was done.
Reviewed by: kib
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to enable the collection of counts of synchronous and asynchronous
reads and writes for its associated filesystem. The counts are
displayed using `mount -v'.
Ensure that buffers used for paging indicate the vnode from
which they are operating so that counts of paging I/O operations
from the filesystem are collected.
This checkin only adds the setting of the mount point for the
UFS/FFS filesystem, but it would be trivial to add the setting
and clearing of the mount point at filesystem mount/unmount
time for other filesystems too.
Reviewed by: kib
|
|
|
|
|
|
|
| |
message for r233609:
Restore the writes of atimes, quotas and superblock from syncer vnode.
Noted by: rdivacky
|
|
|
|
|
| |
Tested by: pho
MFC after: 2 weeks
|
|
|
|
| |
MFC after: 3 days
|
|
|
|
|
|
|
| |
with MNT_WAIT flags that passed in its second argument. This will be
MFC'ed together with r232351.
Discussed with: kib
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
every 30 seconds. This spike in I/O caused the system to pause every
30 seconds which was quite annoying. So, the way that sync worked
was changed so that when a vnode was first dirtied, it was put on
a 30-second cleaning queue (see the syncer_workitem_pending queues
in kern/vfs_subr.c). If the file has not been written or deleted
after 30 seconds, the syncer pushes it out. As the syncer runs once
per second, dirty files are trickled out slowly over the 30-second
period instead of all at once by a call to sync(2).
The one drawback to this is that it does not cover the filesystem
metadata. To handle the metadata, vfs_allocate_syncvnode() is called
to create a "filesystem syncer vnode" at mount time which cycles
around the cleaning queue being sync'ed every 30 seconds. In the
original design, the only things it would sync for UFS were the
filesystem metadata: inode blocks, cylinder group bitmaps, and the
superblock (e.g., by VOP_FSYNC'ing devvp, the device vnode from
which the filesystem is mounted).
Somewhere in its path to integration with FreeBSD the flushing of
the filesystem syncer vnode got changed to sync every vnode associated
with the filesystem. The result of this change is to return to the
old filesystem-wide flush every 30-seconds behavior and makes the
whole 30-second delay per vnode useless.
This change goes back to the originally intended trickle out sync
behavior. Key to ensuring that all the intended semantics are
preserved (e.g., that all inode updates get flushed within a bounded
period of time) is that all inode modifications get pushed to their
corresponding inode blocks so that the metadata flush by the
filesystem syncer vnode gets them to the disk in a timely way.
Thanks to Konstantin Belousov (kib@) for doing the audit and commit
-r231122 which ensures that all of these updates are being made.
Reviewed by: kib
Tested by: scottl
MFC after: 2 weeks
|
|
|
|
|
|
|
| |
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
| |
vfs_mount_error error message facility provided by the nmount
interface.
Clean up formatting of mount warnings which still need to use
kernel printf's since they do not return errors.
Requested by: Craig Rodrigues <rodrigc@crodrigues.org>
MFC after: 2 weeks
|
|
|
|
|
|
|
| |
is set so that mount can revert back to using MNT_NOWAIT when doing
getmntinfo.
Approved by: re (kib)
|
|
|
|
|
|
|
|
|
| |
so that it is visible to userland programs. This change enables
the `mount' command with no arguments to be able to show if a
filesystem is mounted using journaled soft updates as opposed
to just normal soft updates.
Approved by: re (bz)
|
|
|
|
|
|
|
|
|
| |
(typically fsck_ffs) to register that it wishes to use FFS specific
sysctl's to update the filesystem. This ensures that two checkers
cannot run on a given filesystem at the same time and that no other
process accidentally or maliciously uses the filesystem updating
sysctls inappropriately. This functionality is needed by the
journaling soft-updates recovery code.
|
|
|
|
|
|
|
|
|
| |
filesystems to be opened for writing. This functionality used to
be special-cased for just the root filesystem, but with this change
is now available for all UFS filesystems. This change is needed for
journaled soft updates recovery.
Discussed with: Jeff Roberson
|
|
|
|
|
| |
downgraded to read-only. It will be restarted if the filesystem is
upgraded back to read-write.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to resolve errors which can cause corruption on recovery with the old
synchronous mechanism.
- Append partial truncation freework structures to indirdeps while
truncation is proceeding. These prevent new block pointers from
becoming valid until truncation completes and serialize truncations.
- On completion of a partial truncate journal work waits for zeroed
pointers to hit indirects.
- softdep_journal_freeblocks() handles last frag allocation and last
block zeroing.
- vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it
is only implemented in one place.
- Block allocation failure handling moved up one level so it does not
proceed with buf locks held. This permits us to do more extensive
reclaims when filesystem space is exhausted.
- softdep_sync_metadata() is broken into two parts, the first executes
once at the start of ffs_syncvnode() and flushes truncations and
inode dependencies. The second is called on each locked buf. This
eliminates excessive looping and rollbacks.
- Improve the mechanism in process_worklist_item() that handles
acquiring vnode locks for handle_workitem_remove() so that it works
more generally and does not loop excessively over the same worklist
items on each call.
- Don't corrupt directories by zeroing the tail in fsck. This is only
done for regular files.
- Push a fsync complete record for files that need it so the checker
knows a truncation in the journal is no longer valid.
Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts)
Tested by: pho
|
|
|
|
|
|
|
|
|
|
|
| |
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.
Reviewed by: kib
|
|
|
|
|
|
|
|
| |
Instead of directly calling ffs_snapgone(), use UFS_SNAPGONE() with
usual layering.
Requested by: bde
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The FS_TRIM fs flag indicates that administrator requested issuing of
TRIM commands for the volume. UFS will only send the command to disk
if the disk reports GEOM::candelete attribute.
Since disk queue is reordered, data block is marked as free in the bitmap
only after TRIM command completed. Due to need to sleep waiting for
i/o to finish, TRIM bio_done routine schedules taskqueue to set the
bitmap bit.
Based on the patch by: mckusick
Reviewed by: mckusick, pjd
Tested by: pho
MFC after: 1 month
|
|
|
|
|
|
|
|
|
|
| |
As result, failed softdep_mount() might leave up to two vnodes on the
mp mountlist, preventing mnt_ref from going to zero.
Call ffs_flushfiles() after failed softdep_mount() to clean mountlist.
Initial report by: Garrett Cooper
Reproduced and tested by: pho
|
|
|
|
|
|
|
|
|
|
| |
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.
Requested and reviewed by: bde
MFC after: 2 weeks
|