summaryrefslogtreecommitdiffstats
path: root/sys/ufs/ffs/ffs_vfsops.c
Commit message (Collapse)AuthorAgeFilesLines
* MFC of 291244, 291380, 291459, 291460, 291671, and 291743:mckusick2015-12-301-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This MFC includes changes to better manage the vnode freelist and to streamline the allocation and freeing of vnodes. Note that to maintain the KPI the VI_AGE flag is left defined in sys/vnode.h though its use is dropped as described in 291380. To maintain KBI the vfs.vlru_alloc_cache_src sysctl variable remains though it no longer has any effect as described in 291244. MFC of 291244: Move the comment about resident pages preventing vnode from leaving active list, into the header comment for vdrop(), which is the function that decides whether to leave the vnode on the list. Note that dirty page write-out in vinactive() is asynchronous. Discussed with: alc Sponsored by: The FreeBSD Foundation MFC of 291380: Remove VI_AGE vnode iflag, it is unused. Noted by: bde Sponsored by: The FreeBSD Foundation MFC of 291459: For performance reasons, it is useful to have a single string used as the name of a filesystem when setting it as the first parameter to the getnewvnode() function. Most filesystems call getnewvnode from just one place so can use a literal string as the first parameter. However, NFS calls getnewvnode from two places, so we create a global constant string that can be used by the two instances. This change also collapses two instances of getnewvnode() in the UFS filesystem to a single call. Reviewed by: kib Tested by: Peter Holm MFC of 291460: As the kernel allocates and frees vnodes, it fully initializes them on every allocation and fully releases them on every free. These are not trivial costs: it starts by zeroing a large structure then initializes a mutex, a lock manager lock, an rw lock, four lists, and six pointers. And looking at vfs.vnodes_created, these operations are being done millions of times an hour on a busy machine. As a performance optimization, this code update uses the uma_init and uma_fini routines to do these initializations and cleanups only as the vnodes enter and leave the vnode_zone. With this change the initializations are only done kern.maxvnodes times at system startup and then only rarely again. The frees are done only if the vnode_zone shrinks which never happens in practice. For those curious about the avoided work, look at the vnode_init() and vnode_fini() functions in kern/vfs_subr.c to see the code that has been removed from the main vnode allocation/free path. Reviewed by: kib Tested by: Peter Holm MFC of 291671: We need to zero out the union of pointers in a freed vnode structure. Fix from: Mateusz Guzik Tested by: Jason Unovitch MFC of 291743: We need to zero out the clustering variables in a freed vnode structure. For completeness add a VNASSERT that there are no threads waiting on a range lock (this was previously checked on every vnode free). Reported by; Rick Macklem Fix from: Mateusz Guzik
* MFC r284887:kib2015-07-111-1/+18
| | | | | | | | | Handle errors from background write of the cylinder group blocks. MFC r284927: Simplify code. Approved by: re (gjb)
* MFC r284495:kib2015-07-011-3/+11
| | | | | Keep a vnode which is freed but still owing inactivation, on the active list. This closes a race where such vnode is not msync-ed until reboot.
* MFC r283735:kib2015-06-051-8/+2
| | | | Remove several write-only variables.
* MFC of 269533:mckusick2015-05-281-0/+1
| | | | | | | | | | | | | | | | | | | Limit the number of cylinder groups that will be searched when trying to build a cluster. The limit is tunable using the sysctl vfs.ffs.maxclustersearch. The current limit is 10 cylinder groups per block allocation. It was previously limited to the number of cylinder groups in the filesystem per block allocation. When there were no clusters of the needed size left, it repeatedly searched the whole filesystem for a non-existent cluster on every block allocation. The result was very slow filesystem allocation with 100% CPU utilization. The old behavior can be had by setting vfs.ffs.maxclustersearch to a huge number (1,000,000). This change affects only the layout policy routines so is not able to interfere with the integrity of the filesystem. Reported by: Dmitry Sivachenko (demon@) Tested by: Dmitry Sivachenko (demon@)
* MFC: r281562rmacklem2015-04-301-1/+1
| | | | | | | | | | | File systems that do not use the buffer cache (such as ZFS) must use VOP_FSYNC() to perform the NFS server's Commit operation. This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which is set by file systems that use the buffer cache. If this flag is not set, the NFS server always does a VOP_FSYNC(). This should be ok for old file system modules that do not set MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although it might not be optimal for file systems that use the buffer cache.
* MFC r280760:kib2015-04-101-5/+11
| | | | | | | Fix the hand after the immediate reboot after the init binary is unlinked. MFC r280763: Fix build (with gcc).
* MFC r270203:kib2014-08-271-1/+1
| | | | Correct the test for condition to suspend UFS filesystem during unmount.
* MFC r268612:kib2014-07-281-44/+7
| | | | | | | Add helper helper vfs_write_suspend_umnt(). Fix the bug in the FFS unmount, when suspension failed, the ufs extattrs were not reinitialized.
* MFC r262678;pfg2014-03-051-1/+1
| | | | | | | | | | ufs: small formatting fixes. Cleanup some extra space. Use of tabs vs. spaces. No functional change. Reviewed by: mckusick
* MFC of 256801, 256803, 256808, 256812, 256817, 256845, and 256860.mckusick2013-12-301-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This set of changes puts in place the infrastructure to allow soft updates to be multi-threaded. It introduces no functional changes from its current operation. MFC of 256860: Allow kernels without options SOFTUPDATES to build. This should fix the embedded tinderboxes. Reviewed by: emaste MFC of 256845: Fix build problem on ARM (which defaults to building without soft updates). Reported by: Tinderbox Sponsored by: Netflix MFC of 256817: Restructuring of the soft updates code to set it up so that the single kernel-wide soft update lock can be replaced with a per-filesystem soft-updates lock. This per-filesystem lock will allow each filesystem to have its own soft-updates flushing thread rather than being limited to a single soft-updates flushing thread for the entire kernel. Move soft update variables out of the ufsmount structure and into their own mount_softdeps structure referenced by ufsmount field um_softdep. Eventually the per-filesystem lock will be in this structure. For now there is simply a pointer to the kernel-wide soft updates lock. Change all instances of ACQUIRE_LOCK and FREE_LOCK to pass the lock pointer in the mount_softdeps structure instead of a pointer to the kernel-wide soft-updates lock. Replace the five hash tables used by soft updates with per-filesystem copies of these tables allocated in the mount_softdeps structure. Several functions that flush dependencies when too many are allocated in the kernel used to operate across all filesystems. They are now parameterized to flush dependencies from a specified filesystem. For now, we stick with the round-robin flushing strategy when the kernel as a whole has too many dependencies allocated. While there are many lines of changes, there should be no functional change in the operation of soft updates. Tested by: Peter Holm and Scott Long Sponsored by: Netflix MFC of 256812: Fourth of several cleanups to soft dependency implementation. Add KASSERTS that soft dependency functions only get called for filesystems running with soft dependencies. Calling these functions when soft updates are not compiled into the system become panic's. No functional change. Tested by: Peter Holm and Scott Long Sponsored by: Netflix MFC of 256808: Third of several cleanups to soft dependency implementation. Ensure that softdep_unmount() and softdep_setup_sbupdate() only get called for filesystems running with soft dependencies. No functional change. Tested by: Peter Holm and Scott Long Sponsored by: Netflix MFC of 256803: Second of several cleanups to soft dependency implementation. Delete two unused functions in ffs_sofdep.c. No functional change. Tested by: Peter Holm and Scott Long Sponsored by: Netflix MFC of 256801: First of several cleanups to soft dependency implementation. Convert three functions exported from ffs_softdep.c to static functions as they are not used outside of ffs_softdep.c. No functional change. Tested by: Peter Holm and Scott Long Sponsored by: Netflix
* There are several code sequences likekib2013-07-091-2/+2
| | | | | | | | | | | | | | | | | | | vfs_busy(mp); vfs_write_suspend(mp); which are problematic if other thread starts unmount between two calls. The unmount starts a write, while vfs_write_suspend() drain writers. On the other hand, unmount drains busy references, causing the deadlock. Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
* Style fix: spaces.pfg2013-07-021-1/+1
| | | | | | | Cleanup the incomplete revert. Reported by: bde MFC after: 4 weeks
* Change i_gen in UFS to an unsigned type.pfg2013-07-011-1/+1
| | | | | | | | | | Revert the simplification of the i_gen calculation. It is still a good idea to avoid zero values and for the case of old filesystems there is probably no advantage in using the complete 32 bits anyways. Discussed with: bde MFC after: 4 weeks
* Change i_gen in UFS to an unsigned type.pfg2013-07-011-1/+1
| | | | | | | | | Further simplify the i_gen calculation for older disks. Having a zero here is not really a problem and this is more similar to what is done in newfs_random(). Reported by: Xin Li MFC after: 4 weeks
* Change i_gen in UFS to an unsigned type.pfg2013-07-011-1/+1
| | | | | | | | | | | | | In UFS, i_gen is a random generated value and there is not way for it to be negative. Actually, the value of i_gen is just used to match bit patterns and it is of not consequence if the values are signed or not. Following other filesystems, set it to unsigned and use it as such, Discussed by: mckusick Reviewed by: mckusick (previous version) MFC after: 4 weeks
* - Convert the bufobj lock to rwlock.jeff2013-05-311-1/+3
| | | | | | | | | | - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
* Prepare to replace the buf splay with a trie:jeff2013-04-061-20/+8
| | | | | | | | | | | | | | | | - Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists. No consumers need to find them there and it complicates the tree. These flags are all FFS specific and could be moved out of the buf cache. - Use pbgetvp() and pbrelvp() to associate the background and journal bufs with the vp. Not only is this much cheaper it makes more sense for these transient bufs. - Fix the assertions in pbget* and pbrel*. It's not safe to check list pointers which were never initialized. Use the BX flags instead. We also check B_PAGING in reassignbuf() so this should cover all cases. Discussed with: kib, mckusick, attilio Sponsored by: EMC / Isilon Storage Division
* UFS support of the unmapped i/o for the user data buffers.kib2013-03-191-1/+2
| | | | | Sponsored by: The FreeBSD Foundation Tested by: pho, scottl, jhb, bf
* The softdep freeblks workitem might hold a reference on the dquot.kib2013-02-271-7/+26
| | | | | | | | | | | | | | | | | | | Current dqflush() panics when a dquot with with non-zero refcount is encountered. The situation is possible, because quotas are turned off before softdep workitem queue if flushed, due to the quota file writes might create softdep workitems. Make the encountering an active dquot in dqflush() not fatal, return the error from quotaoff() instead. Ignore the quotaoff() failures when ffs_flushfiles() is called in the course of softdep_flushfiles() loop, until the last iteration. At the last loop, the quotas must be closed, and because SU workitems should be already flushed, the references to dquot are gone. Sponsored by: The FreeBSD Foundation Reported and tested by: pho Reviewed by: mckusick MFC after: 2 weeks
* Add flags argument to vfs_write_resume() and removekib2013-01-111-11/+7
| | | | | | vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation
* r16312 is not any longer real since many years (likely since when VFSattilio2012-11-191-8/+0
| | | | | | | | | | received granular locking) but the comment present in UFS has been copied all over other filesystems code incorrectly for several times. Removes comments that makes no sense now. Reviewed by: kib MFC after: 3 days
* Add UFS writesuspension mechanism, designed to allow userland processestrasz2012-11-181-10/+33
| | | | | | | to modify on-disk metadata for filesystems mounted for write. Reviewed by: kib, mckusick Sponsored by: FreeBSD Foundation
* Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.attilio2012-11-091-2/+2
| | | | | Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.
* Use NULL instead of 0 for pointerskevlo2012-07-221-1/+1
|
* Enable vn_io_fault() lock avoidance for UFS.kib2012-05-301-1/+1
| | | | | Tested by: pho MFC after: 2 months
* Fix use-after-free introduced in r234036.trasz2012-04-211-1/+5
| | | | | Reviewed by: mckusick Tested by: pho
* This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loopsmckusick2012-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | over just the active vnodes associated with a mount point to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines. The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped. The ffs_lazy_sync and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file. In a system configured with 250,000 vnodes, less than 1000 are typically active at any point in time. Prior to this change all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
* Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.mckusick2012-04-171-42/+18
| | | | | | | | | | | | | | | | | | | | | The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
* Fix panic in ffs_reload(), which may happen when read-only filesystemtrasz2012-04-081-2/+8
| | | | | | | gets resized and then reloaded. Reviewed by: kib, mckusick (earlier version) Sponsored by: The FreeBSD Foundation
* Drop an unnecessary setting of si_mountpt when updating a UFS mount point.mckusick2012-04-081-2/+0
| | | | | | Clearly it must have been set when the mount was done. Reviewed by: kib
* Keep track of the mount point associated with a special devicemckusick2012-03-281-0/+6
| | | | | | | | | | | | | | | | | to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'. Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected. This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too. Reviewed by: kib
* Do trivial reformatting of the comment to record the missed commitkib2012-03-281-4/+3
| | | | | | | message for r233609: Restore the writes of atimes, quotas and superblock from syncer vnode. Noted by: rdivacky
* Reviewed by: bde, mckusickkib2012-03-281-11/+73
| | | | | Tested by: pho MFC after: 2 weeks
* Update comment.kib2012-03-281-1/+1
| | | | MFC after: 3 days
* Add a third flags argument to ffs_syncvnode to avoid a possible conflictmckusick2012-03-251-1/+1
| | | | | | | with MNT_WAIT flags that passed in its second argument. This will be MFC'ed together with r232351. Discussed with: kib
* In the original days of BSD, a sync was issued on every filesystemmckusick2012-02-071-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | every 30 seconds. This spike in I/O caused the system to pause every 30 seconds which was quite annoying. So, the way that sync worked was changed so that when a vnode was first dirtied, it was put on a 30-second cleaning queue (see the syncer_workitem_pending queues in kern/vfs_subr.c). If the file has not been written or deleted after 30 seconds, the syncer pushes it out. As the syncer runs once per second, dirty files are trickled out slowly over the 30-second period instead of all at once by a call to sync(2). The one drawback to this is that it does not cover the filesystem metadata. To handle the metadata, vfs_allocate_syncvnode() is called to create a "filesystem syncer vnode" at mount time which cycles around the cleaning queue being sync'ed every 30 seconds. In the original design, the only things it would sync for UFS were the filesystem metadata: inode blocks, cylinder group bitmaps, and the superblock (e.g., by VOP_FSYNC'ing devvp, the device vnode from which the filesystem is mounted). Somewhere in its path to integration with FreeBSD the flushing of the filesystem syncer vnode got changed to sync every vnode associated with the filesystem. The result of this change is to return to the old filesystem-wide flush every 30-seconds behavior and makes the whole 30-second delay per vnode useless. This change goes back to the originally intended trickle out sync behavior. Key to ensuring that all the intended semantics are preserved (e.g., that all inode updates get flushed within a bounded period of time) is that all inode modifications get pushed to their corresponding inode blocks so that the metadata flush by the filesystem syncer vnode gets them to the disk in a timely way. Thanks to Konstantin Belousov (kib@) for doing the audit and commit -r231122 which ensures that all of these updates are being made. Reviewed by: kib Tested by: scottl MFC after: 2 weeks
* Make sure all intermediate variables holding mount flags (mnt_flag)mckusick2012-01-171-2/+2
| | | | | | | and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC after: 2 weeks
* Convert FFS mount error messages from kernel printf's to using themckusick2012-01-141-61/+65
| | | | | | | | | | | vfs_mount_error error message facility provided by the nmount interface. Clean up formatting of mount warnings which still need to use kernel printf's since they do not return errors. Requested by: Craig Rodrigues <rodrigc@crodrigues.org> MFC after: 2 weeks
* Update to -r224294 to ensure that only one of MNT_SUJ or MNT_SOFTDEPmckusick2011-07-301-9/+6
| | | | | | | is set so that mount can revert back to using MNT_NOWAIT when doing getmntinfo. Approved by: re (kib)
* Move the MNTK_SUJ flag in mnt_kern_flag to MNT_SUJ in mnt_flagmckusick2011-07-241-0/+3
| | | | | | | | | so that it is visible to userland programs. This change enables the `mount' command with no arguments to be able to show if a filesystem is mounted using journaled soft updates as opposed to just normal soft updates. Approved by: re (bz)
* Add an FFS specific mount option to allow a filesystem checkermckusick2011-07-151-14/+124
| | | | | | | | | (typically fsck_ffs) to register that it wishes to use FFS specific sysctl's to update the filesystem. This ensures that two checkers cannot run on a given filesystem at the same time and that no other process accidentally or maliciously uses the filesystem updating sysctls inappropriately. This functionality is needed by the journaling soft-updates recovery code.
* Allow disk partitions associated with UFS read-only mountedmckusick2011-07-101-15/+7
| | | | | | | | | filesystems to be opened for writing. This functionality used to be special-cased for just the root filesystem, but with this change is now available for all UFS filesystems. This change is needed for journaled soft updates recovery. Discussed with: Jeff Roberson
* Disable the soft updates journaling after a filesystem is successfullymckusick2011-06-121-0/+2
| | | | | downgraded to read-only. It will be restarted if the filesystem is upgraded back to read-write.
* Implement fully asynchronous partial truncation with softupdates journalingjeff2011-06-101-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to resolve errors which can cause corruption on recovery with the old synchronous mechanism. - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid. Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
* Add a lock flags argument to the VFS_FHTOVP() file systemrmacklem2011-05-221-2/+3
| | | | | | | | | | | method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib
* Retire opt_ffs_broken_fixme.h.kib2011-03-201-0/+1
| | | | | | | | Instead of directly calling ffs_snapgone(), use UFS_SNAPGONE() with usual layering. Requested by: bde MFC after: 1 week
* Add kernel side support for BIO_DELETE/TRIM on UFS.kib2010-12-291-0/+15
| | | | | | | | | | | | | | | | The FS_TRIM fs flag indicates that administrator requested issuing of TRIM commands for the volume. UFS will only send the command to disk if the disk reports GEOM::candelete attribute. Since disk queue is reordered, data block is marked as free in the bitmap only after TRIM command completed. Due to need to sleep waiting for i/o to finish, TRIM bio_done routine schedules taskqueue to set the bitmap bit. Based on the patch by: mckusick Reviewed by: mckusick, pjd Tested by: pho MFC after: 1 month
* Journal start looks up .sujournal file by doing lookup on the root dvp.kib2010-12-011-0/+1
| | | | | | | | | | As result, failed softdep_mount() might leave up to two vnodes on the mp mountlist, preventing mnt_ref from going to zero. Call ffs_flushfiles() after failed softdep_mount() to clean mountlist. Initial report by: Garrett Cooper Reproduced and tested by: pho
* The r184588 changed the layout of struct export_args, causing an ABIkib2010-10-101-1/+3
| | | | | | | | | | breakage for old mount(2) syscall, since most struct <filesystem>_args embed export_args. The mount(2) is supposed to provide ABI compatibility for pre-nmount mount(8) binaries, so restore ABI to pre-r184588. Requested and reviewed by: bde MFC after: 2 weeks
OpenPOWER on IntegriCloud