path: root/sys/ufs/ffs/ffs_softdep.c
Commit message (author, date, files changed, lines -/+)
* MFC r292541: (kib, 2015-12-28, 1 file, -32/+32)
| Recheck curthread->td_su after the VFS_SYNC() call.
* MFC r287361: (kib, 2015-09-16, 1 file, -19/+29)
| Handle excess of D_NEWBLK in the same way as excess of D_INODEDEP and D_DIRREM, by scheduling ast to flush dependencies. For 32bit arches, reduce the total amount of allowed dependencies by two.
| MFC r287479: Declare the writes around the call to VFS_SYNC() in softdep_ast_cleanup_proc().
| MFC r287483: Do not consume extra reference.
* MFC r283832: (kib, 2015-06-14, 1 file, -2/+0)
| Remove unused variable.
* MFC r283604: (kib, 2015-06-10, 1 file, -38/+17)
| Remove NODELAY flag.
* MFC r283600: (kib, 2015-06-10, 1 file, -14/+104)
| Perform SU cleanup in the AST handler. Do not sleep waiting for SU cleanup while owning vnode lock.
| On MFC, for KBI stability, td_su member was moved to the end of the struct thread.
* MFC r283735: (kib, 2015-06-05, 1 file, -6/+0)
| Remove several write-only variables.
* MFC r280760: (kib, 2015-04-10, 1 file, -19/+57)
| Fix the hang after the immediate reboot after the init binary is unlinked.
| MFC r280763: Fix build (with gcc).
* MFC r277922: (kib, 2015-02-13, 1 file, -12/+19)
| When mounting an SU-enabled mount point, wait until the softdep_flush() thread has started and incremented stat_flush_threads.
| MFC r278257: Partially revert r277922.
* MFC r273967: (kib, 2014-11-09, 1 file, -5/+6)
| Only trigger a panic when forced operation is done. Convert direct panic() call into KASSERT().
* MFC of 269533 (by mckusick): (mckusick, 2014-08-18, 1 file, -106/+241)
| Add support for multi-threading of soft updates. Replace a single soft updates thread with a thread per FFS-filesystem mount point. The threads are associated with the bufdaemon process.
| Reviewed by: kib
| Tested by: Peter Holm and Scott Long
| MFC after: 2 weeks
| Sponsored by: Netflix
| MFC of 269853 (by kib): Revision r269457 removed the Giant around mount and unmount code, but r269533, which was tested before r269457 was committed, implicitly relied on the Giant to protect the manipulations of the softdepmounts list. Use softdep global lock consistently to guarantee the list structure now.
| Insert the new struct mount_softdeps into the softdepmounts only after it is sufficiently initialized, to prevent softdep_speedup() from accessing bare memory. Similarly, remove struct mount_softdeps for the unmounted filesystem from the tailq before destroying structure rwlock.
| Reported and tested by: pho
| Reviewed by: mckusick
| Sponsored by: The FreeBSD Foundation
* MFC of 269674: (mckusick, 2014-08-14, 1 file, -0/+31)
| The journal is only prepared to handle full-size block numbers, so we have to adjust freeblk records to reflect the change to a full-size block. For example, suppose we have a block made up of fragments 8-15 and want to free its last two fragments. We are given a request that says:
|     FREEBLK ino=5, blkno=14, lbn=0, frags=2, oldfrags=0
| where frags are the number of frags to free and oldfrags are the number of fragments to keep. To block align it, we have to change it to have a valid full-size blkno, so it becomes:
|     FREEBLK ino=5, blkno=8, lbn=0, frags=2, oldfrags=6
| Submitted by: Mikihito Takehara
| Tested by: Mikihito Takehara
| Reviewed by: Jeff Roberson
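The block-alignment arithmetic in the FREEBLK example above can be sketched in a few lines of C. This is only an illustration of the calculation, not the code in ffs_softdep.c: the struct freeblk_rec, the function align_freeblk(), and the fixed value of 8 fragments per block are assumptions chosen to match the numbers in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record mirroring the fields named in the commit message. */
struct freeblk_rec {
	uint32_t ino;
	int64_t  blkno;    /* first fragment being freed */
	int64_t  lbn;
	int      frags;    /* number of fragments to free */
	int      oldfrags; /* number of fragments to keep */
};

/*
 * Round blkno down to a full-block boundary and count the leading
 * fragments that were skipped over as fragments to keep.
 * frags_per_block is assumed to be the filesystem's fragments-per-block
 * value (8 in the example above).
 */
static void
align_freeblk(struct freeblk_rec *r, int frags_per_block)
{
	int off;

	assert(frags_per_block > 0 && r->blkno >= 0);
	off = (int)(r->blkno % frags_per_block);
	r->blkno -= off;
	r->oldfrags += off;
}
```

Applied to the record ino=5, blkno=14, frags=2, oldfrags=0 with 8 fragments per block, the sketch yields blkno=8 and oldfrags=6 while leaving frags at 2.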
* Merge r265463: (scottl, 2014-07-01, 1 file, -0/+26)
| Due to reasons unknown at this time, the system can be forced to write a journal block even when there are no journal entries to be written. Until the root cause is found, handle this case by ensuring that a valid journal segment is always written.
| Second, the data buffer used for writing journal entries was never being scrubbed of old data. Fix this.
| Submitted by: Takehara Mikihito
| Obtained from: Netflix, Inc.
* MFC r262678; (pfg, 2014-03-05, 1 file, -13/+13)
| ufs: small formatting fixes.
| Cleanup some extra space. Use of tabs vs. spaces. No functional change.
| Reviewed by: mckusick
* MFC of 256801, 256803, 256808, 256812, 256817, 256845, and 256860. (mckusick, 2013-12-30, 1 file, -674/+877)
| This set of changes puts in place the infrastructure to allow soft updates to be multi-threaded. It introduces no functional changes from its current operation.
| MFC of 256860: Allow kernels without options SOFTUPDATES to build. This should fix the embedded tinderboxes.
| Reviewed by: emaste
| MFC of 256845: Fix build problem on ARM (which defaults to building without soft updates).
| Reported by: Tinderbox
| Sponsored by: Netflix
| MFC of 256817: Restructuring of the soft updates code to set it up so that the single kernel-wide soft update lock can be replaced with a per-filesystem soft-updates lock. This per-filesystem lock will allow each filesystem to have its own soft-updates flushing thread rather than being limited to a single soft-updates flushing thread for the entire kernel.
| Move soft update variables out of the ufsmount structure and into their own mount_softdeps structure referenced by ufsmount field um_softdep. Eventually the per-filesystem lock will be in this structure. For now there is simply a pointer to the kernel-wide soft updates lock.
| Change all instances of ACQUIRE_LOCK and FREE_LOCK to pass the lock pointer in the mount_softdeps structure instead of a pointer to the kernel-wide soft-updates lock.
| Replace the five hash tables used by soft updates with per-filesystem copies of these tables allocated in the mount_softdeps structure.
| Several functions that flush dependencies when too many are allocated in the kernel used to operate across all filesystems. They are now parameterized to flush dependencies from a specified filesystem. For now, we stick with the round-robin flushing strategy when the kernel as a whole has too many dependencies allocated.
| While there are many lines of changes, there should be no functional change in the operation of soft updates.
| Tested by: Peter Holm and Scott Long
| Sponsored by: Netflix
| MFC of 256812: Fourth of several cleanups to soft dependency implementation. Add KASSERTs that soft dependency functions only get called for filesystems running with soft dependencies. Calling these functions when soft updates are not compiled into the system now panics. No functional change.
| Tested by: Peter Holm and Scott Long
| Sponsored by: Netflix
| MFC of 256808: Third of several cleanups to soft dependency implementation. Ensure that softdep_unmount() and softdep_setup_sbupdate() only get called for filesystems running with soft dependencies. No functional change.
| Tested by: Peter Holm and Scott Long
| Sponsored by: Netflix
| MFC of 256803: Second of several cleanups to soft dependency implementation. Delete two unused functions in ffs_softdep.c. No functional change.
| Tested by: Peter Holm and Scott Long
| Sponsored by: Netflix
| MFC of 256801: First of several cleanups to soft dependency implementation. Convert three functions exported from ffs_softdep.c to static functions as they are not used outside of ffs_softdep.c. No functional change.
| Tested by: Peter Holm and Scott Long
| Sponsored by: Netflix
* MFC of 258789: (mckusick, 2013-12-29, 1 file, -2/+20)
| We needlessly panic when trying to flush MKDIR_PARENT dependencies. We had previously tried to flush all MKDIR_PARENT dependencies (and all the NEWBLOCK pagedeps) by calling ffs_update(). However this will only resolve these dependencies in direct blocks. So very large directories with MKDIR_PARENT dependencies in indirect blocks had not yet gotten flushed. As the directory is in the midst of doing a complete sync, we simply defer the checking of the MKDIR_PARENT dependencies until the indirect blocks have been sync'ed.
| Reported by: Shawn Wallbridge of imaginaryforces.com
| Tested by: John-Mark Gurney <jmg@funkthat.com>
| PR: 183424
* With the addition of journalled soft updates, the "newblk" structures (mckusick, 2013-08-05, 1 file, -1/+1)
| persist much longer than previously. Historically we had at most 100 entries; now the count may reach a million. With the increased count we spent far too much time looking them up in the grossly undersized newblk hash table. Configure the newblk hash table to accurately reflect the number of entries that it must index.
| Reviewed by: kib
| Tested by: Peter Holm
| MFC after: 2 weeks
* To better understand performance problems with journalled soft updates, (mckusick, 2013-08-05, 1 file, -9/+43)
| we need to collect the highest level of allocation for each of the different soft update dependency structures. This change collects these statistics and makes them available using `sysctl debug.softdep.highuse'.
| Reviewed by: kib
| Tested by: Peter Holm
| MFC after: 2 weeks
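Tracking the "highest level of allocation" per dependency type amounts to keeping a high-water mark next to each current counter. The sketch below shows that idea only; the array names, DEP_TYPES, and the helper functions are hypothetical and are not the kernel's identifiers, and the real code exports its counters through sysctl rather than plain arrays.

```c
#include <assert.h>

#define DEP_TYPES 4 /* hypothetical; the real code tracks many more types */

static long dep_current[DEP_TYPES]; /* structures currently allocated */
static long dep_highuse[DEP_TYPES]; /* highest value dep_current ever reached */

/* Record one allocation of the given type and update its high-water mark. */
static void
dep_alloc(int type)
{
	assert(type >= 0 && type < DEP_TYPES);
	if (++dep_current[type] > dep_highuse[type])
		dep_highuse[type] = dep_current[type];
}

/* Record one free of the given type; the high-water mark is left alone. */
static void
dep_free(int type)
{
	assert(type >= 0 && type < DEP_TYPES);
	assert(dep_current[type] > 0);
	dep_current[type]--;
}
```

The high-water mark only ever grows, so it stays meaningful after the load that caused the peak has gone away.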
* - Convert the bufobj lock to rwlock. (jeff, 2013-05-31, 1 file, -61/+60)
| - Use a shared bufobj lock in getblk() and inmem().
| - Convert softdep's lk to rwlock to match the bufobj lock.
| - Move INFREECNT to b_flags and protect it with the buf lock.
| - Remove unnecessary locking around bremfree() and BKGRDINPROG.
| Sponsored by: EMC / Isilon Storage Division
| Discussed with: mckusick, kib, mdf
* Properly spell sentinel (missed in 250891) (mckusick, 2013-05-22, 1 file, -1/+1)
| No functional changes.
| Spotted by: Navdeep Parhar and Alexey Dokuchaev
| MFC after: 2 weeks
* Add missing buffer releases (brelse) after bread calls that return (mckusick, 2013-05-22, 1 file, -2/+6)
| an error. One could argue that returning a buffer even when it is not valid is incorrect, but bread has always returned a buffer valid or not.
| Reviewed by: kib
| MFC after: 2 weeks
* Add missing 28th element to softdep types name array. (mckusick, 2013-05-22, 1 file, -1/+4)
| Found by: Coverity Scan, CID 1007621
| Reviewed by: kib
| MFC after: 2 weeks
* Null a pointer after it is freed so that when it is returned (mckusick, 2013-05-22, 1 file, -0/+1)
| the return value is NULL. Based on the returned flags, the return value should never be inspected in the case where NULL is returned, but it is good coding practice not to return a pointer to freed memory.
| Found by: Coverity Scan, CID 1006096
| Reviewed by: kib
| MFC after: 2 weeks
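The practice described in this commit message, clearing a pointer once the memory behind it is freed so that a later return hands back NULL rather than a dangling address, can be sketched as follows. The struct item, the ITEM_RELEASED flag, and release_item() are hypothetical names used for illustration; userland free() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

struct item {
	int value;
};

#define ITEM_RELEASED 0x1 /* hypothetical flag telling the caller what happened */

/*
 * Free the item and report that through *flagsp. The pointer is cleared
 * before it is returned, so even a caller that ignores the flag cannot
 * pick up a pointer to freed memory.
 */
static struct item *
release_item(struct item *it, int *flagsp)
{
	assert(flagsp != NULL);
	free(it);
	it = NULL;
	*flagsp |= ITEM_RELEASED;
	return (it);
}
```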
* Remove a bogus check for a NULL buffer pointer. (mckusick, 2013-05-22, 1 file, -7/+8)
| Add a KASSERT that it is not NULL.
| Found by: Coverity Scan, CID 1009114
| Reviewed by: kib
| MFC after: 2 weeks
* Properly spell sentinel (not sintenel or sentinal). (mckusick, 2013-05-22, 1 file, -28/+28)
| No functional changes.
| Spotted by: kib
| MFC after: 2 weeks
* Prepare to replace the buf splay with a trie: (jeff, 2013-04-06, 1 file, -7/+10)
| - Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists. No consumers need to find them there and it complicates the tree. These flags are all FFS specific and could be moved out of the buf cache.
| - Use pbgetvp() and pbrelvp() to associate the background and journal bufs with the vp. Not only is this much cheaper it makes more sense for these transient bufs.
| - Fix the assertions in pbget* and pbrel*. It's not safe to check list pointers which were never initialized. Use the BX flags instead. We also check B_PAGING in reassignbuf() so this should cover all cases.
| Discussed with: kib, mckusick, attilio
| Sponsored by: EMC / Isilon Storage Division
* The code in clear_remove() and clear_inodedeps() skips one entry (mckusick, 2013-04-03, 1 file, -4/+4)
| in the pagedep and inodedep hash tables. An entry in the table is skipped because 'pagedep_hash' and 'inodedep_hash' hold the size of the hash tables - 1.
| An operational failure from this is extremely unlikely. These functions only need to find a single entry and are only called when there are too many entries. The chance that they would fail because all the entries are on the single skipped hash chain is remote.
| Submitted by: Pedro Martelletto
| Reviewed by: kib
| MFC after: 2 weeks
* The softdep freeblks workitem might hold a reference on the dquot. (kib, 2013-02-27, 1 file, -3/+20)
| Current dqflush() panics when a dquot with a non-zero refcount is encountered. The situation is possible because quotas are turned off before the softdep workitem queue is flushed, since quota file writes might create softdep workitems.
| Make encountering an active dquot in dqflush() non-fatal, and return the error from quotaoff() instead. Ignore the quotaoff() failures when ffs_flushfiles() is called in the course of the softdep_flushfiles() loop, until the last iteration. At the last iteration, the quotas must be closed, and because SU workitems should already be flushed, the references to dquot are gone.
| Sponsored by: The FreeBSD Foundation
| Reported and tested by: pho
| Reviewed by: mckusick
| MFC after: 2 weeks
* Add flags argument to vfs_write_resume() and remove (kib, 2013-01-11, 1 file, -1/+1)
| vfs_write_resume_flags().
| Sponsored by: The FreeBSD Foundation
* Fixup r218424: uio_yield() was scaling directly to userland priority. (attilio, 2012-12-21, 1 file, -1/+1)
| When kern_yield() was introduced with the possibility to specify a new priority, the behaviour changed by not lowering priority at all in the consumers, making the yielding mechanism highly ineffective for high priority kthreads like bufdaemon, syncer, vlrudaemon, etc. There is no evidence that consumers could bear such a change in semantics, and this situation could finally lead to bugs similar to the ones fixed in r244240. Re-specify userland pri for kthreads involved.
| Tested by: pho
| Reviewed by: kib, mdf
| MFC after: 1 week
* - Fix a truncation bug with softdep journaling that could leak blocks on (jeff, 2012-11-14, 1 file, -39/+100)
| crash. When truncating a file that never made it to disk we use the canceled allocation dependencies to hold the journal records until the truncation completes. Previously allocdirect dependencies on the id_bufwait list were not considered and their journal space could expire before the bitmaps were written. Cancel them and attach them to the freeblks as we do for other allocdirects.
| - Add KTR traces that were used to debug this problem.
| - When adding jsegdeps, always use jwork_insert() so we don't have more than one segdep on a given jwork list.
| Sponsored by: EMC / Isilon Storage Division
* - Fix a bug that has existed since the original softdep implementation. (jeff, 2012-11-12, 1 file, -14/+27)
| When a background copy of a cg is written we complete any work associated with that bmsafemap. If new work has been added to the non-background copy of the buffer it will be completed before the next write happens. The solution is to do the rollbacks when we make the copy so only those dependencies that were present at the time of writing will be completed when the background write completes. This would've resulted in various bitmap related corruptions and panics. It also would've expired journal entries early causing journal replay to miss some records.
| MFC after: 2 weeks
* - Correct rev 242734, segments can sometimes get stuck. Be a bit more (jeff, 2012-11-09, 1 file, -1/+4)
| defensive with segment state.
| Reported by: b. f. <bf1783@googlemail.com>
* - Implement BIO_FLUSH support around journal entries. This will not 100% (jeff, 2012-11-08, 1 file, -16/+121)
| solve power loss problems with dishonest write caches. However, it should improve the situation and force a full fsck when it is unable to resolve with the journal.
| - Resolve a case where the journal could wrap in an unsafe way causing us to prematurely lose journal entries in very specific scenarios.
| Discussed with: mckusick
| MFC after: 1 month
* - In cancel_mkdir_dotdot don't panic if the inodedep is not available. If (jeff, 2012-11-02, 1 file, -1/+1)
| the previous diradd had already finished it could have been reclaimed already. This would only happen under heavy dependency pressure.
| Reported by: Andrey Zonov <zont@FreeBSD.org>
| Discussed with: mckusick
| MFC after: 1 week
* Fix two problems that caused instant panic when the device mounted (trasz, 2012-10-28, 1 file, -2/+7)
| with softupdates went away. Note that this does not fix the problem entirely; I'm committing it now to make it easier for someone to pick up the work.
| Reviewed by: mckusick
* Remove the support for using non-mpsafe filesystem modules. (kib, 2012-10-22, 1 file, -5/+0)
| In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems.
| The VFS_VERSION is bumped to indicate the interface change, which does not change the interface signatures.
| Conducted and reviewed by: attilio
| Tested by: pho
* Fix up kernel sources to be ready for a 64-bit ino_t. (mdf, 2012-09-27, 1 file, -12/+15)
| Original code by: Gleb Kurtsou
* In softdep_setup_inomapdep() we may have to allocate both inodedep (mckusick, 2012-06-11, 1 file, -14/+41)
| and bmsafemap dependency structures in inodedep_lookup() and bmsafemap_lookup() respectively. The setup of these structures must be done while holding the soft-dependency mutex. If the inodedep is allocated first, it may be freed in the I/O completion callback when the mutex is released to allocate the bmsafemap. If the bmsafemap is allocated first, it may be freed in the I/O completion callback when the mutex is released to allocate the inodedep.
| To resolve this problem, bmsafemap_lookup has had a parameter added that allows a pre-malloc'ed bmsafemap to be passed in so that it does not need to release the mutex to create a new bmsafemap. The softdep_setup_inomapdep() routine pre-malloc's a bmsafemap dependency before acquiring the mutex and starting to build the inodedep with a call to inodedep_lookup(). The subsequent call to bmsafemap_lookup() is passed this pre-allocated bmsafemap entry so that it need not release the mutex if it needs to create a new one.
| Reported by: Peter Holm
| Tested by: Peter Holm
| MFC after: 1 week
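The pre-allocation pattern described above (allocate before taking the lock, then let the lookup consume the spare only if it needs a new entry) can be sketched in userland C. Everything here is an assumption made for illustration: struct map_entry, map_lookup(), setup_sketch(), a single linked list standing in for the hash table, and a plain flag with assertions standing in for the soft-dependency mutex. It is not the kernel's bmsafemap_lookup().

```c
#include <assert.h>
#include <stdlib.h>

struct map_entry {
	int cg;
	struct map_entry *next;
};

static struct map_entry *table; /* one list standing in for the hash table */
static int lock_held;            /* flag standing in for the mutex */

static void lock_acquire(void) { assert(!lock_held); lock_held = 1; }
static void lock_release(void) { assert(lock_held); lock_held = 0; }

/*
 * Look up the entry for cg while the lock is held. If none exists,
 * consume the caller's pre-allocated entry (*sparep is set to NULL)
 * instead of dropping the lock to allocate one.
 */
static struct map_entry *
map_lookup(int cg, struct map_entry **sparep)
{
	struct map_entry *m;

	assert(lock_held && *sparep != NULL);
	for (m = table; m != NULL; m = m->next)
		if (m->cg == cg)
			return (m);
	m = *sparep;
	*sparep = NULL;
	m->cg = cg;
	m->next = table;
	table = m;
	return (m);
}

/* Allocate the spare first, then do all list work without releasing the lock. */
static struct map_entry *
setup_sketch(int cg)
{
	struct map_entry *spare, *m;

	spare = malloc(sizeof(*spare));
	if (spare == NULL)
		return (NULL);
	lock_acquire();
	m = map_lookup(cg, &spare);
	lock_release();
	free(spare); /* NULL if the lookup consumed it, so this is a no-op then */
	return (m);
}
```

The cost of the pattern is an occasional wasted allocation when the entry already exists; the benefit is that no structure built earlier under the lock can be freed behind the caller's back during a lock drop.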
* Add missing `continue' statement at end of case. (mckusick, 2012-05-18, 1 file, -0/+1)
| Found by: Kevin Lo (kevlo@)
| MFC after: 1 week
* Remove unused thread argument from clear_inodedeps() and clear_remove(). (trasz, 2012-04-23, 1 file, -11/+8)
* Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. (mckusick, 2012-04-17, 1 file, -10/+2)
| The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if it is DOOMED or other possible end of life scenarios).
| To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head.
| The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point).
| Reviewed by: kib
| Tested by: Peter Holm
| MFC after: 2 weeks
* A file cannot be deallocated until its last name has been removed (mckusick, 2012-04-02, 1 file, -52/+29)
| and it is no longer referenced by a user process. The inode for a file whose name has been removed, but is still referenced at the time of a crash will still be allocated in the filesystem, but will have no references (e.g., they will have no names referencing them from any directory).
| With traditional soft updates these unreferenced inodes will be found and reclaimed when the background fsck is run. When using journaled soft updates, the kernel must keep track of these inodes so that it can find and reclaim them during the cleanup process. Their existence cannot be stored in the journal as the journal only handles short-term events, and they may persist for days. So, they are tracked by keeping them in a linked list whose head pointer is stored in the superblock. The journal tracks them only until their linked list pointers have been committed to disk. Part of the cleanup process involves traversing the list of unreferenced inodes and reclaiming them.
| This bug was triggered when confusion arose in the commit steps of keeping the unreferenced-inode linked list coherent on disk. Notably, a race between the link() system call adding a link-count to a file and the unlink() system call removing a link-count to the file. Here if the unlink() ran after link() had looked up the file but before link() had incremented the link-count of the file, the file's link-count would drop to zero before the link() incremented it back up to one. If the file was referenced by a user process, the first transition through zero made it appear that it should be added to the unreferenced-inode list when in fact it should not have been added. If the new name created by link() was deleted within a few seconds (with the file still referenced by a user process) it would legitimately be a candidate for addition to the unreferenced-inode list. The result was that there were two attempts to add the same inode to the unreferenced-inode list which scrambled the unreferenced-inode list's pointers leading to a panic. The fix is to detect and avoid the false attempt at adding it to the unreferenced-inode list by having the link() system call check to see if the link count is zero before it increments it. If it is, the link() fails with ENOENT (showing that it has failed the link()/unlink() race).
| While tracking down this bug, we have added additional assertions to detect the problem sooner and also simplified some of the code.
| Reported by: Kirk Russell
| Fix submitted by: Jeff Roberson
| Tested by: Peter Holm
| PR: kern/159971
| MFC (to 9 only): 2 weeks
* Add a third flags argument to ffs_syncvnode to avoid a possible conflict (mckusick, 2012-03-25, 1 file, -9/+9)
| with MNT_WAIT flags that are passed in its second argument. This will be MFC'ed together with r232351.
| Discussed with: kib
* Supply boolean as the second argument to ffs_update(), and not (kib, 2012-03-13, 1 file, -5/+5)
| MNT_[NO]WAIT constants, which in fact always caused sync operation.
| Based on the submission by: bde
| Reviewed by: mckusick
| MFC after: 2 weeks
* Decommission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which (kib, 2012-03-09, 1 file, -2/+1)
| allows a filesystem to request VFS to not allow MNTK_ASYNC.
| MFC after: 1 week
* This change avoids a kernel deadlock on "snaplk" when using (mckusick, 2012-03-01, 1 file, -34/+64)
| snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates.
| The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot.
| The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot.
| A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot.
| Help and review by: Jeff Roberson
| Tested by: Peter Holm
| MFC (to 9 only) after: 2 weeks
* Missing conditions in checking whether an inode has been written. (mckusick, 2012-02-13, 1 file, -0/+3)
| Found and tested by: Peter Holm
| MFC after: 2 weeks (to 9 only)
* Add missing opt_quota.h include to activate #ifdef QUOTA blocks, (kib, 2012-02-06, 1 file, -1/+2)
| apparently a step in unbreaking QUOTA support.
| Reported and tested by: Adam Strohl <adams-freebsd ateamsystems com>
| MFC after: 1 week
* JNEWBLK dependency may legitimately appear on the buf dependency (kib, 2012-02-06, 1 file, -0/+1)
| list. If softdep_sync_buf() discovers such dependency, it should do nothing, which is safe as it is only waiting on the parent buffer to be written, so it can be removed.
| Committed on behalf of: jeff
| MFC after: 1 week
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. (ed, 2011-11-07, 1 file, -4/+5)
| The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.