path: root/sys/ufs
Commit message | Author | Age | Files | Lines
* Add an FFS specific mount option to allow a filesystem checker | mckusick | 2011-07-15 | 5 | -24/+306
  (typically fsck_ffs) to register that it wishes to use FFS specific sysctls to update the filesystem. This ensures that two checkers cannot run on a given filesystem at the same time and that no other process accidentally or maliciously uses the filesystem updating sysctls inappropriately. This functionality is needed by the journaling soft-updates recovery code.
* Consistently check mount flag (MNTK_SUJ) rather than superblock | mckusick | 2011-07-14 | 1 | -2/+2
  flag (FS_SUJ) when determining whether to do journaling-based operations. The mount flag is set only when journaling is active while the superblock flag is set to indicate that journaling is to be used. For example, when the filesystem is mounted read-only, the journaling may be present (FS_SUJ) but not active (MNTK_SUJ). Inappropriate checking of the FS_SUJ flag was causing some journaling actions to be attempted at inappropriate times.
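The distinction between the two flags can be sketched in C. This is an illustration only: the struct and the DEMO_ constants are hypothetical stand-ins, not the kernel's definitions of FS_SUJ and MNTK_SUJ.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real flag values live in the kernel headers. */
#define DEMO_FS_SUJ   0x1u  /* superblock: journaling is configured */
#define DEMO_MNTK_SUJ 0x2u  /* mount: journaling is currently active */

struct demo_mount {
    unsigned sb_flags;   /* models the superblock flags */
    unsigned kern_flags; /* models the mount's kernel flags */
};

/* Journaling work is gated on the mount flag, never on the superblock flag. */
bool demo_journal_active(const struct demo_mount *mp)
{
    return (mp->kern_flags & DEMO_MNTK_SUJ) != 0;
}
```

A read-only mount of a journaled filesystem would carry the superblock flag but not the mount flag, so the check above reports journaling as inactive.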
* When first creating snapshots, we may free some blocks within it. | mckusick | 2011-07-10 | 1 | -1/+5
  These blocks should not have TRIM applied to them.
  Submitted by: Kostik Belousov
* Allow disk partitions associated with UFS read-only mounted | mckusick | 2011-07-10 | 1 | -15/+7
  filesystems to be opened for writing. This functionality used to be special-cased for just the root filesystem, but with this change is now available for all UFS filesystems. This change is needed for journaled soft updates recovery.
  Discussed with: Jeff Roberson
* Use 'curthread_pflags' instead of 'thread_pflags' to signify that only | kib | 2011-07-09 | 1 | -12/+12
  curthread can be operated upon.
  Requested by: attilio
  MFC after: 1 week
* Use helper functions instead of manually managing TDP_INBDFLUSH. | kib | 2011-07-09 | 1 | -16/+12
  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc (previous version)
  MFC after: 1 week
* - Speed up pendingblock processing again. Having too much delay between | jeff | 2011-07-04 | 2 | -15/+41
    ffs_blkfree() and the pending adjustment causes all kinds of space related problems.
* - Handle D_JSEGDEP in the softdep_sync_buf() switch. These can now | jeff | 2011-07-04 | 1 | -0/+1
    find themselves on snapshot vnodes.
  Reported by: pho
* - It is impossible to run request_cleanup() while doing a copyonwrite. | jeff | 2011-07-04 | 1 | -25/+21
    This will most likely cause new block allocations which can recurse into request cleanup.
  - While here optimize the ufs locking slightly. We need only acquire and drop once.
  - process_removes() and process_truncates() are also only needed once.
  - Attempt to flush each item on the worklist once but do not loop forever if some can not be completed.
  Discussed with: mckusick
* - Fix an inode quota leak. We need to decrement the quota once and only | jeff | 2011-07-04 | 1 | -5/+4
    once.
  Tested by: pho
  Reviewed by: mckusick
* Handle the FREEDEP case in softdep_sync_buf(). | mckusick | 2011-06-29 | 1 | -0/+1
  This fix failed to get added in -r223325.
  Submitted by: Peter Holm
* Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this | alc | 2011-06-29 | 1 | -1/+1
  option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages.
  This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages.
  Update all of the existing assertions on pmap_remove_all() to reflect this change.
  Reviewed by: kib
* - Fix directory count rollbacks by passing the mode to the journal dep | jeff | 2011-06-20 | 3 | -47/+163
    earlier.
  - Add rollback/forward code for frag and cluster accounting.
  - Handle the FREEDEP case in softdep_sync_buf(). (submitted by pho)
* Fixed dereference of a NULL pointer. | mckusick | 2011-06-18 | 1 | -1/+2
  Reported by: Peter Holm
* Drop the include of <ufs/ffs/ffs_extern.h> from usr.sbin/makefs/ffs/ffs_bswap.c | mckusick | 2011-06-16 | 1 | -2/+5
  and usr.sbin/makefs/ffs/ffs_subr.c as they have no need of anything in that file. No other programs or libraries include <ufs/ffs/ffs_extern.h> (nor should they as it is totally in-kernel interfaces). For added protection I enclosed the entire contents of <ufs/ffs/ffs_extern.h> in ifdef _KERNEL.
  Feedback from: Bruce Evans and Tai-hwa Liang
* Fixing compilation bustage by introducing another forward declaration. | avatar | 2011-06-16 | 1 | -0/+1
* Ensure that filesystem metadata contained within persistent snapshots | mckusick | 2011-06-15 | 7 | -42/+74
  is always kept consistent.
  Suggested by: Jeff Roberson
* With the restructuring of the block reclamation code, the notification | mckusick | 2011-06-15 | 3 | -4/+28
  messages for a filesystem being out of space need to be moved so that they do not print out until after a failed cleanup attempt.
  Suggested by: Jeff Roberson
* Missing cleanup case after completion of a snapshot vnode write | mckusick | 2011-06-15 | 1 | -0/+4
  claiming a released block.
  Submitted by: Jeff Roberson
  Tested by: Peter Holm
* Use alternative, less messy solution to avoid breakage after r223020: | dim | 2011-06-13 | 1 | -0/+2
  put the snapdata structure between #ifdef _KERNEL guards.
  Suggested by: kib
* Update to soft updates journaling to properly track freed blocks | mckusick | 2011-06-12 | 5 | -30/+180
  that get claimed by snapshots.
  Submitted by: Jeff Roberson
  Tested by: Peter Holm
* Disable the soft updates journaling after a filesystem is successfully | mckusick | 2011-06-12 | 2 | -2/+11
  downgraded to read-only. It will be restarted if the filesystem is upgraded back to read-write.
* Implement fully asynchronous partial truncation with softupdates journaling | jeff | 2011-06-10 | 11 | -1507/+2627
  to resolve errors which can cause corruption on recovery with the old synchronous mechanism.
  - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations.
  - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects.
  - softdep_journal_freeblocks() handles last frag allocation and last block zeroing.
  - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place.
  - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted.
  - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks.
  - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call.
  - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files.
  - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid.
  Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts)
  Tested by: pho
* - Add support for referencing quota structures without needing the inode | jeff | 2011-06-10 | 2 | -0/+101
    pointer for softupdates.
  Submitted by: mckusick
* - If the fsync in ufs_direnter fails SUJ can later panic because we have | jeff | 2011-06-10 | 1 | -1/+1
    partially added a name. Allow ufs_direnter() to continue in the hopes that it is a transient error. If it is not, the directory is corrupted already from IO errors and writing this new block is not likely to make things worse.
* Grammar fix in comment. | mckusick | 2011-06-05 | 1 | -3/+3
  Eliminate one (of several) possible conflicting buffer locks when trying to reclaim blocks. Rest of fix to be incorporated as part of SUJ update by jeff.
  Pointed out by: Kostik Belousov
* Due to a lag in updating the fs_pendinginodes count, we cannot depend | mckusick | 2011-05-28 | 1 | -1/+1
  on it to decide whether we should try to reclaim inodes when we run short.
  Discovered by: Peter Holm
* The check for whether a block is going to be claimed by a snapshot | mckusick | 2011-05-26 | 1 | -4/+12
  needs to happen before we notify the underlying layer that it is being freed.
* Fix the ufs/ffs file system so that it uses the lock | rmacklem | 2011-05-22 | 1 | -1/+1
  flags argument added to VFS_FHTOVP() by r222167.
  Reviewed by: mckusick
* Add a lock flags argument to the VFS_FHTOVP() file system | rmacklem | 2011-05-22 | 3 | -4/+6
  method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed.
  Reviewed by: kib
* Use a name instead of a magic number for kern_yield(9) when the priority | mdf | 2011-05-13 | 1 | -1/+1
  should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here.
  Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
* Fix typos. | kib | 2011-04-30 | 1 | -2/+2
  Noted by: Fabian Keil <freebsd-listen fabiankeil de>
  Pointy hat to: kib
  MFC after: 1 week
* Clarify the comment. | kib | 2011-04-30 | 1 | -2/+4
  MFC after: 1 week
* VFS sometimes is unable to inactivate a vnode when vnode use count | kib | 2011-04-24 | 3 | -23/+31
  goes to zero. E.g., the vnode might be only shared-locked at the time of vput() call. Such vnodes are kept in the hash, so they can be found later. If ffs_valloc() allocated an inode that has its vnode cached in hash, and still owing the inactivation, then vget() call from ffs_valloc() clears VI_OWEINACT, and then the vnode is reused for the newly allocated inode.
  The problem is, the vnode is not reclaimed before it is put to the new use. ffs_valloc() recycles vnode vm object, but this is not enough. In particular, at least v_vflag should be cleared, and several bits of UFS state need to be removed.
  It is very inconvenient to call vgone() at this point. Instead, move some parts of ufs_reclaim() into helper function ufs_prepare_reclaim(), and call the helper from VOP_RECLAIM and ffs_valloc().
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 3 weeks
* - Refactor softdep_setup_freeblocks() into a set of functions to prepare | jeff | 2011-04-11 | 1 | -151/+221
    for a new journal specific partial truncate routine.
  - Use dep_current[] in place of specific dependency counts. This is automatically maintained when workitems are allocated and has less risk of becoming incorrect.
* Fix a long standing SUJ performance problem: | jeff | 2011-04-10 | 2 | -57/+215
  - Keep a hash of indirect blocks that have recently been freed and are still referenced in the journal.
  - Lookup blocks in this hash before forcing a new block write to wait on the journal entry to hit the disk. This is only necessary to avoid confusion between old identities as indirects and new identities as file blocks.
  - Don't free jseg structures until the journal has written a record that invalidates it. This keeps the indirect block information around for as long as is required to be safe.
  - Force an empty journal block write when required to flush out stale journal data that is simply waiting for the oldest valid sequence number to advance beyond it.
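The first two items describe a lookup table keyed by block number. A minimal user-space sketch of that idea follows, assuming a fixed-size chained hash; every demo_ name is hypothetical and none of this is the softdep code itself.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_NBUCKETS 64

/* One entry per freed indirect block still referenced by the journal. */
struct demo_freed {
    int64_t blkno;
    struct demo_freed *next;
};

static struct demo_freed *demo_table[DEMO_NBUCKETS];

static unsigned demo_hash(int64_t blkno)
{
    return (unsigned)((uint64_t)blkno % DEMO_NBUCKETS);
}

/* Record that an indirect block was freed; false on allocation failure. */
bool demo_note_freed(int64_t blkno)
{
    struct demo_freed *e = malloc(sizeof(*e));
    if (e == NULL)
        return false;
    e->blkno = blkno;
    e->next = demo_table[demo_hash(blkno)];
    demo_table[demo_hash(blkno)] = e;
    return true;
}

/* True if a write reusing blkno must first wait for the journal. */
bool demo_must_wait(int64_t blkno)
{
    struct demo_freed *e;

    for (e = demo_table[demo_hash(blkno)]; e != NULL; e = e->next)
        if (e->blkno == blkno)
            return true;
    return false;
}

/* Drop the entry once the journal record referencing it is invalidated. */
void demo_journal_retired(int64_t blkno)
{
    struct demo_freed **pp = &demo_table[demo_hash(blkno)];

    while (*pp != NULL) {
        if ((*pp)->blkno == blkno) {
            struct demo_freed *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
        pp = &(*pp)->next;
    }
}
```

Only blocks found in the table pay the cost of waiting on the journal; every other new block write proceeds immediately, which is where the performance gain would come from.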
* - Don't invalidate jnewblks immediately upon discovering that the block | jeff | 2011-04-07 | 2 | -102/+233
    will be removed. Permit the journal to proceed so that we don't leave a rollback in a cg for a very long time as this can cause terrible perf problems in low memory situations.
  Tested by: pho
* Be far more persistent in reclaiming blocks and inodes before giving | mckusick | 2011-04-05 | 3 | -19/+131
  up and declaring a filesystem out of space. Especially necessary when running on a small filesystem. With this improvement, it should be possible to use soft updates on a small root filesystem.
  Kudos to: Peter Holm
  Testing by: Peter Holm
  MFC: 2 weeks
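The retry-before-failing pattern described here can be sketched as follows. The demo_fs model and all demo_ functions are hypothetical simplifications for illustration; the real allocator and cleanup paths are far more involved.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: blocks free now, and blocks freed but not yet reclaimed. */
struct demo_fs {
    int free_blocks;
    int pending_frees;
};

static bool demo_try_alloc(struct demo_fs *fs)
{
    if (fs->free_blocks == 0)
        return false;
    fs->free_blocks--;
    return true;
}

/* Models a cleanup pass; returns false when nothing could be reclaimed. */
static bool demo_reclaim(struct demo_fs *fs)
{
    if (fs->pending_frees == 0)
        return false;
    fs->free_blocks += fs->pending_frees;
    fs->pending_frees = 0;
    return true;
}

/* Returns 0 on success, ENOSPC only after reclaim attempts are exhausted. */
int demo_alloc_block(struct demo_fs *fs, int max_retries)
{
    for (int attempt = 0; ; attempt++) {
        if (demo_try_alloc(fs))
            return 0;
        if (attempt >= max_retries || !demo_reclaim(fs))
            return ENOSPC;
    }
}
```

With `struct demo_fs fs = { 0, 3 }`, a call to `demo_alloc_block(&fs, 2)` fails its first attempt, reclaims the three pending blocks, and then succeeds instead of reporting the filesystem as full.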
* Fix problems that manifested from filesystem full conditions: | jeff | 2011-04-02 | 1 | -9/+14
  - In softdep_revert_mkdir() find the dotaddref before we attempt to cancel the jaddref so we can make assumptions about where the dotaddref is on the list. cancel_jaddref() does not always remove items from the list anymore.
  - Always set GOINGAWAY on an inode in softdep_freefile() if DEPCOMPLETE was never set. This ensures that dependencies will continue to be processed on the inowait/bufwait list and is more an artifact of the structure of the code than a pure ordering problem.
  - Always set DEPCOMPLETE on canceled jaddrefs so that they can be freed appropriately. This normally occurs when the refs are added to the journal but if they are canceled before this point the state would never be set and the dependency could never be freed.
  Reported by: pho
  Tested by: pho
* Fix the softdep_request_cleanup() function definition for !SOFTUPDATES case. | kib | 2011-03-28 | 1 | -1/+2
  Submitted by: Aleksandr Rybalko <ray dlink ua>
* Add retry code analogous to the block allocation retry code | mckusick | 2011-03-23 | 3 | -21/+46
  to avoid running out of inodes.
  Reported by: Peter Holm
* Retire opt_ffs_broken_fixme.h. | kib | 2011-03-20 | 3 | -3/+5
  Instead of directly calling ffs_snapgone(), use UFS_SNAPGONE() with usual layering.
  Requested by: bde
  MFC after: 1 week
* Remove the #if defined(FFS) || defined(IFS) braces around the calls to | kib | 2011-03-17 | 1 | -4/+0
  ffs_snapgone(). ufs.ko module is not built with FFS define, causing snapshot inode number slots in superblock to never be freed, as well as a reference on the snapshot vnode. IFS was removed several years ago, and UFS/FFS separation was not maintained for real.
  Reported, analyzed and tested by: Yamagi Burmeister <lists yamagi org>
  MFC after: 3 days
* Simplify uses of the web of pointers. | kib | 2011-03-07 | 2 | -11/+7
  Reviewed by: mckusick
  MFC after: 1 week
* The UFS dirhash code was attempting to update shared state in the dirhash | jhb | 2011-03-07 | 2 | -18/+15
  from multiple threads while holding a shared lock during a lookup operation. This could result in incorrect ENOENT failures which could then be permanently stored in the name cache.
  Specifically, the dirhash code optimizes the case that a single thread is walking a directory sequentially opening (or stat'ing) each file. It uses state in the dirhash structure to determine if a given lookup is using the optimization. If the optimization fails, it disables it and restarts the lookup. The problem arises when two threads both attempt the optimization and fail. The first thread will restart the loop, but the second thread will incorrectly think that it did not try the optimization and will only examine a subset of the directory entries in its hash chain. As a result, it may fail to find its directory entry and incorrectly fail with ENOENT.
  To make this safe for use with shared locks, simplify the state stored in the dirhash and move some of the state (the part that determines if the current thread is trying the optimization) into a local variable. One result is that we will now try the optimization more often. We still update the value under the shared lock, but it is a single atomic store similar to i_diroff that is stored in UFS directory i-nodes for the non-dirhash lookup.
  Reviewed by: kib
  MFC after: 1 week
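The shape of this fix, keeping the "am I using the optimization" decision in a local variable and touching shared state only with a single store, can be illustrated with a simplified linear lookup. This is a sketch under stated assumptions: demo_dir, seq_hint, and demo_lookup() are hypothetical, and the real dirhash uses hash chains rather than an array scan.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

struct demo_dir {
    const char *const *names; /* directory entries, in on-disk order */
    int nents;
    atomic_int seq_hint;      /* shared: slot just after the last hit */
};

/* Returns the index of name, or -1 if the directory does not contain it. */
int demo_lookup(struct demo_dir *d, const char *name)
{
    int start = atomic_load_explicit(&d->seq_hint, memory_order_relaxed);
    /* Per-lookup state lives on this thread's stack, not in *d. */
    bool used_hint = start > 0 && start < d->nents;

    if (!used_hint)
        start = 0;
    for (;;) {
        for (int i = start; i < d->nents; i++) {
            if (strcmp(d->names[i], name) == 0) {
                /* The only shared update: one atomic store. */
                atomic_store_explicit(&d->seq_hint, i + 1,
                    memory_order_relaxed);
                return i;
            }
        }
        if (!used_hint)
            return -1;        /* a full scan found nothing */
        used_hint = false;    /* hint missed: rescan from the start */
        start = 0;
    }
}
```

Because each caller decides from its own `used_hint` whether a full rescan is still owed, a second thread can no longer mistake another thread's restart for its own and return a false "not found".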
* Use ffs() to locate free bits in the inode bitmap rather than a loop with | jhb | 2011-03-04 | 1 | -10/+6
  bit shifts.
  Reviewed by: mckusick
  MFC after: 1 month
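As a rough user-space illustration of the technique (not the committed kernel code), the POSIX ffs() function from <strings.h> can find the first clear bit of a bitmap byte once the byte is inverted. The function name and the bitmap layout below are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* ffs() */

/*
 * Return the index of the first clear (free) bit in a bitmap of len
 * bytes, or -1 if every bit is set. Bit 0 is the least significant
 * bit of map[0].
 */
int demo_first_free_bit(const unsigned char *map, size_t len)
{
    assert(map != NULL);
    for (size_t i = 0; i < len; i++) {
        if (map[i] != 0xff) {
            /* Invert so free bits become set; ffs() counts from 1. */
            int bit = ffs((unsigned char)~map[i]) - 1;
            return (int)(i * 8) + bit;
        }
    }
    return -1;
}
```

For the bitmap `{ 0xff, 0xfb }` the first byte is full and the second has bit 2 clear, so the function returns 10 without shifting through the byte one bit at a time.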
* v_mountedhere is a member of the union. Check that the vnodes have | kib | 2011-02-19 | 1 | -1/+3
  proper type before using the member.
  Reported and tested by: Michael Butler <imb protected-networks net>
* Use the native sector size of the device backing the UFS volume for SU+J | kib | 2011-02-12 | 2 | -12/+14
  journal blocks, instead of hard coding 512 byte sector size. The journal needs to write the block atomically, which can only be guaranteed at the device sector size, not larger. Attempting to write less than the sector size results in driver errors.
  Note that this is the first structure in UFS that depends on the sector size. Other elements are written in the units of fragments.
  In collaboration with: pho
  Reviewed by: jeff
  Tested by: bz, pho
* Wrap long line. | netchild | 2011-02-10 | 1 | -1/+2
  Noticed by: bz
* Add some FEATURE macros for some UFS features. | netchild | 2011-02-09 | 4 | -0/+17
  SU+J is not included as a FEATURE macro:
  - it was not in the tree during the GSoC
  - I do not see an option to en-/disable it in NOTES
  Two minor changes were made during the review compared to what was developed during GSoC 2010.
  No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availability if needed.
  Sponsored by: Google Summer of Code 2010
  Submitted by: kibab
  Reviewed by: kib
  X-MFC after: to be determined in last commit with code from this project