path: root/sys/ufs
Commit message | Author | Age | Files | Lines
* This update eliminates a lock-order reversal warning discovered | mckusick | 2011-09-27 | 1 | -21/+24
  while tracking down the system hang reported in kern/160662 and corrected in revision 225806. The LOR is not the cause of the system hang and indeed cannot cause an actual deadlock. However, it can be easily eliminated by deferring the acquisition of a buflock until after all the vnode locks have been acquired.
  Reported by: Hans Ottevanger
  PR: kern/160662
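The fix described above, taking the buffer lock only after every vnode lock is held, can be sketched in user space. This is a minimal illustration using POSIX mutexes, not the kernel code from the commit; `struct demo_locks`, the `DEMO_*` constants, and all function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical identifiers used only to record acquisition order. */
enum { DEMO_VNODE_A = 1, DEMO_VNODE_B = 2, DEMO_BUF = 3 };

/* Hypothetical stand-ins for two vnode locks and one buffer lock. */
struct demo_locks {
    pthread_mutex_t vnode_a;
    pthread_mutex_t vnode_b;
    pthread_mutex_t buf;
    int order[3];   /* which lock was taken first, second, third */
    int nheld;      /* number of locks currently held */
};

static void demo_init(struct demo_locks *l)
{
    pthread_mutex_init(&l->vnode_a, NULL);
    pthread_mutex_init(&l->vnode_b, NULL);
    pthread_mutex_init(&l->buf, NULL);
    l->nheld = 0;
}

/* Take every vnode lock first; the buffer lock is deferred to the end,
 * so no caller ever waits for a vnode lock while holding the buffer lock. */
static void demo_acquire(struct demo_locks *l)
{
    pthread_mutex_lock(&l->vnode_a);
    l->order[l->nheld++] = DEMO_VNODE_A;
    pthread_mutex_lock(&l->vnode_b);
    l->order[l->nheld++] = DEMO_VNODE_B;
    pthread_mutex_lock(&l->buf);
    l->order[l->nheld++] = DEMO_BUF;
}

/* Release in the reverse of the acquisition order. */
static void demo_release(struct demo_locks *l)
{
    assert(l->nheld == 3);
    pthread_mutex_unlock(&l->buf);
    pthread_mutex_unlock(&l->vnode_b);
    pthread_mutex_unlock(&l->vnode_a);
    l->nheld = 0;
}
```

If every code path follows the same order, a lock-order checker has nothing to warn about, which matches the stated goal of silencing the warning even though no real deadlock was possible.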
* This update eliminates the system hang reported in kern/160662 when | mckusick | 2011-09-27 | 1 | -1/+1
  taking a snapshot on a filesystem running with journaled soft updates.
  Reported by: Hans Ottevanger
  Fix verified by: Hans Ottevanger
  PR: kern/160662
* Use nowait sync request for a vnode when doing softdep cleanup. We possibly | kib | 2011-09-20 | 1 | -1/+1
  own the unrelated vnode lock, so doing a waiting sync causes deadlocks.
  Reported and tested by: pho
  Approved by: re (bz)
* Generalize ffs_pages_remove() into vn_pages_remove(). | mm | 2011-08-25 | 3 | -16/+3
  Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F.
  PR: kern/160035, kern/156933
  Reviewed by: kib, pjd
  Approved by: re (kib)
  MFC after: 1 week
* Fix lock leak. | ae | 2011-08-23 | 1 | -2/+2
  Reported by: Alex Lyashkov
  Approved by: re (kib)
  MFC after: 1 week
* Fix two cases involving opt_capsicum.h and module builds: | rwatson | 2011-08-15 | 1 | -1/+0
  (1) opt_capsicum.h is no longer required in ffs_alloc.c, so remove the #include.
  (2) portalfs depends on opt_capsicum.h, so have the Makefile generate one if required.
  These affect only modules built without a kernel (i.e., not buildkernel, but yes buildworld if the dubious MODULES_WITH_WORLD is used).
  Approved by: re (bz)
  Sponsored by: Google Inc
* Second-to-last commit implementing Capsicum capabilities in the FreeBSD | rwatson | 2011-08-11 | 1 | -2/+6
  kernel for FreeBSD 9.0:
  Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op.
  Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions.
  In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit.
  Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent.
  Approved by: re (bz)
  Submitted by: jonathan
  Sponsored by: Google Inc
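The two ideas in this message, checking a required-rights mask when a descriptor is looked up and narrowing a mapping's maximum protection at mmap time, can be sketched as plain bit operations. This is an illustration only: the `DEMO_*` bit values and both function names are hypothetical and do not reflect the real CAP_* constants or the fget(9) signature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability right bits (not the real CAP_* values). */
#define DEMO_CAP_READ   0x1u
#define DEMO_CAP_WRITE  0x2u

/* Hypothetical protection bits, in the spirit of PROT_READ/PROT_WRITE. */
#define DEMO_PROT_READ  0x1u
#define DEMO_PROT_WRITE 0x2u

/* A lookup succeeds only if every required right is held. */
static bool demo_rights_ok(uint32_t held, uint32_t required)
{
    return (held & required) == required;
}

/* At mapping time, strip from the maximum protection anything the
 * descriptor's rights do not allow, so that a later protection change
 * cannot exceed what was permitted when the mapping was created. */
static uint32_t demo_narrow_maxprot(uint32_t maxprot, uint32_t held)
{
    if ((held & DEMO_CAP_READ) == 0)
        maxprot &= ~DEMO_PROT_READ;
    if ((held & DEMO_CAP_WRITE) == 0)
        maxprot &= ~DEMO_PROT_WRITE;
    return maxprot;
}
```

Recording the narrowed maximum once means the later check needs no access to the original descriptor, which may already be closed.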
* Update to -r224294 to ensure that only one of MNT_SUJ or MNT_SOFTDEP | mckusick | 2011-07-30 | 5 | -35/+34
  is set so that mount can revert back to using MNT_NOWAIT when doing getmntinfo.
  Approved by: re (kib)
* Move the MNTK_SUJ flag in mnt_kern_flag to MNT_SUJ in mnt_flag | mckusick | 2011-07-24 | 3 | -20/+23
  so that it is visible to userland programs. This change enables the `mount' command with no arguments to be able to show if a filesystem is mounted using journaled soft updates as opposed to just normal soft updates.
  Approved by: re (bz)
* Default debugging error messages to off for journaled soft updates sysctls. | mckusick | 2011-07-22 | 1 | -5/+3
  Delete limiting on output of these sysctls.
  Approved by: re (kib)
* Add an FFS specific mount option to allow a filesystem checker | mckusick | 2011-07-15 | 5 | -24/+306
  (typically fsck_ffs) to register that it wishes to use FFS specific sysctl's to update the filesystem. This ensures that two checkers cannot run on a given filesystem at the same time and that no other process accidentally or maliciously uses the filesystem updating sysctls inappropriately. This functionality is needed by the journaling soft-updates recovery code.
* Consistently check mount flag (MNTK_SUJ) rather than superblock | mckusick | 2011-07-14 | 1 | -2/+2
  flag (FS_SUJ) when determining whether to do journaling-based operations. The mount flag is set only when journaling is active while the superblock flag is set to indicate that journaling is to be used. For example, when the filesystem is mounted read-only, the journaling may be present (FS_SUJ) but not active (MNTK_SUJ). Inappropriate checking of the FS_SUJ flag was causing some journaling actions to be attempted at inappropriate times.
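The distinction between "journaling is configured" and "journaling is active" can be shown with two separate flag words. This is a minimal sketch under assumed names: `struct demo_mount`, the `DEMO_*` values, and the helper functions are hypothetical, and the bit values are not the real FS_SUJ or MNTK_SUJ definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for illustration. */
#define DEMO_FS_SUJ   0x1u  /* superblock: journaling is to be used */
#define DEMO_MNTK_SUJ 0x1u  /* mount: journaling is currently active */

struct demo_mount {
    uint32_t sb_flags;    /* persistent superblock flags */
    uint32_t kern_flags;  /* in-memory, per-mount flags */
};

/* Journaling-based operations should key off the mount flag. */
static bool demo_journal_active(const struct demo_mount *mp)
{
    return (mp->kern_flags & DEMO_MNTK_SUJ) != 0;
}

/* The superblock flag only says that a journal is configured. */
static bool demo_journal_configured(const struct demo_mount *mp)
{
    return (mp->sb_flags & DEMO_FS_SUJ) != 0;
}
```

A read-only mount of a journaled filesystem would have the superblock flag set and the mount flag clear, so only `demo_journal_configured()` returns true for it.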
* When first creating snapshots, we may free some blocks within it. | mckusick | 2011-07-10 | 1 | -1/+5
  These blocks should not have TRIM applied to them.
  Submitted by: Kostik Belousov
* Allow disk partitions associated with UFS read-only mounted | mckusick | 2011-07-10 | 1 | -15/+7
  filesystems to be opened for writing. This functionality used to be special-cased for just the root filesystem, but with this change is now available for all UFS filesystems. This change is needed for journaled soft updates recovery.
  Discussed with: Jeff Roberson
* Use 'curthread_pflags' instead of 'thread_pflags' to signify that only | kib | 2011-07-09 | 1 | -12/+12
  curthread can be operated upon.
  Requested by: attilio
  MFC after: 1 week
* Use helper functions instead of manually managing TDP_INBDFLUSH. | kib | 2011-07-09 | 1 | -16/+12
  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc (previous version)
  MFC after: 1 week
* - Speed up pendingblock processing again. Having too much delay between | jeff | 2011-07-04 | 2 | -15/+41
    ffs_blkfree() and the pending adjustment causes all kinds of space related problems.
* - Handle D_JSEGDEP in the softdep_sync_buf() switch. These can now | jeff | 2011-07-04 | 1 | -0/+1
    find themselves on snapshot vnodes.
  Reported by: pho
* - It is impossible to run request_cleanup() while doing a copyonwrite. | jeff | 2011-07-04 | 1 | -25/+21
    This will most likely cause new block allocations which can recurse into request cleanup.
  - While here optimize the ufs locking slightly. We need only acquire and drop once.
  - process_removes() and process_truncates() are also only needed once.
  - Attempt to flush each item on the worklist once but do not loop forever if some cannot be completed.
  Discussed with: mckusick
* - Fix an inode quota leak. We need to decrement the quota once and only | jeff | 2011-07-04 | 1 | -5/+4
    once.
  Tested by: pho
  Reviewed by: mckusick
* Handle the FREEDEP case in softdep_sync_buf(). | mckusick | 2011-06-29 | 1 | -0/+1
  This fix failed to get added in -r223325.
  Submitted by: Peter Holm
* Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this | alc | 2011-06-29 | 1 | -1/+1
  option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages.
  This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages.
  Update all of the existing assertions on pmap_remove_all() to reflect this change.
  Reviewed by: kib
* - Fix directory count rollbacks by passing the mode to the journal dep | jeff | 2011-06-20 | 3 | -47/+163
    earlier.
  - Add rollback/forward code for frag and cluster accounting.
  - Handle the FREEDEP case in softdep_sync_buf(). (submitted by pho)
* Fixed dereference of a NULL pointer. | mckusick | 2011-06-18 | 1 | -1/+2
  Reported by: Peter Holm
* Drop the include of <ufs/ffs/ffs_extern.h> from usr.sbin/makefs/ffs/ffs_bswap.c | mckusick | 2011-06-16 | 1 | -2/+5
  and usr.sbin/makefs/ffs/ffs_subr.c as they have no need of anything in that file. No other programs or libraries include <ufs/ffs/ffs_extern.h> (nor should they as it is totally in-kernel interfaces). For added protection I enclosed the entire contents of <ufs/ffs/ffs_extern.h> in ifdef _KERNEL.
  Feedback from: Bruce Evans and Tai-hwa Liang
* Fixing compilation bustage by introducing another forward declaration. | avatar | 2011-06-16 | 1 | -0/+1
* Ensure that filesystem metadata contained within persistent snapshots | mckusick | 2011-06-15 | 7 | -42/+74
  is always kept consistent.
  Suggested by: Jeff Roberson
* With the restructuring of the block reclamation code, the notification | mckusick | 2011-06-15 | 3 | -4/+28
  messages for a filesystem being out of space need to be moved so that they do not print out until after a failed cleanup attempt.
  Suggested by: Jeff Roberson
* Missing cleanup case after completion of a snapshot vnode write | mckusick | 2011-06-15 | 1 | -0/+4
  claiming a released block.
  Submitted by: Jeff Roberson
  Tested by: Peter Holm
* Use alternative, less messy solution to avoid breakage after r223020: | dim | 2011-06-13 | 1 | -0/+2
  put the snapdata structure between #ifdef _KERNEL guards.
  Suggested by: kib
* Update to soft updates journaling to properly track freed blocks | mckusick | 2011-06-12 | 5 | -30/+180
  that get claimed by snapshots.
  Submitted by: Jeff Roberson
  Tested by: Peter Holm
* Disable the soft updates journaling after a filesystem is successfully | mckusick | 2011-06-12 | 2 | -2/+11
  downgraded to read-only. It will be restarted if the filesystem is upgraded back to read-write.
* Implement fully asynchronous partial truncation with softupdates journaling | jeff | 2011-06-10 | 11 | -1507/+2627
  to resolve errors which can cause corruption on recovery with the old synchronous mechanism.
  - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations.
  - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects.
  - softdep_journal_freeblocks() handles last frag allocation and last block zeroing.
  - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place.
  - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted.
  - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks.
  - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call.
  - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files.
  - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid.
  Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts)
  Tested by: pho
* - Add support for referencing quota structures without needing the inode | jeff | 2011-06-10 | 2 | -0/+101
    pointer for softupdates.
  Submitted by: mckusick
* - If the fsync in ufs_direnter fails SUJ can later panic because we have | jeff | 2011-06-10 | 1 | -1/+1
    partially added a name. Allow ufs_direnter() to continue in the hopes that it is a transient error. If it is not, the directory is corrupted already from IO errors and writing this new block is not likely to make things worse.
* Grammar fix in comment. | mckusick | 2011-06-05 | 1 | -3/+3
  Eliminate one (of several) possible conflicting buffer locks when trying to reclaim blocks. Rest of fix to be incorporated as part of SUJ update by jeff.
  Pointed out by: Kostik Belousov
* Due to a lag in updating the fs_pendinginodes count, we cannot depend | mckusick | 2011-05-28 | 1 | -1/+1
  on it to decide whether we should try to reclaim inodes when we run short.
  Discovered by: Peter Holm
* The check for whether a block is going to be claimed by a snapshot | mckusick | 2011-05-26 | 1 | -4/+12
  needs to happen before we notify the underlying layer that it is being freed.
* Fix the ufs/ffs file system so that it uses the lock | rmacklem | 2011-05-22 | 1 | -1/+1
  flags argument added to VFS_FHTOVP() by r222167.
  Reviewed by: mckusick
* Add a lock flags argument to the VFS_FHTOVP() file system | rmacklem | 2011-05-22 | 3 | -4/+6
  method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed.
  Reviewed by: kib
* Use a name instead of a magic number for kern_yield(9) when the priority | mdf | 2011-05-13 | 1 | -1/+1
  should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here.
  Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
* Fix typos. | kib | 2011-04-30 | 1 | -2/+2
  Noted by: Fabian Keil <freebsd-listen fabiankeil de>
  Pointy hat to: kib
  MFC after: 1 week
* Clarify the comment. | kib | 2011-04-30 | 1 | -2/+4
  MFC after: 1 week
* VFS sometimes is unable to inactivate a vnode when vnode use count | kib | 2011-04-24 | 3 | -23/+31
  goes to zero. E.g., the vnode might be only shared-locked at the time of vput() call. Such vnodes are kept in the hash, so they can be found later.
  If ffs_valloc() allocated an inode that has its vnode cached in hash, and still owing the inactivation, then vget() call from ffs_valloc() clears VI_OWEINACT, and then the vnode is reused for the newly allocated inode.
  The problem is, the vnode is not reclaimed before it is put to the new use. ffs_valloc() recycles vnode vm object, but this is not enough. In particular, at least v_vflag should be cleared, and several bits of UFS state need to be removed.
  It is very inconvenient to call vgone() at this point. Instead, move some parts of ufs_reclaim() into helper function ufs_prepare_reclaim(), and call the helper from VOP_RECLAIM and ffs_valloc().
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 3 weeks
* - Refactor softdep_setup_freeblocks() into a set of functions to prepare | jeff | 2011-04-11 | 1 | -151/+221
    for a new journal specific partial truncate routine.
  - Use dep_current[] in place of specific dependency counts. This is automatically maintained when workitems are allocated and has less risk of becoming incorrect.
* Fix a long standing SUJ performance problem: | jeff | 2011-04-10 | 2 | -57/+215
  - Keep a hash of indirect blocks that have recently been freed and are still referenced in the journal.
  - Lookup blocks in this hash before forcing a new block write to wait on the journal entry to hit the disk. This is only necessary to avoid confusion between old identities as indirects and new identities as file blocks.
  - Don't free jseg structures until the journal has written a record that invalidates it. This keeps the indirect block information around for as long as is required to be safe.
  - Force an empty journal block write when required to flush out stale journal data that is simply waiting for the oldest valid sequence number to advance beyond it.
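The first two items above describe a membership test: a new block write only has to wait on the journal if its block number is in the set of recently freed indirect blocks. The sketch below shows that shape with a small fixed-size, open-addressed hash set. It is an assumption-laden illustration, not the softdep data structure; `struct demo_freed_set`, `DEMO_SLOTS`, and every function name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SLOTS 64u  /* hypothetical table size; must be a power of two */

enum demo_state { DEMO_EMPTY = 0, DEMO_USED, DEMO_DELETED };

/* Set of block numbers freed as indirects and still named in the journal. */
struct demo_freed_set {
    int64_t blk[DEMO_SLOTS];
    enum demo_state st[DEMO_SLOTS];
};

static void demo_set_init(struct demo_freed_set *s)
{
    memset(s, 0, sizeof(*s));   /* every slot starts as DEMO_EMPTY */
}

static unsigned demo_hash(int64_t blkno)
{
    return (unsigned)((uint64_t)blkno * 2654435761u) & (DEMO_SLOTS - 1u);
}

/* Linear probe; returns the slot index, or -1 if blkno is not present. */
static int demo_find(const struct demo_freed_set *s, int64_t blkno)
{
    unsigned h = demo_hash(blkno);
    for (unsigned i = 0; i < DEMO_SLOTS; i++) {
        unsigned slot = (h + i) & (DEMO_SLOTS - 1u);
        if (s->st[slot] == DEMO_EMPTY)
            return -1;
        if (s->st[slot] == DEMO_USED && s->blk[slot] == blkno)
            return (int)slot;
    }
    return -1;
}

/* Record a freed indirect block; returns false only if the table is full. */
static bool demo_insert(struct demo_freed_set *s, int64_t blkno)
{
    if (demo_find(s, blkno) >= 0)
        return true;
    unsigned h = demo_hash(blkno);
    for (unsigned i = 0; i < DEMO_SLOTS; i++) {
        unsigned slot = (h + i) & (DEMO_SLOTS - 1u);
        if (s->st[slot] != DEMO_USED) {
            s->blk[slot] = blkno;
            s->st[slot] = DEMO_USED;
            return true;
        }
    }
    return false;
}

/* Drop an entry once the journal no longer references the block. */
static void demo_remove(struct demo_freed_set *s, int64_t blkno)
{
    int slot = demo_find(s, blkno);
    if (slot >= 0)
        s->st[slot] = DEMO_DELETED;  /* tombstone keeps probe chains intact */
}

/* Only writes to blocks still in the set need to wait on the journal. */
static bool demo_must_wait_for_journal(const struct demo_freed_set *s,
    int64_t blkno)
{
    return demo_find(s, blkno) >= 0;
}
```

The performance benefit in this sketch comes from the common case: most newly allocated blocks are not in the set, so their writes proceed without waiting.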
* - Don't invalidate jnewblks immediately upon discovering that the block | jeff | 2011-04-07 | 2 | -102/+233
    will be removed. Permit the journal to proceed so that we don't leave a rollback in a cg for a very long time as this can cause terrible perf problems in low memory situations.
  Tested by: pho
* Be far more persistent in reclaiming blocks and inodes before giving | mckusick | 2011-04-05 | 3 | -19/+131
  up and declaring a filesystem out of space. Especially necessary when running on a small filesystem. With this improvement, it should be possible to use soft updates on a small root filesystem.
  Kudos to: Peter Holm
  Testing by: Peter Holm
  MFC: 2 weeks
* Fix problems that manifested from filesystem full conditions: | jeff | 2011-04-02 | 1 | -9/+14
  - In softdep_revert_mkdir() find the dotaddref before we attempt to cancel the jaddref so we can make assumptions about where the dotaddref is on the list. cancel_jaddref() does not always remove items from the list anymore.
  - Always set GOINGAWAY on an inode in softdep_freefile() if DEPCOMPLETE was never set. This ensures that dependencies will continue to be processed on the inowait/bufwait list and is more an artifact of the structure of the code than a pure ordering problem.
  - Always set DEPCOMPLETE on canceled jaddrefs so that they can be freed appropriately. This normally occurs when the refs are added to the journal but if they are canceled before this point the state would never be set and the dependency could never be freed.
  Reported by: pho
  Tested by: pho
* Fix the softdep_request_cleanup() function definition for !SOFTUPDATES case. | kib | 2011-03-28 | 1 | -1/+2
  Submitted by: Aleksandr Rybalko <ray dlink ua>