summaryrefslogtreecommitdiffstats
path: root/sys/ufs/ffs/ffs_inode.c
Commit message (Collapse)AuthorAgeFilesLines
* - Convert the bufobj lock to rwlock.jeff2013-05-311-0/+1
| | | | | | | | | | - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
* For UFS2 i_blocks is unsigned. The current "sanity" check that itmckusick2013-02-031-3/+3
| | | | | | | | | | | has gone below zero after the blocks in its inode are freed is a no-op which the compiler fails to warn about because of the use of the DIP macro. Change the sanity check to compare the number of blocks being freed against the value i_blocks. If the number of blocks being freed exceeds i_blocks, just set i_blocks to zero. Reported by: Pedro Giffuni (pfg@) MFC after: 2 weeks
* Remove unused thread argument from vtruncbuf().trasz2012-04-231-3/+2
| | | | Reviewed by: kib
* A refinement of change 232351 to avoid a race with a forcible unmount.mckusick2012-03-281-4/+19
| | | | | | | | | | | | While we have a snapshot vnode unlocked to avoid a deadlock with another inode in the same inode block being updated, the filesystem containing it may be forcibly unmounted. When that happens the snapshot vnode is revoked. We need to check for that condition and fail appropriately. This change will be included along with 232351 when it is MFC'ed to 9. Spotted by: kib Reviewed by: kib
* Add a third flags argument to ffs_syncvnode to avoid a possible conflictmckusick2012-03-251-3/+3
| | | | | | | with MNT_WAIT flags that passed in its second argument. This will be MFC'ed together with r232351. Discussed with: kib
* Remove superfluous brackets.kib2012-03-111-1/+1
| | | | | Submitted by: alc MFC after: 2 weeks
* Do schedule delayed writes for async mounts.kib2012-03-111-7/+11
| | | | | | | | | | While there, make some style adjustments, like missed () around return values. Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 2 weeks
* Do not fall back to slow synchronous i/o when low on memory or buffers.kib2012-03-111-2/+4
| | | | | | | | | | The bawrite() schedules the write to happen immediately, and its use frees the current thread to do more cleanups. Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 2 weeks
* This change avoids a kernel deadlock on "snaplk" when usingmckusick2012-03-011-12/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates. The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for the our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot. The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot. A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot. Help and review by: Jeff Roberson Tested by: Peter Holm MFC (to 9 only) after: 2 weeks
* Generalize ffs_pages_remove() into vn_pages_remove().mm2011-08-251-13/+1
| | | | | | | | | | | Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week
* Add an FFS specific mount option to allow a filesystem checkermckusick2011-07-151-1/+1
| | | | | | | | | (typically fsck_ffs) to register that it wishes to use FFS specific sysctl's to update the filesystem. This ensures that two checkers cannot run on a given filesystem at the same time and that no other process accidentally or maliciously uses the filesystem updating sysctls inappropriately. This functionality is needed by the journaling soft-updates recovery code.
* Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing thisalc2011-06-291-1/+1
| | | | | | | | | | | | | | | | | | option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib
* Ensure that filesystem metadata contained within persistent snapshotsmckusick2011-06-151-5/+7
| | | | | | is always kept consistent. Suggested by: Jeff Roberson
* Implement fully asynchronous partial truncation with softupdates journalingjeff2011-06-101-74/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to resolve errors which can cause corruption on recovery with the old synchronous mechanism. - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid. Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
* Add function lbn_offset to calculate offset of the indirect block ofkib2010-11-111-3/+1
| | | | | | | given level. Reviewed by: jeff Tested by: pho
* - Handle the truncation of an inode with an effective link count of 0 injeff2010-07-061-4/+2
| | | | | | | | | | | | | | | the context of the process that reduced the effective count. Previously all truncation as a result of unlink happened in the softdep flush thread. This had the effect of being impossible to rate limit properly with the journal code. Now the process issuing unlinks is suspended when the journal files. This has a side-effect of improving rm performance by allowing more concurrent work. - Handle two cases in inactive, one for effnlink == 0 and another when nlink finally reaches 0. - Eliminate the SPACECOUNTED related code since the truncation is no longer delayed. Discussed with: mckusick
* - Merge soft-updates journaling from projects/suj/head into head. Thisjeff2010-04-241-55/+77
| | | | | | | | brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm
* Following a fair amount of real world experience with ACLs andrwatson2009-01-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | extended attributes since FreeBSD 5, make the following semantic changes: - Don't update the inode modification time (mtime) when extended attributes (and hence also ACLs) are added, modified, or removed. - Don't update the inode access tie (atime) when extended attributes (and hence also ACLs) are queried. This means that rsync (and related tools) won't improperly think that the data in the file has changed when only the ACL has changed. Note that ffs_reallocblks() has not been changed to not update on an IO_EXT transaction, but currently EAs don't use the cluster write routines so this shouldn't be a problem. If EAs grow support for clustering, then VOP_REALLOCBLKS() will need to grow a flag argument to carry down IO_EXT to UFS. MFC after: 1 week PR: ports/125739 Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: pluknet <pluknet@gmail.com>, Greg Byshenk <freebsd@byshenk.net> Discussed with: kib, kientzle, timur, Alexander Bokovoy <ab@samba.org>
* The r187467 should remove all pages for V_NORMAL case too, becausekib2009-01-201-8/+17
| | | | | | | | | | | indirect block pages are not removed by the mentioned invocation of the vnode_pager_setsize(). Put a common code into the helper function ffs_pages_remove(). Reported and tested by: dchagin Reviewed by: ups MFC after: 3 weeks
* When extending inode size, we call vnode_pager_setsize(), to have akib2009-01-201-1/+3
| | | | | | | | | | | | | | address space where to put vnode pages, and then call UFS_BALLOC(), to actually allocate new block and map it. When UFS_BALLOC() returns error, sometimes we forget to revert the vm object size increase, allowing for the pages that are not backed by the logical disk blocks. Revert vnode_pager_setsize() back when UFS_BALLOC() failed, for ffs_truncate() and ffs_write(). PR: 129956 Reviewed by: ups MFC after: 3 weeks
* FFS puts the extended attributes blocks at the negative blocks for thekib2009-01-201-0/+9
| | | | | | | | | | | | | | | | | | vnode, from -1 down. When vinvalbuf(vp, V_ALT) is done for the vnode, it incorrectly does vm_object_page_remove(0, 0), removing all pages from the underlying vm object, not only the pages that back the extended attributes data. Change vinvalbuf() to not remove any pages from the object when V_NORMAL or V_ALT are specified. Instead, the only in-tree caller in ffs_inode.c:ffs_truncate() that specifies V_ALT explicitely removes the corresponding page range. The V_NORMAL caller does vnode_pager_setsize(vp, 0) immediately after the call to vinvalbuf(V_NORMAL) already. Reported by: csjp Reviewed by: ups MFC after: 3 weeks
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).des2008-10-231-2/+2
| | | | MFC after: 3 months
* Remove the struct thread unuseful argument from bufobj interface.attilio2008-10-101-2/+2
| | | | | | | | | | | | | | | | | | | | | In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* When downgrading the read-write mount to read-only, do_unmount() setskib2008-09-161-0/+8
| | | | | | | | | | | | | | | MNT_RDONLY flag before the VFS_MOUNT() is called. In ufs_inactive() and ufs_itimes_locked(), UFS verifies whether the fs is read-only by checking MNT_RDONLY, but this may cause loss of the IN_MODIFIED flag for inode on the fs being remounted rw->ro. Introduce UFS_RDONLY() struct ufsmount' method that reports the value of the fs_ronly. The later is set to 1 only after the remount is finished. Reviewed by: tegge In collaboration with: pho MFC after: 1 month
* - Complete part of the unfinished bufobj work by consistently usingjeff2008-03-221-4/+5
| | | | | | | | | | | | | | | | | BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
* Turn most ffs 'DIAGNOSTIC's into INVARIANTS.obrien2007-11-081-3/+3
|
* - Move rusage from being per-process in struct pstats to per-thread injeff2007-06-011-1/+1
| | | | | | | | | | | | | | | | | | | td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
* Do not translate the IN_ACCESS inode flag into the IN_MODIFIED while filesystemkib2006-10-101-5/+7
| | | | | | | | | | | | | | | is suspending/suspended. Doing so may result in deadlock. Instead, set the (new) IN_LAZYACCESS flag, that becomes IN_MODIFIED when suspend is lifted. Change the locking protocol in order to set the IN_ACCESS and timestamps without upgrading shared vnode lock to exclusive (see comments in the inode.h). Before that, inode was modified while holding only shared lock. Tested by: Peter Holm Reviewed by: tegge, bde Approved by: pjd (mentor) MFC after: 3 weeks
* - Consistently call 'vp' vp rather than ovp sometimes in ffs_truncate().jeff2005-04-051-105/+104
| | | | | | Do the same for oip. Pointed out by: glebius
* - Fix an assert now that the XLOCK no longer exists.jeff2005-03-131-4/+1
| | | | Sponsored by: Isilon Systems, Inc.
* - Add VOP locking asserts in several functions that have been implicated injeff2005-02-221-0/+3
| | | | recent deadlocks.
* - In the softupdates case for ffs_truncate() we use vinvalbuf() tojeff2005-02-091-0/+1
| | | | | | | | | | | invalidate pending io and dependencies. However, vinvalbuf() rightfully does not call vnode_pager_setsize() for us. We must do this here. This could potentially have caused numerous kinds of bugs, but it was specifically causing msync() deadlocks because msync() was writing flushing pages that should not have been valid. Sponsored by: Isilon Systems, Inc. Reported by: kkenn
* Don't use the UFS_* and VFS_* functions where a direct call is possble.phk2005-02-081-4/+4
| | | | | The UFS_ functions are for UFS to call back into VFS. The VFS functions are external entry points into the filesystem.
* For snapshots we need all VOP_LOCKs to be exclusive.phk2005-02-081-3/+3
| | | | | | | | | | | The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock(). Also, as long as we have not abandonned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.
* - Acquire the ufs lock when manipulating some fields of struct fs.jeff2005-01-241-7/+13
| | | | | | | - Change arguments to various ffs functions to match their new prototypes. Sponsored By: Isilon Systems, Inc.
* Eliminate unused and unnecessary "cred" argument from vinvalbuf()phk2005-01-141-3/+2
|
* Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().phk2005-01-111-3/+3
| | | | | | | | | | | | | | | | | | I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson
* /* -> /*- for license, minor formatting changesimp2005-01-071-1/+1
|
* Loose the v_dirty* and v_clean* alias macros.phk2004-10-251-2/+2
| | | | | Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().
* Move the buffer method vector (buf->b_op) to the bufobj.phk2004-10-241-1/+1
| | | | | | | | | | | | | | | | | Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
* Avoid using casts as lvalues. Introduce DIP_SET macro which sets properkan2004-07-281-16/+19
| | | | | inode field based on UFS version. Use DIP ro read values and DIP_SET to modify them throughout FFS code base.
* Remove advertising clause from University of California Regent'simp2004-04-071-4/+0
| | | | | | | | | license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyrights. Approved by: core, rwatson
* DuH!phk2003-10-181-1/+1
| | | | | bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)
* Initialize bp->b_offset before calling VOP_[SPEC]STRATEGY()phk2003-10-181-0/+1
|
* - In ffs_update() assert that either the vnode lock or the XLOCK is held.jeff2003-10-051-0/+4
|
* Use __FBSDID().obrien2003-06-111-1/+3
|
* - Add a new 'flags' parameter to getblk().jeff2003-03-041-1/+1
| | | | | | | | | | - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases were we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviwed by: arch Not objected to by: mckusick
* Back out M_* changes, per decision of the TRB.imp2003-02-191-1/+1
| | | | Approved by: trb
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.alfred2003-01-211-1/+1
| | | | Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op sincephk2003-01-031-1/+1
| | | | all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
OpenPOWER on IntegriCloud