summaryrefslogtreecommitdiffstats
path: root/sys/ufs/ffs/ffs_vnops.c
Commit message (Collapse)AuthorAgeFilesLines
* MFC r283968:kib2015-06-101-2/+2
| | | | | | Syncing a directory vnode might drop the vnode lock in the softdep_sync() similarly to the regular vnode sync. Allow retry for both vnode types.
* MFC r283735:kib2015-06-051-14/+0
| | | | Remove several write-only variables.
* MFC r272952:kib2014-10-181-1/+1
| | | | Do not set IN_ACCESS flag for read-only mounts.
* MFC r262814scottl2014-04-151-2/+10
| | | | | | | | | | | - If we fail to do a non-blocking acquire of a buf lock while doing a waiting sync pass we need to do a blocking acquire and restart. Another thread, typically the buf daemon, may have this buf locked and if we don't wait we can fail to sync the file. This lead to a great variety of softdep panics because we rely on all dependencies being flushed before proceeding in several cases. Submitted by: jeffr
* MFC r262678;pfg2014-03-051-3/+3
| | | | | | | | | | ufs: small formatting fixes. Cleanup some extra space. Use of tabs vs. spaces. No functional change. Reviewed by: mckusick
* MFC r256448, r257029;pfg2013-12-111-3/+4
| | | | | | | | | | | | | Make di_blocks unsigned in UFS1 as is the case already for UFS2. Most of the code between UFS1 and UFS2 is shared so this change is pretty safe. Not only this makes UFS1 and 2 consistent but it also matches what NetBSD and MacOS X have for some years now. UFS2: make di_extsize unsigned. di_extsize is the EA size and as such it should be unsigned. Adjust related types for consistency. Reviewed by: mckusick
* UFS support of the unmapped i/o for the user data buffers.kib2013-03-191-10/+23
| | | | | Sponsored by: The FreeBSD Foundation Tested by: pho, scottl, jhb, bf
* Add currently unused flag argument to the cluster_read(),kib2013-03-141-2/+4
| | | | | | | | cluster_write() and cluster_wbuild() functions. The flags to be allowed are a subset of the GB_* flags for getblk(). Sponsored by: The FreeBSD Foundation Tested by: pho
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() toattilio2013-02-201-3/+3
| | | | | | their "write" versions. Sponsored by: EMC / Isilon storage division
* Switch vm_object lock to be a rwlock.attilio2013-02-201-0/+1
| | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* After the PHYS_TO_VM_PAGE() function was de-inlined, the main reasonkib2012-08-051-0/+1
| | | | | | | | | | | | | to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks
* Fix unbounded-length malloc, controlled from usermode. The added checkkib2012-06-211-3/+7
| | | | | | | | | | | | | | is performed before exact size of the buffer is calculated, but the buffer cannot have size greater then the total space allocated for extended attributes. The existing check is executing with precise size, but it is too late, since buffer needs to be allocated in advance. Also, adapt to uio_resid being of ssize_t type. Use lblktosize instead of multiplying by fs block size by hand as well. Reported and tested by: pho MFC after: 1 week
* Enable vn_io_fault() lock avoidance for UFS.kib2012-05-301-3/+3
| | | | | Tested by: pho MFC after: 2 months
* Remove unused thread argument from vtruncbuf().trasz2012-04-231-4/+3
| | | | Reviewed by: kib
* Add a third flags argument to ffs_syncvnode to avoid a possible conflictmckusick2012-03-251-8/+6
| | | | | | | with MNT_WAIT flags that passed in its second argument. This will be MFC'ed together with r232351. Discussed with: kib
* Supply boolean as the second argument to ffs_update(), and not akib2012-03-131-2/+2
| | | | | | | | MNT_[NO]WAIT constants, which in fact always caused sync operation. Based on the submission by: bde Reviewed by: mckusick MFC after: 2 weeks
* In ffs_syncvnode(), pass boolean false as second argument of ffs_update().kib2012-03-111-1/+1
| | | | | | | | | | | Synchronous inode block update is not needed for MNT_LAZY callers (syncer), and since waitfor values are not zero, code did unneccessary synchronous update. Submitted by: bde Reviewed by: mckusick Tested by: pho MFC after: 2 weeks
* Remove not needed ARGSUSED lint command.kib2012-03-111-1/+0
| | | | | Submitted by: bde MFC after: 3 days
* Revert r232692 as the correct place to fix this is at the syscall level.pho2012-03-091-2/+2
|
* syscall() fuzzing can trigger this panic. Return EINVAL instead.pho2012-03-081-2/+2
| | | | MFC after: 1 week
* This change avoids a kernel deadlock on "snaplk" when usingmckusick2012-03-011-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates. The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for the our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot. The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot. A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot. Help and review by: Jeff Roberson Tested by: Peter Holm MFC (to 9 only) after: 2 weeks
* Fix found places where uio_resid is truncated to int.kib2012-02-211-4/+8
| | | | | | | | | Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
* Historically when an application wrote an entire block of a file,mckusick2012-02-091-9/+20
| | | | | | | | | | | | | | | | | the kernel allocated a buffer but did not zero it as it was about to be completely filled by a uiomove() from the user's buffer. However, if the uiomove() failed, the old contents of the buffer could be exposed especially if the file was being mmap'ed. The fix was to always zero the buffer when it was allocated. This change first attempts the uiomove() to the newly allocated (and dirty) buffer and only zeros it if the uiomove() fails. The effect is to eliminate the gratuitous zeroing of the buffer in the usual case where the uiomove() successfully fills it. Reviewed by: kib Tested by: scottl MFC after: 2 weeks (to 9 only)
* Update to -r224294 to ensure that only one of MNT_SUJ or MNT_SOFTDEPmckusick2011-07-301-2/+1
| | | | | | | is set so that mount can revert back to using MNT_NOWAIT when doing getmntinfo. Approved by: re (kib)
* Implement fully asynchronous partial truncation with softupdates journalingjeff2011-06-101-97/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to resolve errors which can cause corruption on recovery with the old synchronous mechanism. - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid. Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
* Fix typos.kib2011-04-301-2/+2
| | | | | | Noted by: Fabian Keil <freebsd-listen fabiankeil de> Pointy hat to: kib MFC after: 1 week
* Clarify the comment.kib2011-04-301-2/+4
| | | | MFC after: 1 week
* - Handle the truncation of an inode with an effective link count of 0 injeff2010-07-061-3/+0
| | | | | | | | | | | | | | | the context of the process that reduced the effective count. Previously all truncation as a result of unlink happened in the softdep flush thread. This had the effect of being impossible to rate limit properly with the journal code. Now the process issuing unlinks is suspended when the journal files. This has a side-effect of improving rm performance by allowing more concurrent work. - Handle two cases in inactive, one for effnlink == 0 and another when nlink finally reaches 0. - Eliminate the SPACECOUNTED related code since the truncation is no longer delayed. Discussed with: mckusick
* Eliminate page queues locking around most calls to vm_page_free().alc2010-05-061-2/+0
|
* Acquire the page lock around all remaining calls to vm_page_free() onalc2010-05-051-2/+4
| | | | | | | | | | | | | managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib
* Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().trasz2010-05-051-15/+2
| | | | Reviewed by: kib
* - Merge soft-updates journaling from projects/suj/head into head. Thisjeff2010-04-241-0/+1
| | | | | | | | brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm
* Don't panic on attempt to set ACL on a block device file.trasz2009-07-011-6/+6
| | | | | | | | | This is just a part of kern/125613. PR: kern/125613 Submitted by: Jaakko Heinonen <jh at saunalahti dot fi> Reviewed by: rwatson Approved by: re (kib)
* For SU mounts, softdep_fsync() might drop vnode lock, allowing otherkib2009-06-301-4/+25
| | | | | | | | | | threads to put dirty buffers on the vnode bufobj list. For regular files and synchronous fsync requests, check for the condition and restart the fsync vop if a new dirty buffer arrived. Tested by: pho Approved by: re (kensmith) MFC after: 1 month
* Correct typo.kib2009-03-271-2/+2
| | | | Noted by: kensmith
* The non-modifying EA VOPs are executed with only shared vnode lock taken.kib2009-03-121-63/+90
| | | | | | | | | | | | | Provide a custom lock around initializing and tearing down EA area, to prevent both memory leaks and double-free of it. Count the number of EA area accessors. Lock protocol requires either holding exclusive vnode lock to modify i_ea_area, or shared vnode lock and owning IN_EA_LOCKED flag in i_flag. Noted by: YAMAMOTO, Taku <taku tackymt homeip net> Tested by: pho (previous version) MFC after: 2 weeks
* Following a fair amount of real world experience with ACLs andrwatson2009-01-271-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | extended attributes since FreeBSD 5, make the following semantic changes: - Don't update the inode modification time (mtime) when extended attributes (and hence also ACLs) are added, modified, or removed. - Don't update the inode access tie (atime) when extended attributes (and hence also ACLs) are queried. This means that rsync (and related tools) won't improperly think that the data in the file has changed when only the ACL has changed. Note that ffs_reallocblks() has not been changed to not update on an IO_EXT transaction, but currently EAs don't use the cluster write routines so this shouldn't be a problem. If EAs grow support for clustering, then VOP_REALLOCBLKS() will need to grow a flag argument to carry down IO_EXT to UFS. MFC after: 1 week PR: ports/125739 Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: pluknet <pluknet@gmail.com>, Greg Byshenk <freebsd@byshenk.net> Discussed with: kib, kientzle, timur, Alexander Bokovoy <ab@samba.org>
* When extending inode size, we call vnode_pager_setsize(), to have akib2009-01-201-1/+3
| | | | | | | | | | | | | | address space where to put vnode pages, and then call UFS_BALLOC(), to actually allocate new block and map it. When UFS_BALLOC() returns error, sometimes we forget to revert the vm object size increase, allowing for the pages that are not backed by the logical disk blocks. Revert vnode_pager_setsize() back when UFS_BALLOC() failed, for ffs_truncate() and ffs_write(). PR: 129956 Reviewed by: ups MFC after: 3 weeks
* Assert that v_holdcnt is non-zero before entering lockmgr in vn_lockkib2008-10-201-0/+4
| | | | | | | | | and ffs_lock. This cannot catch situations where holdcnt is incremented not by curthread, but I think it is useful. Reviewed by: tegge, attilio Tested by: pho MFC after: 2 weeks
* When calling extattr_check_cred, use V{READ,WRITE}, not I{READ,WRITE}.trasz2008-09-031-4/+4
| | | | Approved by: rwatson (mentor)
* - Since rev 1.142 of ffs_snapshot.c the interlock has not been requiredjeff2008-03-311-11/+4
| | | | | | | | | | | | | to protect the v_lock pointer. Removing the interlock acquisition here allows vn_lock() to proceed without requiring the interlock at all. - If the lock mutated while we were sleeping on it the interlock has been dropped. It is conceivable that the upper layer code was relying on the interlock and LK_NOWAIT to protect the identity or state of the vnode while acquiring the lock. In this case return EBUSY rather than trying the new lock to prevent potential races. Reviewed by: tegge
* - Complete part of the unfinished bufobj work by consistently usingjeff2008-03-221-12/+14
| | | | | | | | | | | | | | | | | BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
* Reduce the acquisition of the vnode interlock in the ffs_read() andkib2008-03-211-2/+4
| | | | | | | | ffs_extread() when setting the IN_ACCESS flag by checking whether the IN_ACCESS is already set. The possible race there is admissible. Tested by: pho Submitted by: jeff
* Minor typo nit.keramida2008-02-251-1/+1
|
* - Introduce lockmgr_args() in the lockmgr space. This function performsattilio2008-02-151-3/+5
| | | | | | | | | | | the same operation of lockmgr() but accepting a custom wmesg, prio and timo for the particular lock instance, overriding default values lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo. - Use lockmgr_args() in order to implement BUF_TIMELOCK() - Cleanup BUF_LOCK() - Remove LK_INTERNAL as it is nomore used in the lockmgr namespace Tested by: Andrea Barberio <insomniac at slackware dot it>
* Cleanup lockmgr interface and exported KPI:attilio2008-01-241-3/+3
| | | | | | | | | | | | | | | | | | | | - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used inattilio2008-01-131-2/+4
| | | | | | | | | | | conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* style(9)obrien2008-01-021-17/+17
|
* Turn most ffs 'DIAGNOSTIC's into INVARIANTS.obrien2007-11-081-5/+5
|
* Perform range check before allocating memory when readingrodrigc2007-07-131-0/+4
| | | | | | | | extended attributes. Reviewed by: kib Approved by: re (hrs) PR: 114389
OpenPOWER on IntegriCloud