path: root/sys/ufs
Commit message | Author | Age | Files | Lines
* Remove unused thread argument from ufs_extattr_uepm_lock()/ufs_extattr_uepm_unlock(). | trasz | 2012-04-23 | 1 | -21/+21
* Fix build. | trasz | 2012-04-23 | 1 | -1/+1
* Remove unused thread argument from clear_inodeps() and clear_remove(). | trasz | 2012-04-23 | 1 | -11/+8
* Remove unused thread argument to vrecycle(). | trasz | 2012-04-23 | 1 | -2/+1
  Reviewed by: kib
* Remove unused thread argument from vtruncbuf(). | trasz | 2012-04-23 | 8 | -18/+14
  Reviewed by: kib
* Fix use-after-free introduced in r234036. | trasz | 2012-04-21 | 1 | -1/+5
  Reviewed by: mckusick
  Tested by: pho
* This update uses the MNT_VNODE_FOREACH_ACTIVE interface, which loops over just the active vnodes associated with a mount point, to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines. | mckusick | 2012-04-20 | 4 | -5/+47
  The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped.

  The ffs_sync_lazy and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_sync_lazy routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file.

  In a system configured with 250,000 vnodes, fewer than 1000 are typically active at any point in time. Prior to this change, all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines, since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active.

  Reviewed by: kib
  Tested by: Peter Holm
  MFC after: 2 weeks
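To make the cost difference in the commit above concrete, here is a minimal userland sketch. It is not the kernel's MNT_VNODE_FOREACH_ACTIVE code: `struct model_vnode`, `struct model_mount`, `scan_all()` and `scan_active()` are hypothetical names, and the model assumes the mount keeps a second list linking only its active vnodes. The point is simply that a pass over that list touches only the active vnodes, while a pass over the full list touches every vnode on the mount.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's struct vnode / struct mount. */
struct model_vnode {
    int active;                       /* nonzero if recently used */
    struct model_vnode *next_all;     /* link on the per-mount "all" list */
    struct model_vnode *next_active;  /* link on the per-mount "active" list */
};

struct model_mount {
    struct model_vnode *all_head;     /* every vnode on the mount */
    struct model_vnode *active_head;  /* only the active vnodes */
};

/* Old style: visit every vnode and skip the inactive ones.
 * Returns how many vnodes were inspected; *found gets the active count. */
size_t scan_all(const struct model_mount *mp, size_t *found)
{
    size_t inspected = 0;

    *found = 0;
    for (const struct model_vnode *vp = mp->all_head; vp != NULL;
        vp = vp->next_all) {
        inspected++;
        if (vp->active)
            (*found)++;
    }
    return inspected;
}

/* New style: walk only the active list. */
size_t scan_active(const struct model_mount *mp, size_t *found)
{
    size_t inspected = 0;

    *found = 0;
    for (const struct model_vnode *vp = mp->active_head; vp != NULL;
        vp = vp->next_active) {
        assert(vp->active);           /* invariant of the active list */
        inspected++;
        (*found)++;
    }
    return inspected;
}
```

With 1000 vnodes of which 10 are active, `scan_all()` inspects 1000 entries and `scan_active()` inspects 10; in the kernel each inspection also involves locking the vnode, which is where the savings described in the commit come from.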
* The part of the comment about exec atime no longer applies. | jh | 2012-04-18 | 1 | -3/+2
  Pointed out by: bde
* Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. | mckusick | 2012-04-17 | 4 | -106/+36
  The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check whether it is DOOMED or in some other end-of-life state).

  To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head.

  The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point).

  Reviewed by: kib
  Tested by: Peter Holm
  MFC after: 2 weeks
* Export vinactive() from kern/vfs_subr.c (i.e., make it no longer static and declare its prototype in sys/vnode.h) so that it can be called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c) instead of having the body of vinactive() cut and pasted into process_deferred_inactive(). | mckusick | 2012-04-11 | 1 | -12/+1
  Reviewed by: kib
  MFC after: 2 weeks
* - Return EPERM from ufs_setattr() when a user without PRIV_VFS_SYSFLAGS privilege attempts to toggle SF_SETTABLE flags. | jh | 2012-04-10 | 1 | -11/+5
  - Use the '^' operator in the SF_SNAPSHOT anti-toggling check.
  Flags are now stored to ip->i_flags in one place after all checks.

  Submitted by: bde
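The '^' check in the commit above can be sketched in isolation. XOR of the current and requested flags yields exactly the bits that would change, so a single mask test rejects both setting and clearing the snapshot flag. The `DEMO_SF_*` values and `check_snapshot_toggle()` are hypothetical; the real SF_* constants come from <sys/stat.h>, and the real check is part of ufs_setattr().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; the real SF_* constants
 * are defined in <sys/stat.h> with different values. */
#define DEMO_SF_SNAPSHOT  0x1u
#define DEMO_SF_IMMUTABLE 0x2u

/* Return EPERM if the request would set or clear the snapshot flag,
 * 0 otherwise. (cur ^ new) has a bit set exactly where the two differ. */
int check_snapshot_toggle(uint32_t cur_flags, uint32_t new_flags)
{
    if ((cur_flags ^ new_flags) & DEMO_SF_SNAPSHOT)
        return EPERM;
    return 0;
}
```

Changing other bits (here `DEMO_SF_IMMUTABLE`) passes this particular check, since it only looks at the snapshot bit.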
* Fix panic in ffs_reload(), which may happen when a read-only filesystem gets resized and then reloaded. | trasz | 2012-04-08 | 1 | -2/+8
  Reviewed by: kib, mckusick (earlier version)
  Sponsored by: The FreeBSD Foundation
* Drop an unnecessary setting of si_mountpt when updating a UFS mount point. | mckusick | 2012-04-08 | 1 | -2/+0
  Clearly it must have been set when the mount was done.

  Reviewed by: kib
* Add a check for unsupported file flags to ufs_setattr(). | jh | 2012-04-04 | 1 | -0/+4
  Discussed with: bde
  MFC after: 2 weeks
* A file cannot be deallocated until its last name has been removed and it is no longer referenced by a user process. | mckusick | 2012-04-02 | 2 | -52/+37
  The inode for a file whose name has been removed, but which is still referenced at the time of a crash, will still be allocated in the filesystem but will have no references (i.e., no names referencing it from any directory). With traditional soft updates these unreferenced inodes will be found and reclaimed when the background fsck is run. When using journaled soft updates, the kernel must keep track of these inodes so that it can find and reclaim them during the cleanup process. Their existence cannot be stored in the journal, as the journal only handles short-term events and they may persist for days. So they are tracked by keeping them in a linked list whose head pointer is stored in the superblock. The journal tracks them only until their linked-list pointers have been committed to disk. Part of the cleanup process involves traversing the list of unreferenced inodes and reclaiming them.

  This bug was triggered when confusion arose in the commit steps of keeping the unreferenced-inode linked list coherent on disk. Specifically, there was a race between the link() system call adding a link count to a file and the unlink() system call removing one. If the unlink() ran after link() had looked up the file but before link() had incremented the file's link count, the link count would drop to zero before link() incremented it back up to one. If the file was referenced by a user process, the first transition through zero made it appear that it should be added to the unreferenced-inode list when in fact it should not have been. If the new name created by link() was deleted within a few seconds (with the file still referenced by a user process), it would legitimately be a candidate for addition to the unreferenced-inode list. The result was two attempts to add the same inode to the unreferenced-inode list, which scrambled the list's pointers and led to a panic.

  The fix is to detect and avoid the false attempt at adding it to the unreferenced-inode list by having the link() system call check whether the link count is zero before it increments it. If it is, link() fails with ENOENT (showing that it has lost the link()/unlink() race).

  While tracking down this bug, we have added additional assertions to detect the problem sooner and have also simplified some of the code.

  Reported by: Kirk Russell
  Fix submitted by: Jeff Roberson
  Tested by: Peter Holm
  PR: kern/159971
  MFC (to 9 only): 2 weeks
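A minimal single-threaded model of the fix in the commit above, assuming a hypothetical `struct demo_inode` that holds only a link count and a counter of how many times the inode was queued as unreferenced. Locking is left out; the sketch only replays the ordering the fix guards against: unlink() drops the count to zero first, and link() must then fail with ENOENT rather than bringing the count back to one.

```c
#include <errno.h>

/* Hypothetical in-memory inode; not the UFS struct inode. */
struct demo_inode {
    int nlink;          /* number of directory entries naming the file */
    int on_unref_list;  /* times the inode was queued as unreferenced */
};

/* unlink(): drop one name; when the count reaches zero, queue the inode
 * on the unreferenced-inode list (modeled by a counter). */
void demo_unlink(struct demo_inode *ip)
{
    if (ip->nlink > 0 && --ip->nlink == 0)
        ip->on_unref_list++;
}

/* link() with the fix: if unlink() already took the count to zero between
 * our lookup and this increment, fail instead of going from 0 back to 1. */
int demo_link(struct demo_inode *ip)
{
    if (ip->nlink == 0)
        return ENOENT;
    ip->nlink++;
    return 0;
}
```

Without the zero check, the losing link() would raise the count to one, and a later unlink() of the new name would queue the same inode a second time, which is the double insertion the commit describes.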
* - Use the more natural ip->i_flags instead of vap->va_flags in the final flags check. | jh | 2012-04-02 | 1 | -5/+11
  - Add a comment for the immutable/append check done after handling of the flags.
  - Style improvements.
  No functional change intended.

  Submitted by: bde
  MFC after: 2 weeks
* A refinement of change 232351 to avoid a race with a forcible unmount. | mckusick | 2012-03-28 | 1 | -4/+19
  While we have a snapshot vnode unlocked to avoid a deadlock with another inode in the same inode block being updated, the filesystem containing it may be forcibly unmounted. When that happens the snapshot vnode is revoked. We need to check for that condition and fail appropriately.

  This change will be included along with 232351 when it is MFC'ed to 9.

  Spotted by: kib
  Reviewed by: kib
* Keep track of the mount point associated with a special device to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. | mckusick | 2012-03-28 | 1 | -0/+6
  The counts are displayed using `mount -v'.

  Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected.

  This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too.

  Reviewed by: kib
* Do trivial reformatting of the comment to record the missed commit message for r233609: | kib | 2012-03-28 | 1 | -4/+3
  Restore the writes of atimes, quotas and superblock from syncer vnode.

  Noted by: rdivacky
* Reviewed by: bde, mckusick | kib | 2012-03-28 | 1 | -11/+73
  Tested by: pho
  MFC after: 2 weeks
* Microoptimize: in the qsync loop over mount vnodes, only unlock the mount interlock after we have committed to trying to vget() the vnode. | kib | 2012-03-28 | 1 | -2/+1
  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 1 week
* Update comment. | kib | 2012-03-28 | 1 | -1/+1
  MFC after: 3 days
* Add a third flags argument to ffs_syncvnode to avoid a possible conflict with MNT_WAIT flags that are passed in its second argument. | mckusick | 2012-03-25 | 8 | -43/+40
  This will be MFC'ed together with r232351.

  Discussed with: kib
* Supply a boolean as the second argument to ffs_update(), and not the MNT_[NO]WAIT constants, which, being nonzero, in fact always caused a synchronous operation. | kib | 2012-03-13 | 2 | -7/+7
  Based on the submission by: bde
  Reviewed by: mckusick
  MFC after: 2 weeks
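The reason a MNT_[NO]WAIT constant always produced a synchronous update can be shown with a hypothetical function that treats its argument as a boolean. The `DEMO_MNT_*` values are assumptions; the only property the sketch relies on is that both constants are nonzero. `demo_update_is_sync()` and `demo_waitfor_to_bool()` are illustrative names, not kernel functions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MNT_WAIT and MNT_NOWAIT. Their exact values
 * are an assumption; all that matters is that both are nonzero. */
enum { DEMO_MNT_WAIT = 1, DEMO_MNT_NOWAIT = 2 };

/* Models an update routine whose second argument is a boolean "wait"
 * flag: any nonzero value requests a synchronous write. */
bool demo_update_is_sync(int waitfor)
{
    return waitfor != 0;
}

/* The fix in miniature: turn the MNT_* request into a real boolean
 * before passing it on. */
int demo_waitfor_to_bool(int mnt_waitfor)
{
    return mnt_waitfor == DEMO_MNT_WAIT;
}
```

Passing `DEMO_MNT_NOWAIT` directly still yields a synchronous update; converting it first yields the intended asynchronous one.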
* Remove superfluous brackets. | kib | 2012-03-11 | 1 | -1/+1
  Submitted by: alc
  MFC after: 2 weeks
* Do schedule delayed writes for async mounts. | kib | 2012-03-11 | 1 | -7/+11
  While there, make some style adjustments, such as adding the missing parentheses around return values.

  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 2 weeks
* Do not fall back to slow synchronous I/O when low on memory or buffers. | kib | 2012-03-11 | 1 | -2/+4
  The bawrite() schedules the write to happen immediately, and its use frees the current thread to do more cleanups.

  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 2 weeks
* In ffs_syncvnode(), pass boolean false as the second argument of ffs_update(). | kib | 2012-03-11 | 1 | -1/+1
  A synchronous inode block update is not needed for MNT_LAZY callers (the syncer), and since the waitfor values are not zero, the code performed an unnecessary synchronous update.

  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 2 weeks
* Remove the unneeded ARGSUSED lint comment. | kib | 2012-03-11 | 1 | -1/+0
  Submitted by: bde
  MFC after: 3 days
* Remove fifo.h. | kib | 2012-03-11 | 1 | -2/+0
  The only function declaration used from the header is migrated to sys/vnode.h.

  Submitted by: gianni
* Revert r232692 as the correct place to fix this is at the syscall level. | pho | 2012-03-09 | 1 | -2/+2
* Decommission mnt_noasync. | kib | 2012-03-09 | 1 | -2/+1
  Introduce the MNTK_NOASYNC mnt_kern_flag, which allows a filesystem to request that VFS not allow MNTK_ASYNC.

  MFC after: 1 week
* Add KTR_VFS traces to track modifications to a vnode's writecount. | jhb | 2012-03-08 | 1 | -0/+2
* syscall() fuzzing can trigger this panic. Return EINVAL instead. | pho | 2012-03-08 | 1 | -2/+2
  MFC after: 1 week
* Similar to the fixes in 226967 and 226987, purge any name cache entries associated with the previous vnode (if any) associated with the target of a rename(). | jhb | 2012-03-02 | 1 | -0/+7
  Otherwise, a lookup of the target pathname concurrent with a rename() could re-add a name cache entry after the namei(RENAME) lookup in kern_renameat() had purged the target pathname.

  MFC after: 2 weeks
* This change avoids a kernel deadlock on "snaplk" when using snapshots on UFS filesystems running with journaled soft updates. | mckusick | 2012-03-01 | 6 | -75/+151
  This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates.

  The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot.

  The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot.

  A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this, we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush, resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot.

  Help and review by: Jeff Roberson
  Tested by: Peter Holm
  MFC (to 9 only) after: 2 weeks
* Properly lock DQREF() with dqhlock. The missing locking caused counter corruption. | kib | 2012-02-22 | 1 | -0/+4
  Assert that the dq reference value is sane before decrementing it.

  Reported and tested by: pho
  MFC after: 1 week
* Fix the places found where uio_resid is truncated to int. | kib | 2012-02-21 | 2 | -5/+10
  Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows SSIZE_MAX-sized I/O requests to be performed from user mode.

  Discussed with: bde, das (previous versions)
  MFC after: 1 month
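A sketch of the decision a clamp like debug.iosize_max_clamp makes, under the assumption that the clamped limit is INT_MAX (the largest count an int-typed residual can hold). `demo_iosize_max()` and `demo_io_len_ok()` are hypothetical helpers, not kernel functions, and the sketch says nothing about how the kernel actually applies the limit.

```c
#define _POSIX_C_SOURCE 200809L   /* for SSIZE_MAX and ssize_t */
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of the sysctl: with the clamp on (the default), a
 * single request is limited to INT_MAX bytes (an assumption), so code
 * that still keeps the residual count in an int cannot be overflowed;
 * with the clamp off, requests up to SSIZE_MAX are accepted. */
ssize_t demo_iosize_max(bool clamp)
{
    return clamp ? (ssize_t)INT_MAX : (ssize_t)SSIZE_MAX;
}

/* Returns true if a read or write of len bytes would be accepted. */
bool demo_io_len_ok(size_t len, bool clamp)
{
    return len <= (size_t)demo_iosize_max(clamp);
}
```

On an LP64 system the two limits differ; on a 32-bit system SSIZE_MAX and INT_MAX coincide, so the clamp makes no difference there.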
* Missing conditions in checking whether an inode has been written. | mckusick | 2012-02-13 | 1 | -0/+3
  Found and tested by: Peter Holm
  MFC after: 2 weeks (to 9 only)
* Historically, when an application wrote an entire block of a file, the kernel allocated a buffer but did not zero it, as it was about to be completely filled by a uiomove() from the user's buffer. | mckusick | 2012-02-09 | 1 | -9/+20
  However, if the uiomove() failed, the old contents of the buffer could be exposed, especially if the file was being mmap'ed. The fix was to always zero the buffer when it was allocated.

  This change first attempts the uiomove() to the newly allocated (and dirty) buffer and only zeros it if the uiomove() fails. The effect is to eliminate the gratuitous zeroing of the buffer in the usual case where the uiomove() successfully fills it.

  Reviewed by: kib
  Tested by: scottl
  MFC after: 2 weeks (to 9 only)
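The copy-first, zero-only-on-failure ordering from the commit above, as a userland sketch. `demo_uiomove()` is a hypothetical stand-in for uiomove() that models a bad user address with a NULL source pointer, and `demo_write_full_block()` is likewise invented for illustration; neither is a kernel interface.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for uiomove(): copies len bytes from "user space",
 * or fails with EFAULT when the source is bad (modeled by NULL). A real
 * uiomove() can also fail part-way through the copy. */
int demo_uiomove(void *dst, const void *src, size_t len)
{
    if (src == NULL)
        return EFAULT;
    memcpy(dst, src, len);
    return 0;
}

/* Fill a newly allocated, possibly stale buffer with a whole block of user
 * data. The copy is attempted first; the buffer is zeroed only on failure,
 * so stale contents are never left exposed, and the common successful path
 * skips the memset entirely. */
int demo_write_full_block(char *buf, size_t blksize, const char *user)
{
    int error = demo_uiomove(buf, user, blksize);

    if (error != 0)
        memset(buf, 0, blksize);
    return error;
}
```

The whole buffer is cleared on failure because, with a real uiomove(), an unknown prefix may already have been overwritten while the rest still holds stale data.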
* In the original days of BSD, a sync was issued on every filesystem every 30 seconds. | mckusick | 2012-02-07 | 1 | -5/+15
  This spike in I/O caused the system to pause every 30 seconds, which was quite annoying. So the way that sync worked was changed so that when a vnode was first dirtied, it was put on a 30-second cleaning queue (see the syncer_workitem_pending queues in kern/vfs_subr.c). If the file has not been written or deleted after 30 seconds, the syncer pushes it out. As the syncer runs once per second, dirty files are trickled out slowly over the 30-second period instead of all at once by a call to sync(2).

  The one drawback to this is that it does not cover the filesystem metadata. To handle the metadata, vfs_allocate_syncvnode() is called to create a "filesystem syncer vnode" at mount time, which cycles around the cleaning queue being sync'ed every 30 seconds. In the original design, the only things it would sync for UFS were the filesystem metadata: inode blocks, cylinder group bitmaps, and the superblock (i.e., by VOP_FSYNC'ing devvp, the device vnode from which the filesystem is mounted).

  Somewhere in its path to integration with FreeBSD, the flushing of the filesystem syncer vnode got changed to sync every vnode associated with the filesystem. The result of that change was a return to the old behavior of a filesystem-wide flush every 30 seconds, which made the whole 30-second per-vnode delay useless.

  This change goes back to the originally intended trickle-out sync behavior. Key to ensuring that all the intended semantics are preserved (i.e., that all inode updates get flushed within a bounded period of time) is that all inode modifications get pushed to their corresponding inode blocks so that the metadata flush by the filesystem syncer vnode gets them to the disk in a timely way. Thanks to Konstantin Belousov (kib@) for doing the audit and commit -r231122, which ensures that all of these updates are being made.

  Reviewed by: kib
  Tested by: scottl
  MFC after: 2 weeks
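The trickle-out behavior described in the commit above can be modeled with a small timing wheel: one bucket per second, a newly dirtied item is placed 30 buckets ahead, and a once-per-second tick flushes only the current bucket. This is a simplified, hypothetical model rather than the kernel's syncer_workitem_pending code; the wheel size of 32 and all `demo_*` names are assumptions.

```c
#include <stddef.h>

#define DEMO_SYNCER_DELAY 30   /* seconds a dirty item waits, per the commit */
#define DEMO_WHEEL_SIZE   32   /* hypothetical; must exceed the delay */

/* One bucket per second; each bucket counts the dirty items due then. */
struct demo_syncer {
    int bucket[DEMO_WHEEL_SIZE];
    size_t now;                /* index of the bucket for the current second */
};

/* When an item is first dirtied, schedule it delay seconds in the future
 * (1 <= delay < DEMO_WHEEL_SIZE). */
void demo_schedule(struct demo_syncer *s, int delay)
{
    s->bucket[(s->now + (size_t)delay) % DEMO_WHEEL_SIZE]++;
}

/* Called once per second: advance the wheel and flush only the items whose
 * time has come. Returns the number of items flushed this second. */
int demo_tick(struct demo_syncer *s)
{
    s->now = (s->now + 1) % DEMO_WHEEL_SIZE;
    int flushed = s->bucket[s->now];
    s->bucket[s->now] = 0;
    return flushed;
}
```

Two items dirtied five seconds apart are flushed five seconds apart, at t=30 and t=35, rather than together in one burst.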
* Sprinkle missed calls to asynchronous UFS_UPDATE() in an attempt to guarantee that all UFS inode metadata changes result in the inode block being dirtied. | kib | 2012-02-07 | 2 | -4/+16
  Due to missed inode block updates, the syncer was required to fsync each mount point's vnode to guarantee a periodic metadata flush.

  Reviewed by: mckusick
  Tested by: scottl
  MFC after: 2 weeks
* Add missing opt_quota.h include to activate #ifdef QUOTA blocks, apparently a step in unbreaking QUOTA support. | kib | 2012-02-06 | 1 | -1/+2
  Reported and tested by: Adam Strohl <adams-freebsd ateamsystems com>
  MFC after: 1 week
* A JNEWBLK dependency may legitimately appear on the buf dependency list. | kib | 2012-02-06 | 1 | -0/+1
  If softdep_sync_buf() discovers such a dependency, it should do nothing, which is safe as it is only waiting on the parent buffer to be written, so it can be removed.

  Committed on behalf of: jeff
  MFC after: 1 week
* Current implementations of sync(2) and the syncer vnode fsync() VOP use the mnt_noasync counter to temporarily remove the MNTK_ASYNC mount option, which is needed to guarantee synchronous completion of the initiated I/O before the syscall or VOP returns. | kib | 2012-02-06 | 1 | -1/+0
  Global removal of the MNTK_ASYNC option is harmful because it makes synchronous not only the I/O started from the corresponding thread, but all I/O initiated on the filesystem during sync(2) or syncer activity.

  Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async I/O for the current thread only.

  Use the opportunity to move the DOINGASYNC() macro into sys/vnode.h and consistently use it in the places that tested for MNTK_ASYNC.

  Some testing demonstrated 60-70% improvements in run time for metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons.

  Reviewed by: mckusick
  Tested by: scottl
  MFC after: 2 weeks
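A userland sketch of the per-thread approach from the commit above, using C11 `_Thread_local` as a stand-in for a flag in the kernel's per-thread state. `struct demo_mount`, `demo_doing_async()` and the begin/end helpers are hypothetical; `demo_doing_async()` only mirrors the idea behind DOINGASYNC() (mount allows async and the current thread has not opted out), not its actual definition.

```c
#include <stdbool.h>

/* Hypothetical model of a mount with an async option (like MNTK_ASYNC). */
struct demo_mount {
    bool async;
};

/* Per-thread opt-out; each thread sees its own copy. */
static _Thread_local bool demo_td_noasync;

/* Async I/O only if the mount allows it and this thread has not opted out. */
bool demo_doing_async(const struct demo_mount *mp)
{
    return mp->async && !demo_td_noasync;
}

/* sync(2)-style caller: force synchronous I/O for this thread only,
 * leaving the mount-wide option, and therefore other threads, untouched. */
void demo_sync_begin(void) { demo_td_noasync = true; }
void demo_sync_end(void)   { demo_td_noasync = false; }
```

Compared with clearing a mount-wide flag, the opt-out cannot leak into I/O issued by unrelated threads on the same filesystem.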
* There are several bugs/hangs when trying to take a snapshot on a UFS/FFS filesystem running with journaled soft updates. | mckusick | 2012-01-17 | 1 | -1/+9
  Until these problems have been tracked down, return ENOTSUPP when an attempt is made to take a snapshot on a filesystem running with journaled soft updates.

  MFC after: 2 weeks
* Make sure that all intermediate variables holding mount flags (mnt_flag) and all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32 bits are not lost. | mckusick | 2012-01-17 | 1 | -2/+2
  MFC after: 2 weeks
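Why the width of intermediates matters: converting a 64-bit flag word to a 32-bit unsigned type keeps only the low 32 bits, so a flag in the upper half, such as the hypothetical `DEMO_MNT_HIGHFLAG` (bit 40), silently disappears. The two helper functions are illustrative only and do not correspond to specific kernel code.

```c
#include <stdint.h>

/* Hypothetical flag in the upper 32 bits of a 64-bit mount-flag word. */
#define DEMO_MNT_HIGHFLAG ((uint64_t)1 << 40)

/* Buggy pattern: a 32-bit intermediate keeps only the low 32 bits
 * (well-defined for unsigned types), so any flag above bit 31 is lost. */
uint64_t demo_pass_flags_32(uint64_t flags)
{
    uint32_t tmp = flags;   /* implicit narrowing conversion */
    return tmp;
}

/* Fixed pattern: the intermediate is uint64_t, as the commit requires. */
uint64_t demo_pass_flags_64(uint64_t flags)
{
    uint64_t tmp = flags;
    return tmp;
}
```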
* Add a bit of verbosity to the comment. | ivoras | 2012-01-16 | 1 | -1/+6
* Convert FFS mount error messages from kernel printf's to using the vfs_mount_error error message facility provided by the nmount interface. | mckusick | 2012-01-14 | 1 | -61/+65
  Clean up formatting of mount warnings, which still need to use kernel printf's since they do not return errors.

  Requested by: Craig Rodrigues <rodrigc@crodrigues.org>
  MFC after: 2 weeks
* Avoid LOR between vfs_busy() lock and covered vnode lock on quotaon(). | kib | 2012-01-08 | 1 | -3/+18
  The vfs_busy() is after the covered vnode lock in the global lock order, but since quotaon() does a recursive VFS call to open the quota file, we usually end up locking the covered vnode after mp is busied in sys_quotactl().

  Change the interface of VFS_QUOTACTL(), requiring that mp be unbusied by fs code, and do not try to pick up a vfs_busy() reference in ufs quotaon, especially if vfs_busy cannot succeed due to an unmount being performed.

  Reported and tested by: pho
  MFC after: 1 week