path: root/sys/ufs
Commit message | Author | Age | Files | Lines
...
* When a file is first being written, the dynamic block reallocation | mckusick | 2012-11-03 | 2 | -2/+75
  (implemented by ffs_reallocblks_ufs[12]) relocates the file's blocks so as to cluster them together into a contiguous set of blocks on the disk. When the cluster crosses the boundary into the first indirect block, the first indirect block is initially allocated in a position immediately following the last direct block. Block reallocation would usually destroy locality by moving the indirect block out of the way to keep the data blocks contiguous.
  This change compensates for this problem by noting that the first indirect block should be left immediately following the last direct block. It then tries to start a new cluster of contiguous blocks (referenced by the indirect block) immediately following the indirect block.
  We should also do this for other indirect block boundaries, but it is only important for the first one.
  Suggested by: Bruce Evans
  MFC: 2 weeks
* - In cancel_mkdir_dotdot don't panic if the inodedep is not available. If | jeff | 2012-11-02 | 1 | -1/+1
    the previous diradd had already finished it could have been reclaimed already. This would only happen under heavy dependency pressure.
  Reported by: Andrey Zonov <zont@FreeBSD.org>
  Discussed with: mckusick
  MFC after: 1 week
* r241025 fixed the case where a binary executed from a nullfs mount | kib | 2012-11-02 | 1 | -1/+1
  could still be opened for write from the lower filesystem. There is a symmetric situation where the binary could already have file descriptors opened for write, but it can be executed from the nullfs overlay.
  Handle the issue by passing one v_writecount reference to the lower vnode if the nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks.
  Introduce VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Calling the VOPs instead of directly accessing v_writecount provides the fix described in the previous paragraph.
  Tested by: pho
  MFC after: 3 weeks
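The single-reference donation described in this message can be sketched in user-space C. This is only an illustration under assumptions: `toy_vnode`, `toy_add_writecount` and `toy_get_writecount` are hypothetical names, and the logic is one reading of the commit message, not the kernel's VOP implementation.

```c
#include <stddef.h>

/* Hypothetical toy vnode; "lower" is non-NULL only for an overlay vnode. */
struct toy_vnode {
	int writecount;
	struct toy_vnode *lower;
};

/*
 * Adjust the write count of vp. An overlay vnode donates exactly one
 * reference to its lower vnode while its own count is non-zero.
 */
static void
toy_add_writecount(struct toy_vnode *vp, int inc)
{
	int old = vp->writecount;

	vp->writecount += inc;
	if (vp->lower == NULL)
		return;
	if (old == 0 && vp->writecount > 0)
		vp->lower->writecount++;	/* donate the single reference */
	else if (old > 0 && vp->writecount == 0)
		vp->lower->writecount--;	/* take it back */
}

/* Reads are always answered by the lowest vnode in the stack. */
static int
toy_get_writecount(const struct toy_vnode *vp)
{
	if (vp->lower != NULL)
		return (toy_get_writecount(vp->lower));
	return (vp->writecount);
}
```

However many writers open the overlay vnode, the lower vnode sees one extra reference, and checks made through the overlay report the lower vnode's count.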
* Fix problem with geom_label(4) not recognizing UFS labels on filesystems | trasz | 2012-10-30 | 1 | -1/+2
  extended using growfs(8). The problem here is that geom_label checks if the filesystem size recorded in the UFS superblock is equal to the provider (i.e. device) size. This check cannot be removed due to backward compatibility. On the other hand, in most cases growfs(8) cannot set fs_size in the superblock to match the provider size, because, unlike newfs(8), it cannot recompute cylinder group sizes.
  To fix this problem, add another superblock field, fs_providersize, used only for this purpose. geom_label(4) will attach if either fs_size (filesystem created with newfs(8)) or fs_providersize (filesystem expanded using growfs(8)) matches the device size.
  PR: kern/165962
  Reviewed by: mckusick
  Sponsored by: FreeBSD Foundation
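The attach rule above reduces to a two-way comparison. A minimal sketch, assuming both superblock fields and the provider size are expressed in the same units; `toy_sb` and `toy_label_matches` are hypothetical names, not the geom_label source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the UFS superblock. */
struct toy_sb {
	int64_t fs_size;		/* size recorded by newfs(8) */
	int64_t fs_providersize;	/* size recorded by growfs(8) */
};

/* Attach if either recorded size matches the provider (device) size. */
static bool
toy_label_matches(const struct toy_sb *sb, int64_t provider_size)
{
	return (sb->fs_size == provider_size ||
	    sb->fs_providersize == provider_size);
}
```

Keeping the original fs_size comparison preserves backward compatibility for filesystems that were never grown.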
* Fix two problems that caused instant panic when the device mounted | trasz | 2012-10-28 | 1 | -2/+7
  with softupdates went away. Note that this does not fix the problem entirely; I'm committing it now to make it easier for someone to pick up the work.
  Reviewed by: mckusick
* Remove the support for using non-mpsafe filesystem modules. | kib | 2012-10-22 | 3 | -42/+6
  In particular, do not lock Giant conditionally when calling into the filesystem module, and remove VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems.
  VFS_VERSION is bumped to indicate an interface change that does not alter the interface signatures.
  Conducted and reviewed by: attilio
  Tested by: pho
* Fix up kernel sources to be ready for a 64-bit ino_t. | mdf | 2012-09-27 | 7 | -53/+63
  Original code by: Gleb Kurtsou
* Remove unused member of struct indir (in_exists) from UFS and EXT2 code. | mjg | 2012-08-17 | 2 | -4/+0
  Reviewed by: mckusick
  Approved by: trasz (mentor)
  MFC after: 1 week
* After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason | kib | 2012-08-05 | 1 | -0/+1
  to pull in vm_param.h was removed. The other big dependency of vm_page.h on vm_param.h is the PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages.
  Stop including vm_param.h into vm_page.h. Include vm_param.h explicitly for the kernel code which needs it.
  Suggested and reviewed by: alc
  MFC after: 2 weeks
* Use NULL instead of 0 for pointers | kevlo | 2012-07-22 | 2 | -4/+4
* Extend the KPI to lock and unlock the f_offset member of struct file. It | kib | 2012-07-02 | 1 | -5/+2
  now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries().
  Ensure that on 32-bit architectures f_offset, which is a 64-bit quantity, is always read and written under the mtxpool protection. This fixes an apparently easy-to-trigger race in which parallel lseek()s, or lseek() and read/write, could destroy the file offset.
  The already broken ABI emulations, including iBCS and SysV, are not converted (yet).
  Tested by: pho
  No objections from: jhb
  MFC after: 3 weeks
* Fix unbounded-length malloc, controlled from usermode. The added check | kib | 2012-06-21 | 1 | -3/+7
  is performed before the exact size of the buffer is calculated, but the buffer cannot be larger than the total space allocated for extended attributes. The existing check runs with the precise size, but too late, since the buffer needs to be allocated in advance.
  Also, adapt to uio_resid being of ssize_t type. Use lblktosize instead of multiplying by the fs block size by hand as well.
  Reported and tested by: pho
  MFC after: 1 week
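The idea of a coarse bound checked before allocation can be sketched as follows. This is a user-space illustration under assumptions: `toy_alloc_ea_buf`, its parameters, and the EINVAL return value are hypothetical and are not taken from the commit.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Allocate a buffer for a caller-supplied length, rejecting any request
 * larger than the total extended-attribute area before calling malloc().
 * On failure *bufp is NULL and an errno value is returned.
 */
static int
toy_alloc_ea_buf(size_t requested, size_t ea_area_max, void **bufp)
{
	*bufp = NULL;
	if (requested > ea_area_max)
		return (EINVAL);	/* assumed error code */
	*bufp = malloc(requested == 0 ? 1 : requested);
	return (*bufp == NULL ? ENOMEM : 0);
}
```

A precise size check can still follow later; the early bound only ensures that a user-controlled length never reaches the allocator unchecked.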
* In softdep_setup_inomapdep() we may have to allocate both inodedep | mckusick | 2012-06-11 | 1 | -14/+41
  and bmsafemap dependency structures in inodedep_lookup() and bmsafemap_lookup() respectively. The setup of these structures must be done while holding the soft-dependency mutex. If the inodedep is allocated first, it may be freed in the I/O completion callback when the mutex is released to allocate the bmsafemap. If the bmsafemap is allocated first, it may be freed in the I/O completion callback when the mutex is released to allocate the inodedep.
  To resolve this problem, bmsafemap_lookup has had a parameter added that allows a pre-malloc'ed bmsafemap to be passed in so that it does not need to release the mutex to create a new bmsafemap. The softdep_setup_inomapdep() routine pre-malloc's a bmsafemap dependency before acquiring the mutex and starting to build the inodedep with a call to inodedep_lookup(). The subsequent call to bmsafemap_lookup() is passed this pre-allocated bmsafemap entry so that it need not release the mutex if it needs to create a new one.
  Reported by: Peter Holm
  Tested by: Peter Holm
  MFC after: 1 week
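The pre-allocation pattern described above can be sketched without any kernel machinery. In this illustration the locking itself is omitted and only the calling convention is shown; `toy_table`, `toy_entry` and `toy_lookup` are hypothetical names, not the softdep structures.

```c
#include <stdlib.h>

struct toy_entry {
	int key;
	struct toy_entry *next;
};

struct toy_table {
	struct toy_entry *head;
};

/*
 * Find key in the table. If it is missing, link in the caller's
 * pre-allocated entry instead of allocating here, so the caller never has
 * to drop its lock. *spare is set to NULL when the spare entry is consumed.
 */
static struct toy_entry *
toy_lookup(struct toy_table *t, int key, struct toy_entry **spare)
{
	struct toy_entry *e;

	for (e = t->head; e != NULL; e = e->next)
		if (e->key == key)
			return (e);
	if (*spare == NULL)
		return (NULL);
	e = *spare;
	*spare = NULL;
	e->key = key;
	e->next = t->head;
	t->head = e;
	return (e);
}
```

A caller would allocate the spare entry before taking the lock, call `toy_lookup()` while holding it, and free the spare after unlocking if it was not consumed.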
* Enable vn_io_fault() lock avoidance for UFS. | kib | 2012-05-30 | 2 | -4/+4
  Tested by: pho
  MFC after: 2 months
* Implement SEEK_HOLE/SEEK_DATA for UFS. | kib | 2012-05-26 | 1 | -0/+20
  MFC after: 2 weeks
* Add missing `continue' statement at end of case. | mckusick | 2012-05-18 | 1 | -0/+1
  Found by: Kevin Lo (kevlo@)
  MFC after: 1 week
* Remove unused thread argument from ufs_extattr_uepm_lock()/ufs_extattr_uepm_unlock(). | trasz | 2012-04-23 | 1 | -21/+21
* Fix build. | trasz | 2012-04-23 | 1 | -1/+1
* Remove unused thread argument from clear_inodeps() and clear_remove(). | trasz | 2012-04-23 | 1 | -11/+8
* Remove unused thread argument to vrecycle(). | trasz | 2012-04-23 | 1 | -2/+1
  Reviewed by: kib
* Remove unused thread argument from vtruncbuf(). | trasz | 2012-04-23 | 8 | -18/+14
  Reviewed by: kib
* Fix use-after-free introduced in r234036. | trasz | 2012-04-21 | 1 | -1/+5
  Reviewed by: mckusick
  Tested by: pho
* This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loops | mckusick | 2012-04-20 | 4 | -5/+47
  over just the active vnodes associated with a mount point to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines.
  The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped.
  The ffs_sync_lazy and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_sync_lazy routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file.
  In a system configured with 250,000 vnodes, fewer than 1000 are typically active at any point in time. Prior to this change all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active.
  Reviewed by: kib
  Tested by: Peter Holm
  MFC after: 2 weeks
* The part about exec atime no longer applies in the comment. | jh | 2012-04-18 | 1 | -3/+2
  Pointed out by: bde
* Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. | mckusick | 2012-04-17 | 4 | -106/+36
  The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if it is DOOMED or other possible end-of-life scenarios).
  To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head.
  The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point).
  Reviewed by: kib
  Tested by: Peter Holm
  MFC after: 2 weeks
* Export vinactive() from kern/vfs_subr.c (i.e., make it no longer | mckusick | 2012-04-11 | 1 | -12/+1
  static and declare its prototype in sys/vnode.h) so that it can be called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c) instead of the body of vinactive() being cut and pasted into process_deferred_inactive().
  Reviewed by: kib
  MFC after: 2 weeks
* - Return EPERM from ufs_setattr() when a user without PRIV_VFS_SYSFLAGS | jh | 2012-04-10 | 1 | -11/+5
    privilege attempts to toggle SF_SETTABLE flags.
  - Use the '^' operator in the SF_SNAPSHOT anti-toggling check.
  Flags are now stored to ip->i_flags in one place after all checks.
  Submitted by: bde
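An XOR-based anti-toggling check of the kind mentioned above can be sketched like this. The flag value and the helper name `toy_toggles_flag` are illustrative assumptions, not the code from ufs_setattr().

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SF_SNAPSHOT	0x00200000u	/* illustrative value */

/*
 * XOR leaves a 1 in every bit position where the old and new flag words
 * differ, so masking the result tells whether a given flag was toggled
 * in either direction.
 */
static bool
toy_toggles_flag(uint32_t old_flags, uint32_t new_flags, uint32_t mask)
{
	return (((old_flags ^ new_flags) & mask) != 0);
}
```

A single expression covers both setting and clearing the flag, which would otherwise need two separate comparisons.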
* Fix panic in ffs_reload(), which may happen when read-only filesystem | trasz | 2012-04-08 | 1 | -2/+8
  gets resized and then reloaded.
  Reviewed by: kib, mckusick (earlier version)
  Sponsored by: The FreeBSD Foundation
* Drop an unnecessary setting of si_mountpt when updating a UFS mount point. | mckusick | 2012-04-08 | 1 | -2/+0
  Clearly it must have been set when the mount was done.
  Reviewed by: kib
* Add a check for unsupported file flags to ufs_setattr(). | jh | 2012-04-04 | 1 | -0/+4
  Discussed with: bde
  MFC after: 2 weeks
* A file cannot be deallocated until its last name has been removed | mckusick | 2012-04-02 | 2 | -52/+37
  and it is no longer referenced by a user process. The inode for a file whose name has been removed, but is still referenced at the time of a crash, will still be allocated in the filesystem, but will have no references (i.e., it will have no names referencing it from any directory).
  With traditional soft updates these unreferenced inodes will be found and reclaimed when the background fsck is run. When using journaled soft updates, the kernel must keep track of these inodes so that it can find and reclaim them during the cleanup process. Their existence cannot be stored in the journal as the journal only handles short-term events, and they may persist for days. So, they are tracked by keeping them in a linked list whose head pointer is stored in the superblock. The journal tracks them only until their linked list pointers have been committed to disk. Part of the cleanup process involves traversing the list of unreferenced inodes and reclaiming them.
  This bug was triggered when confusion arose in the commit steps of keeping the unreferenced-inode linked list coherent on disk. Notably, there was a race between the link() system call adding a link-count to a file and the unlink() system call removing a link-count from the file. Here if the unlink() ran after link() had looked up the file but before link() had incremented the link-count of the file, the file's link-count would drop to zero before the link() incremented it back up to one. If the file was referenced by a user process, the first transition through zero made it appear that it should be added to the unreferenced-inode list when in fact it should not have been added. If the new name created by link() was deleted within a few seconds (with the file still referenced by a user process) it would legitimately be a candidate for addition to the unreferenced-inode list.
  The result was that there were two attempts to add the same inode to the unreferenced-inode list, which scrambled the unreferenced-inode list's pointers, leading to a panic. The fix is to detect and avoid the false attempt at adding it to the unreferenced-inode list by having the link() system call check to see if the link count is zero before it increments it. If it is, the link() fails with ENOENT (showing that it has failed the link()/unlink() race).
  While tracking down this bug, we have added additional assertions to detect the problem sooner and also simplified some of the code.
  Reported by: Kirk Russell
  Fix submitted by: Jeff Roberson
  Tested by: Peter Holm
  PR: kern/159971
  MFC (to 9 only): 2 weeks
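The zero-check described in this fix can be illustrated with a toy link counter. This is a simplified user-space sketch; `toy_inode`, `toy_link` and `toy_unlink` are hypothetical names and none of the kernel's locking or soft-updates bookkeeping is modeled.

```c
#include <errno.h>

/* Hypothetical, simplified inode holding only a link count. */
struct toy_inode {
	int nlink;	/* number of directory entries naming this file */
};

/*
 * Add a link. If the count has already dropped to zero, unlink() won the
 * race, so fail with ENOENT instead of raising the count back to one.
 */
static int
toy_link(struct toy_inode *ip)
{
	if (ip->nlink == 0)
		return (ENOENT);
	ip->nlink++;
	return (0);
}

/* Remove a link and return the remaining link count. */
static int
toy_unlink(struct toy_inode *ip)
{
	if (ip->nlink > 0)
		ip->nlink--;
	return (ip->nlink);
}
```

With the check in place the count never passes through zero and back up, so an inode can become a candidate for the unreferenced-inode list only once.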
* - Use more natural ip->i_flags instead of vap->va_flags in the final | jh | 2012-04-02 | 1 | -5/+11
    flags check.
  - Add a comment for the immutable/append check done after handling of the flags.
  - Style improvements.
  No functional change intended.
  Submitted by: bde
  MFC after: 2 weeks
* A refinement of change 232351 to avoid a race with a forcible unmount. | mckusick | 2012-03-28 | 1 | -4/+19
  While we have a snapshot vnode unlocked to avoid a deadlock with another inode in the same inode block being updated, the filesystem containing it may be forcibly unmounted. When that happens the snapshot vnode is revoked. We need to check for that condition and fail appropriately.
  This change will be included along with 232351 when it is MFC'ed to 9.
  Spotted by: kib
  Reviewed by: kib
* Keep track of the mount point associated with a special device | mckusick | 2012-03-28 | 1 | -0/+6
  to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'.
  Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected.
  This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too.
  Reviewed by: kib
* Do trivial reformatting of the comment to record the missed commit | kib | 2012-03-28 | 1 | -4/+3
  message for r233609: Restore the writes of atimes, quotas and superblock from syncer vnode.
  Noted by: rdivacky
* Reviewed by: bde, mckusick | kib | 2012-03-28 | 1 | -11/+73
  Tested by: pho
  MFC after: 2 weeks
* Microoptimize: in qsync loop over mount vnodes, only unlock mount | kib | 2012-03-28 | 1 | -2/+1
  interlock after we have committed to trying to vget() the vnode.
  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 1 week
* Update comment. | kib | 2012-03-28 | 1 | -1/+1
  MFC after: 3 days
* Add a third flags argument to ffs_syncvnode to avoid a possible conflict | mckusick | 2012-03-25 | 8 | -43/+40
  with MNT_WAIT flags that are passed in its second argument. This will be MFC'ed together with r232351.
  Discussed with: kib
* Supply boolean as the second argument to ffs_update(), and not a | kib | 2012-03-13 | 2 | -7/+7
  MNT_[NO]WAIT constant, which in fact always caused a sync operation.
  Based on the submission by: bde
  Reviewed by: mckusick
  MFC after: 2 weeks
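Why a MNT_[NO]WAIT constant always requests a synchronous update when it lands in a boolean parameter can be shown with a short sketch. The constant values and every `toy_` name here are assumptions for illustration, and the `== TOY_MNT_WAIT` mapping is one plausible way to derive the boolean rather than what the commit itself does.

```c
#include <stdbool.h>

/* Illustrative values (assumed); the point is that all are non-zero. */
#define TOY_MNT_WAIT	1
#define TOY_MNT_NOWAIT	2
#define TOY_MNT_LAZY	3

/* Hypothetical update routine; returns true if it would write synchronously. */
static bool
toy_update(bool waitfor)
{
	return (waitfor);
}

/* Old style: any non-zero constant converts to true, so the write is sync. */
static bool
toy_old_call(int mnt_flag)
{
	return (toy_update(mnt_flag));
}

/* New style: derive an explicit boolean from the caller's intent. */
static bool
toy_new_call(int mnt_flag)
{
	return (toy_update(mnt_flag == TOY_MNT_WAIT));
}
```

In the old style even the "no wait" constant requests a synchronous write, which matches the behavior this commit message describes.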
* Remove superfluous brackets. | kib | 2012-03-11 | 1 | -1/+1
  Submitted by: alc
  MFC after: 2 weeks
* Do schedule delayed writes for async mounts. | kib | 2012-03-11 | 1 | -7/+11
  While there, make some style adjustments, such as adding missing () around return values.
  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 2 weeks
* Do not fall back to slow synchronous i/o when low on memory or buffers. | kib | 2012-03-11 | 1 | -2/+4
  The bawrite() schedules the write to happen immediately, and its use frees the current thread to do more cleanups.
  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 2 weeks
* In ffs_syncvnode(), pass boolean false as second argument of ffs_update(). | kib | 2012-03-11 | 1 | -1/+1
  A synchronous inode block update is not needed for MNT_LAZY callers (syncer), and since waitfor values are not zero, the code did an unnecessary synchronous update.
  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 2 weeks
* Remove unneeded ARGSUSED lint command. | kib | 2012-03-11 | 1 | -1/+0
  Submitted by: bde
  MFC after: 3 days
* Remove fifo.h. The only used function declaration from the header is | kib | 2012-03-11 | 1 | -2/+0
  migrated to sys/vnode.h.
  Submitted by: gianni
* Revert r232692 as the correct place to fix this is at the syscall level. | pho | 2012-03-09 | 1 | -2/+2
* Decommission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which | kib | 2012-03-09 | 1 | -2/+1
  allows a filesystem to request VFS to not allow MNTK_ASYNC.
  MFC after: 1 week
* Add KTR_VFS traces to track modifications to a vnode's writecount. | jhb | 2012-03-08 | 1 | -0/+2
* syscall() fuzzing can trigger this panic. Return EINVAL instead. | pho | 2012-03-08 | 1 | -2/+2
  MFC after: 1 week