path: root/sys/ufs
Commit message | Author | Age | Files | Lines
* A refinement of change 232351 to avoid a race with a forcible unmount. | mckusick | 2012-03-28 | 1 | -4/+19
  While we have a snapshot vnode unlocked to avoid a deadlock with another inode in the same inode block being updated, the filesystem containing it may be forcibly unmounted. When that happens the snapshot vnode is revoked. We need to check for that condition and fail appropriately. This change will be included along with 232351 when it is MFC'ed to 9.
  Spotted by: kib
  Reviewed by: kib
* Keep track of the mount point associated with a special device | mckusick | 2012-03-28 | 1 | -0/+6
  to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'. Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected. This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too.
  Reviewed by: kib
* Do trivial reformatting of the comment to record the missed commit | kib | 2012-03-28 | 1 | -4/+3
  message for r233609: Restore the writes of atimes, quotas and superblock from syncer vnode.
  Noted by: rdivacky
* Reviewed by: bde, mckusick | kib | 2012-03-28 | 1 | -11/+73
  Tested by: pho
  MFC after: 2 weeks
* Microoptimize: in qsync loop over mount vnodes, only unlock mount | kib | 2012-03-28 | 1 | -2/+1
  interlock after we committed to try to vget() the vnode.
  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 1 week
* Update comment. | kib | 2012-03-28 | 1 | -1/+1
  MFC after: 3 days
* Add a third flags argument to ffs_syncvnode to avoid a possible conflict | mckusick | 2012-03-25 | 8 | -43/+40
  with MNT_WAIT flags that are passed in its second argument. This will be MFC'ed together with r232351.
  Discussed with: kib
* Supply boolean as the second argument to ffs_update(), and not the | kib | 2012-03-13 | 2 | -7/+7
  MNT_[NO]WAIT constants, which in fact always caused a sync operation.
  Based on the submission by: bde
  Reviewed by: mckusick
  MFC after: 2 weeks
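Why the old calls always ended up synchronous can be seen in a small user-space sketch. The MNT_* values below are assumed to match sys/mount.h (the point is only that all of them are nonzero), and both helper functions are hypothetical stand-ins, not the kernel's code.

```c
#include <stdbool.h>

/* Assumed values mirroring sys/mount.h; every one of them is nonzero. */
#define MNT_WAIT   1
#define MNT_NOWAIT 2
#define MNT_LAZY   3

/*
 * Hypothetical stand-in for how ffs_update() reads its second argument:
 * any nonzero value requests a synchronous write.
 */
static bool
update_is_sync(int waitfor)
{
    return (waitfor != 0);
}

/*
 * One plausible shape of a fixed caller (an assumption): reduce the
 * MNT_* constant to a real boolean before passing it on.
 */
static bool
update_is_sync_fixed(int mnt_waitfor)
{
    return (update_is_sync(mnt_waitfor == MNT_WAIT));
}
```

Passing MNT_NOWAIT or MNT_LAZY straight through makes update_is_sync() report a synchronous update, which is the behavior this entry describes; the boolean conversion keeps only MNT_WAIT synchronous.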
* Remove superfluous brackets. | kib | 2012-03-11 | 1 | -1/+1
  Submitted by: alc
  MFC after: 2 weeks
* Do schedule delayed writes for async mounts. | kib | 2012-03-11 | 1 | -7/+11
  While there, make some style adjustments, like missed () around return values.
  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 2 weeks
* Do not fall back to slow synchronous i/o when low on memory or buffers. | kib | 2012-03-11 | 1 | -2/+4
  The bawrite() schedules the write to happen immediately, and its use frees the current thread to do more cleanups.
  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 2 weeks
* In ffs_syncvnode(), pass boolean false as second argument of ffs_update(). | kib | 2012-03-11 | 1 | -1/+1
  Synchronous inode block update is not needed for MNT_LAZY callers (syncer), and since waitfor values are not zero, the code did an unnecessary synchronous update.
  Submitted by: bde
  Reviewed by: mckusick
  Tested by: pho
  MFC after: 2 weeks
* Remove unneeded ARGSUSED lint command. | kib | 2012-03-11 | 1 | -1/+0
  Submitted by: bde
  MFC after: 3 days
* Remove fifo.h. The only used function declaration from the header is | kib | 2012-03-11 | 1 | -2/+0
  migrated to sys/vnode.h.
  Submitted by: gianni
* Revert r232692 as the correct place to fix this is at the syscall level. | pho | 2012-03-09 | 1 | -2/+2
* Decommission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which | kib | 2012-03-09 | 1 | -2/+1
  allows a filesystem to request VFS to not allow MNTK_ASYNC.
  MFC after: 1 week
* Add KTR_VFS traces to track modifications to a vnode's writecount. | jhb | 2012-03-08 | 1 | -0/+2
* syscall() fuzzing can trigger this panic. Return EINVAL instead. | pho | 2012-03-08 | 1 | -2/+2
  MFC after: 1 week
* Similar to the fixes in 226967 and 226987, purge any name cache entries | jhb | 2012-03-02 | 1 | -0/+7
  associated with the previous vnode (if any) associated with the target of a rename(). Otherwise, a lookup of the target pathname concurrent with a rename() could re-add a name cache entry after the namei(RENAME) lookup in kern_renameat() had purged the target pathname.
  MFC after: 2 weeks
* This change avoids a kernel deadlock on "snaplk" when using | mckusick | 2012-03-01 | 6 | -75/+151
  snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates.

  The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot.

  The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot.

  A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot.

  Help and review by: Jeff Roberson
  Tested by: Peter Holm
  MFC (to 9 only) after: 2 weeks
* Properly lock DQREF() with dqhlock. Missed locking caused counter | kib | 2012-02-22 | 1 | -0/+4
  corruption. Assert that the dq reference value is sane before decrementing it.
  Reported and tested by: pho
  MFC after: 1 week
* Fix found places where uio_resid is truncated to int. | kib | 2012-02-21 | 2 | -5/+10
  Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows SSIZE_MAX-sized i/o requests from usermode.
  Discussed with: bde, das (previous versions)
  MFC after: 1 month
* Missing conditions in checking whether an inode has been written. | mckusick | 2012-02-13 | 1 | -0/+3
  Found and tested by: Peter Holm
  MFC after: 2 weeks (to 9 only)
* Historically when an application wrote an entire block of a file, | mckusick | 2012-02-09 | 1 | -9/+20
  the kernel allocated a buffer but did not zero it as it was about to be completely filled by a uiomove() from the user's buffer. However, if the uiomove() failed, the old contents of the buffer could be exposed especially if the file was being mmap'ed. The fix was to always zero the buffer when it was allocated.

  This change first attempts the uiomove() to the newly allocated (and dirty) buffer and only zeros it if the uiomove() fails. The effect is to eliminate the gratuitous zeroing of the buffer in the usual case where the uiomove() successfully fills it.

  Reviewed by: kib
  Tested by: scottl
  MFC after: 2 weeks (to 9 only)
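The copy-first, zero-on-failure pattern from this entry can be sketched in user space. fake_uiomove() and write_full_block() are hypothetical names, and the failure is simulated with a flag; the real code operates on kernel buffers and is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for uiomove(): copies n bytes from src to dst,
 * or, when fail is set, copies only half and reports an error, leaving
 * the tail of dst with whatever it held before.
 */
static int
fake_uiomove(char *dst, const char *src, size_t n, bool fail)
{
    if (fail) {
        memcpy(dst, src, n / 2);    /* partial copy, then error */
        return (-1);
    }
    memcpy(dst, src, n);
    return (0);
}

/*
 * Fill a freshly allocated block buffer that may still hold stale data.
 * The buffer is zeroed only when the copy fails, so stale contents are
 * never exposed and the common success path skips the memset().
 */
static int
write_full_block(char *buf, const char *src, size_t n, bool fail)
{
    int error;

    error = fake_uiomove(buf, src, n, fail);
    if (error != 0)
        memset(buf, 0, n);
    return (error);
}
```

On success the buffer holds exactly the caller's data; on failure it is all zeros rather than a mix of new data and old contents.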
* In the original days of BSD, a sync was issued on every filesystem | mckusick | 2012-02-07 | 1 | -5/+15
  every 30 seconds. This spike in I/O caused the system to pause every 30 seconds which was quite annoying. So, the way that sync worked was changed so that when a vnode was first dirtied, it was put on a 30-second cleaning queue (see the syncer_workitem_pending queues in kern/vfs_subr.c). If the file has not been written or deleted after 30 seconds, the syncer pushes it out. As the syncer runs once per second, dirty files are trickled out slowly over the 30-second period instead of all at once by a call to sync(2).

  The one drawback to this is that it does not cover the filesystem metadata. To handle the metadata, vfs_allocate_syncvnode() is called to create a "filesystem syncer vnode" at mount time which cycles around the cleaning queue being sync'ed every 30 seconds. In the original design, the only things it would sync for UFS were the filesystem metadata: inode blocks, cylinder group bitmaps, and the superblock (e.g., by VOP_FSYNC'ing devvp, the device vnode from which the filesystem is mounted).

  Somewhere in its path to integration with FreeBSD the flushing of the filesystem syncer vnode got changed to sync every vnode associated with the filesystem. The result of this change is to return to the old filesystem-wide flush every 30-seconds behavior and makes the whole 30-second delay per vnode useless.

  This change goes back to the originally intended trickle out sync behavior. Key to ensuring that all the intended semantics are preserved (e.g., that all inode updates get flushed within a bounded period of time) is that all inode modifications get pushed to their corresponding inode blocks so that the metadata flush by the filesystem syncer vnode gets them to the disk in a timely way. Thanks to Konstantin Belousov (kib@) for doing the audit and commit -r231122 which ensures that all of these updates are being made.

  Reviewed by: kib
  Tested by: scottl
  MFC after: 2 weeks
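The trickle-out behavior of the cleaning queue can be illustrated with a small timing wheel. This is a simplified model, not the kernel's data structure: the slot count of 32, the per-slot counters, and all names are assumptions made for the sketch.

```c
#include <stddef.h>

#define SYNCER_SLOTS 32     /* assumed wheel size; must exceed the delay */
#define DIRTY_DELAY  30     /* seconds between first dirtying and flush */

/* Hypothetical model: each slot counts the vnodes due in that second. */
struct syncer_wheel {
    int    pending[SYNCER_SLOTS];
    size_t cur;             /* slot the syncer will process next */
};

/* Queue a newly dirtied vnode to be flushed delay seconds from now. */
static void
wheel_add(struct syncer_wheel *w, size_t delay)
{
    w->pending[(w->cur + delay) % SYNCER_SLOTS]++;
}

/* One syncer tick (once per second): flush what is due, then advance. */
static int
wheel_tick(struct syncer_wheel *w)
{
    int flushed = w->pending[w->cur];

    w->pending[w->cur] = 0;
    w->cur = (w->cur + 1) % SYNCER_SLOTS;
    return (flushed);
}
```

Two vnodes dirtied ten seconds apart land in different slots and are flushed ten ticks apart, which is the spreading effect the entry describes; flushing every vnode from a single slot would collapse that spread again.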
* Sprinkle missed calls to asynchronous UFS_UPDATE() in attempt to | kib | 2012-02-07 | 2 | -4/+16
  guarantee that all UFS inode metadata changes result in the dirtiness of the inodeblock. Due to missed inodeblock updates, syncer was required to fsync each mount point's vnode to guarantee periodic metadata flush.
  Reviewed by: mckusick
  Tested by: scottl
  MFC after: 2 weeks
* Add missing opt_quota.h include to activate #ifdef QUOTA blocks, | kib | 2012-02-06 | 1 | -1/+2
  apparently a step in unbreaking QUOTA support.
  Reported and tested by: Adam Strohl <adams-freebsd ateamsystems com>
  MFC after: 1 week
* JNEWBLK dependency may legitimately appear on the buf dependency | kib | 2012-02-06 | 1 | -0/+1
  list. If softdep_sync_buf() discovers such dependency, it should do nothing, which is safe as it is only waiting on the parent buffer to be written, so it can be removed.
  Committed on behalf of: jeff
  MFC after: 1 week
* Current implementations of sync(2) and syncer vnode fsync() VOP use | kib | 2012-02-06 | 1 | -1/+0
  mnt_noasync counter to temporarily remove the MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of the MNTK_ASYNC option is harmful because not only does i/o started from the corresponding thread become synchronous, but so does all i/o initiated on the filesystem during sync(2) or syncer activity.

  Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC.

  Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons.

  Reviewed by: mckusick
  Tested by: scottl
  MFC after: 2 weeks
* There are several bugs/hangs when trying to take a snapshot on a UFS/FFS | mckusick | 2012-01-17 | 1 | -1/+9
  filesystem running with journaled soft updates. Until these problems have been tracked down, return ENOTSUPP when an attempt is made to take a snapshot on a filesystem running with journaled soft updates.
  MFC after: 2 weeks
* Make sure all intermediate variables holding mount flags (mnt_flag) | mckusick | 2012-01-17 | 1 | -2/+2
  and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost.
  MFC after: 2 weeks
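How a flag in the top 32 bits gets lost can be shown with a short sketch. EX_HIGH_FLAG is a made-up flag, not a real MNT_* constant, and the narrow helper uses uint32_t rather than int so that the truncation is well defined in standard C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag living in the upper 32 bits of a 64-bit flag word. */
#define EX_HIGH_FLAG (UINT64_C(1) << 32)

/* Buggy shape: a 32-bit parameter silently drops the upper bits. */
static bool
has_high_flag_narrow(uint32_t flags)
{
    return ((flags & EX_HIGH_FLAG) != 0);
}

/* Fixed shape: the flags stay 64 bits wide from caller to callee. */
static bool
has_high_flag_wide(uint64_t flags)
{
    return ((flags & EX_HIGH_FLAG) != 0);
}
```

The narrow helper can never see the flag no matter what the caller sets, which is why every intermediate variable and parameter has to be widened, not just the field that stores the flags.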
* Add a bit of verbosity to the comment. | ivoras | 2012-01-16 | 1 | -1/+6
* Convert FFS mount error messages from kernel printf's to using the | mckusick | 2012-01-14 | 1 | -61/+65
  vfs_mount_error error message facility provided by the nmount interface. Clean up formatting of mount warnings which still need to use kernel printf's since they do not return errors.
  Requested by: Craig Rodrigues <rodrigc@crodrigues.org>
  MFC after: 2 weeks
* Avoid LOR between vfs_busy() lock and covered vnode lock on quotaon(). | kib | 2012-01-08 | 1 | -3/+18
  The vfs_busy() is after covered vnode lock in the global lock order, but since quotaon() does recursive VFS call to open quota file, we usually end up locking covered vnode after mp is busied in sys_quotactl().

  Change the interface of VFS_QUOTACTL(), requiring that mp was unbusied by fs code, and do not try to pick up vfs_busy() reference in ufs quotaon, esp. if vfs_busy cannot succeed due to unmount being performed.

  Reported and tested by: pho
  MFC after: 1 week
* Migrate ufs and ext2fs from skpc() to memcchr(). | ed | 2012-01-01 | 1 | -13/+7
  While there, remove a useless check from the code. memcchr() always returns characters unequal to 0xff in this case, so inosused[i] ^ 0xff can never be equal to zero. Also, the fact that memcchr() returns a pointer instead of the number of bytes until the end makes conversion to an offset far easier.
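The reasoning about memcchr() can be followed with a user-space sketch. ex_memcchr() is a hypothetical stand-in written for this example (the kernel routine itself is not reproduced), and first_free_inode() is an illustrative helper, not the FFS allocator.

```c
#include <stddef.h>

/*
 * User-space stand-in for a memcchr()-style search: return a pointer to
 * the first byte in [buf, buf + n) that differs from c, or NULL if every
 * byte equals c.
 */
static const unsigned char *
ex_memcchr(const unsigned char *buf, int c, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (buf[i] != (unsigned char)c)
            return (&buf[i]);
    return (NULL);
}

/*
 * Find the first clear bit in an inode-used bitmap, or -1 if it is full.
 * The byte found differs from 0xff, so it has at least one clear bit and
 * no separate "is this byte full" check is needed.
 */
static long
first_free_inode(const unsigned char *inosused, size_t len)
{
    const unsigned char *p = ex_memcchr(inosused, 0xff, len);
    int bit;

    if (p == NULL)
        return (-1);
    for (bit = 0; (*p & (1 << bit)) != 0; bit++)
        continue;
    return ((long)(p - inosused) * 8 + bit);
}
```

Because the search hands back a pointer, the byte offset is a single pointer subtraction, which is the conversion the entry calls easier than working from a remaining-byte count.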
* Use implementation independent inoNN_t scalars for on-disk UFS structures | gleb | 2011-11-09 | 2 | -11/+11
  Approved by: mdf (mentor)
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. | ed | 2011-11-07 | 1 | -4/+5
  The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
* Remove MALLOC_DECLAREs of nonexisting malloc-pools. | ed | 2011-11-06 | 1 | -4/+0
  After careful grepping, it seems none of these pools can be found in our source tree. They are not in use, nor are they defined.
* Fix the wrong commit log message for r226967: "Added missing cache purge | pho | 2011-10-31 | 1 | -0/+2
  of from argument" and fix the comment.
* The kern_renameat() looks up the fvp using the DELETE flag, which causes | pho | 2011-10-31 | 1 | -0/+7
  the removal of the name cache entry for fvp.
  Reported by: Anton Yuzhaninov <citrin citrin ru>
  In collaboration with: kib
  MFC after: 1 week
* This update eliminates a lock-order reversal warning discovered | mckusick | 2011-09-27 | 1 | -21/+24
  while tracking down the system hang reported in kern/160662 and corrected in revision 225806. The LOR is not the cause of the system hang and indeed cannot cause an actual deadlock. However, it can be easily eliminated by deferring the acquisition of a buflock until after all the vnode locks have been acquired.
  Reported by: Hans Ottevanger
  PR: kern/160662
* This update eliminates the system hang reported in kern/160662 when | mckusick | 2011-09-27 | 1 | -1/+1
  taking a snapshot on a filesystem running with journaled soft updates.
  Reported by: Hans Ottevanger
  Fix verified by: Hans Ottevanger
  PR: kern/160662
* Use nowait sync request for a vnode when doing softdep cleanup. We possibly | kib | 2011-09-20 | 1 | -1/+1
  own the unrelated vnode lock, doing waiting sync causes deadlocks.
  Reported and tested by: pho
  Approved by: re (bz)
* Generalize ffs_pages_remove() into vn_pages_remove(). | mm | 2011-08-25 | 3 | -16/+3
  Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F.
  PR: kern/160035, kern/156933
  Reviewed by: kib, pjd
  Approved by: re (kib)
  MFC after: 1 week
* Fix lock leak. | ae | 2011-08-23 | 1 | -2/+2
  Reported by: Alex Lyashkov
  Approved by: re (kib)
  MFC after: 1 week
* Fix two cases involving opt_capsicum.h and module builds: | rwatson | 2011-08-15 | 1 | -1/+0
  (1) opt_capsicum.h is no longer required in ffs_alloc.c, so remove the #include.
  (2) portalfs depends on opt_capsicum.h, so have the Makefile generate one if required.

  These affect only modules built without a kernel (i.e., not buildkernel, but yes buildworld if the dubious MODULES_WITH_WORLD is used).

  Approved by: re (bz)
  Sponsored by: Google Inc
* Second-to-last commit implementing Capsicum capabilities in the FreeBSD | rwatson | 2011-08-11 | 1 | -2/+6
  kernel for FreeBSD 9.0:

  Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op.

  Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions.

  In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit.

  Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent.

  Approved by: re (bz)
  Submitted by: jonathan
  Sponsored by: Google Inc
* Update to -r224294 to ensure that only one of MNT_SUJ or MNT_SOFTDEP | mckusick | 2011-07-30 | 5 | -35/+34
  is set so that mount can revert back to using MNT_NOWAIT when doing getmntinfo.
  Approved by: re (kib)
* Move the MNTK_SUJ flag in mnt_kern_flag to MNT_SUJ in mnt_flag | mckusick | 2011-07-24 | 3 | -20/+23
  so that it is visible to userland programs. This change enables the `mount' command with no arguments to be able to show if a filesystem is mounted using journaled soft updates as opposed to just normal soft updates.
  Approved by: re (bz)
* Default debugging error messages to off for journaled soft updates sysctls. | mckusick | 2011-07-22 | 1 | -5/+3
  Delete limiting on output of these sysctls.
  Approved by: re (kib)