path: root/sys/kern/vfs_bio.c
Commit message (author, date; files changed, lines removed/added)
* MFC
  (attilio, 2013-02-27; 1 file, -6/+13)
* Hide the details for the assertion for VM_OBJECT_LOCK operations.
  Rename current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into VM_OBJECT_ASSERT_WLOCKED(foo)
  Sponsored by: EMC / Isilon storage division
  Requested by: alc
  (attilio, 2013-02-21; 1 file, -5/+5)
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to their "write" versions.
  Sponsored by: EMC / Isilon storage division
  (attilio, 2013-02-20; 1 file, -23/+23)
* Switch vm_object lock to be a rwlock.
    * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
    * VM_OBJECT_SLEEP() is introduced as a general purpose primitive to get a sleep operation using a VM_OBJECT_LOCK() as protection
    * The approach must bear with vm_pager.h namespace pollution, so many files require including rwlock.h directly
  (attilio, 2013-02-20; 1 file, -5/+6)
* Add barrier write capability to the VFS buffer interface. A barrier write is a disk write request that tells the disk that the buffer being written must be committed to the media along with any writes that preceded it before any future blocks may be written to the drive.
  Barrier writes are provided by adding the functions bbarrierwrite (bwrite with barrier) and babarrierwrite (bawrite with barrier).
  Following a bbarrierwrite the client knows that the requested buffer is on the media. It does not ensure that buffers written before that buffer are on the media. It only ensures that buffers written before that buffer will get to the media before any buffers written after that buffer. A flush command must be sent to the disk to ensure that all earlier written buffers are on the media.
  Reviewed by: kib
  Tested by: Peter Holm
  (mckusick, 2013-02-16; 1 file, -0/+42)
* Fixup r218424: uio_yield() was scaling directly to userland priority.
  When kern_yield() was introduced with the possibility to specify a new priority, the behaviour changed by not lowering priority at all in the consumers, making the yielding mechanism highly ineffective for high priority kthreads like bufdaemon, syncer, vlrudaemon, etc.
  There is no evidence that consumers could bear with such a change in semantics, and this situation could eventually lead to bugs similar to the ones fixed in r244240. Re-specify userland pri for kthreads involved.
  Tested by: pho
  Reviewed by: kib, mdf
  MFC after: 1 week
  (attilio, 2012-12-21; 1 file, -1/+1)
* Do not ignore zero address, possibly returned by the vm_map_find() call. The function indicates a failure by the TRUE return value. To be extra safe, assert that the return value from the following vm_map_insert() indicates success.
  Fix style issues in the nearby lines, reformulate the comment.
  Reviewed by: alc (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  (kib, 2012-12-10; 1 file, -14/+14)
* Remove useless comment.
  MFC after: 3 days
  (kib, 2012-12-09; 1 file, -2/+0)
* Remove the support for using non-mpsafe filesystem modules.
  In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems.
  The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes.
  Conducted and reviewed by: attilio
  Tested by: pho
  (kib, 2012-10-22; 1 file, -24/+5)
* Fix typo [1]. Use commas to separate flag printouts, in style with other parts of function.
  Submitted by: bf [1]
  MFC after: 1 week
  (kib, 2012-06-02; 1 file, -1/+1)
* Update the print mask for decoding b_flags. Add print masks for b_vflags and b_xflags_t and print them as well.
  MFC after: 1 week
  (kib, 2012-06-02; 1 file, -1/+3)
* Do not call bremfree for managed buffers.
  Calling bremfree for these buffers results in panic: "bremfree: buffer %p not on a queue."
  Approved by: kib
  (gber, 2012-05-15; 1 file, -5/+9)
* This change avoids a kernel deadlock on "snaplk" when using snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates.
  The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot.
  The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot.
  A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot.
  Help and review by: Jeff Roberson
  Tested by: Peter Holm
  MFC (to 9 only) after: 2 weeks
  (mckusick, 2012-03-01; 1 file, -21/+15)
* Fix typo.
  MFC after: 1 week
  (alc, 2012-02-26; 1 file, -1/+1)
* Rename vm_page_set_valid() to vm_page_set_valid_range().
  The vm_page_set_valid() is the most reasonable name for the m->valid accessor.
  Reviewed by: attilio, alc
  (kib, 2011-11-30; 1 file, -2/+2)
* Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to vm_page_alloc(). While I'm here, for the sake of consistency, always specify the allocation class, such as VM_ALLOC_NORMAL, as the first of the flags.
  (alc, 2011-10-27; 1 file, -4/+3)
* Improve the information reported in case of busy buffers during the shutdown:
    - Axe out the SHOW_BUSYBUFS option and use a tunable to selectively enable/disable it, which defaults to not printing anything (0 value) but can be changed to print (1 value) or to be verbose (2 value)
    - Improve the information output: right now, there is no track of the actual struct buf object or vnode which are referenced by the shutdown process; instead the related struct bufobj object is printed, which is not really helpful
    - Add more verbosity about the state of the struct buf lock and the vnode information, with the latter to be activated separately by the sysctl
  Sponsored by: Sandvine Incorporated
  Reviewed by: emaste, kib
  Approved by: re (ksmith)
  MFC after: 10 days
  (attilio, 2011-09-08; 1 file, -1/+1)
* Call pmap_qremove() before freeing or unwiring the pages, otherwise there's a window during which a page can be re-used before its previous mapping is removed.
  Reviewed by: alc
  MFC after: 1 week
  (marius, 2011-07-05; 1 file, -3/+5)
* - When printing bufs with show buf the lblkno is often more useful than the blkno. Print them both.
  (jeff, 2011-06-10; 1 file, -2/+3)
* BKVASIZE was bumped to 16k more than a decade ago.
  (ru, 2011-05-23; 1 file, -1/+1)
* Use a name instead of a magic number for kern_yield(9) when the priority should not change. Fetch the td_user_pri under the thread lock. This is probably not necessary but a magic number also seems preferable to knowing the implementation details here.
  Requested by: Jason Behmer < jason DOT behmer AT isilon DOT com >
  (mdf, 2011-05-13; 1 file, -1/+1)
* Retire VFS_BIO_DEBUG. Convert those checks that were still valid into KASSERT()s and eliminate the rest.
  Replace excessive printf()s and a panic() in bufdone_finish() with a KASSERT() in vm_page_io_finish().
  Reviewed by: kib
  (alc, 2011-02-12; 1 file, -57/+14)
* Based on discussions on the svn-src mailing list, rework r218195:
    - entirely eliminate some calls to uio_yield() as being unnecessary, such as in a sysctl handler.
    - move should_yield() and maybe_yield() to kern_synch.c and move the prototypes from sys/uio.h to sys/proc.h
    - add a slightly more generic kern_yield() that can replace the functionality of uio_yield().
    - replace source uses of uio_yield() with the functional equivalent, or in some cases do not change the thread priority when switching.
    - fix a logic inversion bug in vlrureclaim(), pointed out by bde@.
    - instead of using the per-cpu last switched ticks, use a per thread variable for should_yield(). With PREEMPTION, the only reasonable use of this is to determine if a lock has been held a long time and relinquish it. Without PREEMPTION, this is essentially the same as the per-cpu variable.
  (mdf, 2011-02-08; 1 file, -1/+1)
* Eliminate unnecessary page hold_count checks. These checks predate r90944, which introduced a general mechanism for handling the freeing of held pages.
  Reviewed by: kib@
  (alc, 2011-02-03; 1 file, -2/+1)
* Remove OBJ_CLEANING flag. The vfs_setdirty_locked_object() is the only consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY was cleared early in vm_object_page_clean, before the cleaning pass was done. This is no longer true after r216799.
  Moreover, since OBJ_CLEANING is a flag, and not a counter, it could be reset prematurely when parallel vm_object_page_clean() calls are performed.
  Reviewed by: alc (as a part of the bigger patch)
  MFC after: 1 month (after r216799 is merged)
  (kib, 2010-12-29; 1 file, -1/+1)
* Introduce and use a new VM interface for temporarily pinning pages. This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel.
  In collaboration with: kib@
  (alc, 2010-12-25; 1 file, -33/+6)
* Implement and use a single optimized function for unholding a set of pages.
  Reviewed by: kib@
  (alc, 2010-12-17; 1 file, -6/+1)
* Reduce the difference between hirunningspace and lorunningspace, it should help interactivity in edge cases.
  (ivoras, 2010-10-25; 1 file, -3/+3)
* The buffer's b_vflags field is not always properly protected by the bufobj lock. If b_bufobj is not NULL, then the bufobj lock should be held when manipulating the flags. Not doing this sometimes leaves BV_BKGRDINPROG erroneously set, causing softdep's getdirtybuf() to get stuck indefinitely in "getbuf" sleep, waiting for a background write to finish which is not actually performed.
  Add BO_LOCK() in the cases where it was missed.
  In collaboration with: pho
  Tested by: bz
  Reviewed by: jeff
  MFC after: 1 month
  (kib, 2010-08-12; 1 file, -4/+49)
* Fix (hopefully) the spelling of "queuing."
  Submitted by: bf1783 at gmail com
  (ivoras, 2010-08-09; 1 file, -1/+1)
* Elaborate on how hirunningspace was chosen.
  (ivoras, 2010-08-09; 1 file, -2/+5)
* Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes.
  In collaboration with: pho
  MFC after: 1 month
  (kib, 2010-08-06; 1 file, -2/+3)
* Make lorunningspace catch up with hirunningspace.
  While there, add comment about the magic numbers.
  Prodded by: alc
  (ivoras, 2010-07-23; 1 file, -1/+6)
* Fix expression style.
  Prodded by: jhb
  (ivoras, 2010-07-20; 1 file, -3/+2)
* In keeping with the Age-of-the-fruitbat theme, scale up hirunningspace on machines which can clearly afford the memory.
  This is a somewhat conservative version of the patch - more fine tuning may be necessary.
  Idea from: Thread on hackers@
  Discussed with: alc
  (ivoras, 2010-07-18; 1 file, -1/+3)
* Change the implementation of vm_hold_free_pages() so that it performs at most one call to pmap_qremove(), and thus one TLB shootdown, instead of one call and TLB shootdown per page.
  Simplify the interface to vm_hold_free_pages().
  MFC after: 3 weeks
  (alc, 2010-07-11; 1 file, -26/+16)
* Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently, the maintenance of vm_pageout_deficit can be localized to just two places: vm_page_alloc() and vm_pageout_scan().
  This change also corrects an off-by-one error in the maintenance of vm_pageout_deficit. Historically, the buffer cache functions, allocbuf() and vm_hold_load_pages(), have not taken into account that vm_page_alloc() already increments vm_pageout_deficit by one.
  Reviewed by: kib
  (alc, 2010-07-09; 1 file, -3/+2)
* Add the ability for the allocflag argument of vm_page_grab() to specify the increment of vm_pageout_deficit when sleeping due to page shortage. Then, in allocbuf(), the code to allocate pages when extending a vmio buffer can be replaced by a call to vm_page_grab().
  Suggested and reviewed by: alc
  MFC after: 2 weeks
  (kib, 2010-07-05; 1 file, -50/+11)
* Improve bufdone_finish()'s handling of the bogus page. Specifically, if one or more mappings to the bogus page must be replaced, call pmap_qenter() just once. Previously, pmap_qenter() was called for each mapping to the bogus page.
  MFC after: 3 weeks
  (alc, 2010-06-30; 1 file, -4/+6)
* Add INVARIANTS checking that numfreebufs values are sane. Also add a per-buf flag to catch if a buf is double-counted in the free count. This code was useful to debug an instance where a local patch at Isilon was incorrectly managing numfreebufs for a new buf state.
  Reviewed by: jeff
  Approved by: zml (mentor)
  (mdf, 2010-06-11; 1 file, -10/+54)
* Reorganize the code in bdwrite() which handles the move of dirtiness from the buffer pages to the buffer. Combine the code to set the buffer dirty range (previously in vfs_setdirty()) and to clean the pages (vfs_clean_pages()) into a new function vfs_clean_pages_dirty_buf(). Now the vm object lock is acquired only once.
  Drain the VPO_BUSY bit of the buffer pages before setting valid and clean bits in vfs_clean_pages_dirty_buf() with new helper vfs_drain_busy_pages(). pmap_clear_modify() asserts that page is not busy.
  In vfs_busy_pages(), move the wait for draining of VPO_BUSY before the dirtiness handling, to follow the structure of vfs_clean_pages_dirty_buf().
  Reported and tested by: pho
  Suggested and reviewed by: alc
  MFC after: 2 weeks
  (kib, 2010-06-08; 1 file, -70/+65)
* Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.
  (alc, 2010-06-02; 1 file, -2/+0)
* Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed.
  Submitted by: kib
  (alc, 2010-05-25; 1 file, -5/+0)
* Roughly half of a typical pmap_mincore() implementation is machine-independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore().
  Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page.
  Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information.
  Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock.
  Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed().
  Reviewed by: kib (an earlier version)
  (alc, 2010-05-24; 1 file, -2/+0)
* The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it.
  Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style clean up while I'm here.
  Reviewed by: kib
  (alc, 2010-05-18; 1 file, -2/+0)
* Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write().
  Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.)
  Switch to a per-processor counter for the total number of pages cached.
  (alc, 2010-05-08; 1 file, -2/+0)
* Push down the acquisition of the page queues lock into vm_page_unwire().
  Update the comment describing which lock should be held on entry to vm_page_wire().
  Reviewed by: kib
  (alc, 2010-05-05; 1 file, -4/+2)
* Add page locking to the vm_page_cow* functions.
  Push down the acquisition and release of the page queues lock into vm_page_wire().
  Reviewed by: kib
  (alc, 2010-05-04; 1 file, -2/+0)
* Acquire the page lock around vm_page_unwire() and vm_page_wire().
  Reviewed by: kib
  (alc, 2010-05-03; 1 file, -4/+9)
* Properly synchronize access to the page's hold_count in vfs_vmio_release().
  Reviewed by: kib
  (alc, 2010-05-02; 1 file, -6/+6)