path: root/sys/kern/vfs_bio.c
Commit log, newest first: commit message, author, date, files changed, and lines removed/added.
* The buffer's b_vflags field is not always properly protected by the bufobj lock. If b_bufobj is not NULL, the bufobj lock should be held when manipulating the flags. Not doing this sometimes leaves BV_BKGRDINPROG erroneously set, causing softdep's getdirtybuf() to get stuck indefinitely in the "getbuf" sleep, waiting for a background write to finish that is never actually performed.
  Add BO_LOCK() in the cases where it was missed.
  In collaboration with: pho
  Tested by: bz
  Reviewed by: jeff
  MFC after: 1 month
  (kib, 2010-08-12; 1 file, -4/+49)
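A minimal userspace sketch of the rule this commit enforces, assuming a pthread mutex as a stand-in for the bufobj lock (BO_LOCK()/BO_UNLOCK()) and hypothetical model_* names and flag values in place of the kernel's struct buf and b_vflags. It only illustrates why a read-modify-write of a shared flag word must happen under the owning lock whenever the buffer is attached to a bufobj: two unlocked updates can interleave so that one of them is lost, which is one way a flag can be left set by mistake.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Illustrative flag values only; not the kernel's definitions. */
#define MODEL_BV_BKGRDINPROG 0x1u  /* background write in progress */
#define MODEL_BV_BKGRDWAIT   0x2u  /* a thread waits for that write */

/* Hypothetical stand-in for struct bufobj and its lock. */
struct model_bufobj {
	pthread_mutex_t lock;
};

/* Hypothetical stand-in for struct buf: b_bufobj and b_vflags. */
struct model_buf {
	struct model_bufobj *bufobj;  /* may be NULL */
	unsigned int vflags;
};

void model_bufobj_init(struct model_bufobj *bo)
{
	pthread_mutex_init(&bo->lock, NULL);
}

/* Set and clear bits in vflags. When the buffer belongs to a bufobj,
 * the whole read-modify-write happens under that bufobj's lock, so a
 * concurrent update from another thread cannot be overwritten. */
void model_buf_vflags_update(struct model_buf *bp, unsigned int set,
    unsigned int clear)
{
	if (bp->bufobj != NULL)
		pthread_mutex_lock(&bp->bufobj->lock);
	bp->vflags = (bp->vflags | set) & ~clear;
	if (bp->bufobj != NULL)
		pthread_mutex_unlock(&bp->bufobj->lock);
}
```

A buffer that is not attached to any bufobj is updated without locking, mirroring the commit's "if b_bufobj is not NULL" condition.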
* Fix (hopefully) the spelling of "queuing."
  Submitted by: bf1783 at gmail com
  (ivoras, 2010-08-09; 1 file, -1/+1)
* Elaborate on how hirunningspace was chosen.
  (ivoras, 2010-08-09; 1 file, -2/+5)
* Add a new make_dev_p(9) flag, MAKEDEV_ETERNAL, to inform devfs that the created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes.
  In collaboration with: pho
  MFC after: 1 month
  (kib, 2010-08-06; 1 file, -2/+3)
* Make lorunningspace catch up with hirunningspace. While there, add a comment about the magic numbers.
  Prodded by: alc
  (ivoras, 2010-07-23; 1 file, -1/+6)
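The names hirunningspace and lorunningspace suggest a high/low watermark pair. As we read vfs_bio.c of this period, writers wait while the bytes of in-flight writes (runningbufspace) exceed hirunningspace and are woken once that total falls to lorunningspace. The sketch below models only that hysteresis, single-threaded and with hypothetical model_* names; it does not reproduce the kernel's sleep/wakeup or its actual sizing formulas. The gap between the marks keeps waiters from being woken after every completed I/O, which is presumably why the low mark should keep pace when the high mark is scaled up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: bytes of write I/O in flight, with a high and a
 * low watermark standing in for hirunningspace and lorunningspace. */
struct model_runspace {
	long running;    /* bytes currently being written */
	long hi;         /* writers must wait once running exceeds this */
	long lo;         /* waiting writers resume once running drops here */
	bool throttled;  /* true while writers would be asleep */
};

/* Account for a newly issued write of 'bytes'. */
void model_write_start(struct model_runspace *rs, long bytes)
{
	rs->running += bytes;
	if (rs->running > rs->hi)
		rs->throttled = true;
}

/* Account for a completed write; release waiters only at the low mark. */
void model_write_done(struct model_runspace *rs, long bytes)
{
	rs->running -= bytes;
	if (rs->throttled && rs->running <= rs->lo)
		rs->throttled = false;
}
```

With hi = 100 and lo = 60, issuing 120 bytes throttles; completing 40 (leaving 80) keeps the throttle on, and completing another 30 (leaving 50) releases it.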
* Fix expression style.
  Prodded by: jhb
  (ivoras, 2010-07-20; 1 file, -3/+2)
* In keeping with the Age-of-the-fruitbat theme, scale up hirunningspace on machines which can clearly afford the memory. This is a somewhat conservative version of the patch; more fine-tuning may be necessary.
  Idea from: Thread on hackers@
  Discussed with: alc
  (ivoras, 2010-07-18; 1 file, -1/+3)
* Change the implementation of vm_hold_free_pages() so that it performs at most one call to pmap_qremove(), and thus one TLB shootdown, instead of one call and TLB shootdown per page. Simplify the interface to vm_hold_free_pages().
  MFC after: 3 weeks
  (alc, 2010-07-11; 1 file, -26/+16)
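To make the cost difference concrete, here is a toy model, assuming a plain counter as a stand-in for cross-CPU TLB invalidations and hypothetical model_* functions in place of pmap_qremove() and vm_hold_free_pages(). It shows only the counting argument: unmapping n pages one at a time costs n shootdowns, while unmapping the contiguous range in one call costs one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter standing in for TLB shootdowns. */
int model_shootdowns;

/* Model of unmapping n contiguous pages starting at 'first' in a
 * single call: the whole range costs one shootdown. */
void model_qremove(size_t first, size_t n)
{
	(void)first;
	if (n > 0)
		model_shootdowns++;
}

/* Old pattern: one unmap call, and thus one shootdown, per page. */
void model_unmap_per_page(size_t first, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_qremove(first + i, 1);
}

/* New pattern: unmap the whole range with a single call. */
void model_unmap_batched(size_t first, size_t n)
{
	model_qremove(first, n);
}
```

For an 8-page buffer the per-page loop performs 8 shootdowns and the batched call performs 1.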
* Add support for the VM_ALLOC_COUNT() hint to vm_page_alloc(). Consequently, the maintenance of vm_pageout_deficit can be localized to just two places: vm_page_alloc() and vm_pageout_scan(). This change also corrects an off-by-one error in the maintenance of vm_pageout_deficit. Historically, the buffer cache functions allocbuf() and vm_hold_load_pages() have not taken into account that vm_page_alloc() already increments vm_pageout_deficit by one.
  Reviewed by: kib
  (alc, 2010-07-09; 1 file, -3/+2)
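The off-by-one can be seen in a small model, assuming hypothetical model_* names for the allocator, the deficit counter, and its callers. In the sketch the allocator adds its count hint (at least 1) to the deficit when it fails; a caller that also adds its full shortfall afterwards counts one page twice, while a caller that passes the shortfall as the hint leaves the accounting in one place. This is an illustration of the arithmetic, not the kernel code.

```c
#include <assert.h>

/* Hypothetical global state standing in for the VM counters. */
int model_pageout_deficit;
int model_free_pages;

/* Allocate one page. On failure the allocator records the shortfall
 * itself, using the caller's count hint (minimum 1). */
int model_page_alloc(int count_hint)
{
	if (model_free_pages > 0) {
		model_free_pages--;
		return 1;
	}
	model_pageout_deficit += count_hint > 1 ? count_hint : 1;
	return 0;
}

/* Old caller pattern: after a failure, add the full shortfall again,
 * overlooking the 1 the allocator already added. */
void model_caller_double_counts(int npages)
{
	if (!model_page_alloc(1))
		model_pageout_deficit += npages;
}

/* Fixed pattern: pass the shortfall as the hint and let the allocator
 * do all of the deficit accounting. */
void model_caller_uses_hint(int npages)
{
	(void)model_page_alloc(npages);
}
```

With no free pages and a 4-page shortfall, the old pattern leaves the deficit at 5 and the hinted call leaves it at 4.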
* Add the ability for the allocflag argument of vm_page_grab() to specify the increment of vm_pageout_deficit when sleeping due to a page shortage. Then, in allocbuf(), the code that allocates pages when extending a VMIO buffer can be replaced by a call to vm_page_grab().
  Suggested and reviewed by: alc
  MFC after: 2 weeks
  (kib, 2010-07-05; 1 file, -50/+11)
* Improve bufdone_finish()'s handling of the bogus page. Specifically, if one or more mappings to the bogus page must be replaced, call pmap_qenter() just once. Previously, pmap_qenter() was called for each mapping to the bogus page.
  MFC after: 3 weeks
  (alc, 2010-06-30; 1 file, -4/+6)
* Add INVARIANTS checking that numfreebufs values are sane. Also add a per-buf flag to catch a buf being double-counted in the free count. This code was useful for debugging an instance where a local patch at Isilon was incorrectly managing numfreebufs for a new buf state.
  Reviewed by: jeff
  Approved by: zml (mentor)
  (mdf, 2010-06-11; 1 file, -10/+54)
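The idea of a per-buffer flag that catches double counting can be sketched in userspace, assuming assert() as a stand-in for INVARIANTS checks and hypothetical model_* names for the buffer, the free count, and the number of buffers. Each buffer remembers whether it is already included in the free count, so incrementing twice or decrementing a buffer that was never counted trips an assertion, as do out-of-range totals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer carrying a "counted as free" marker. */
struct model_fbuf {
	bool counted_free;
};

int model_numfreebuffers;   /* stand-in for the free-buffer count */
int model_nbuf = 4;         /* hypothetical total number of buffers */

void model_numfree_inc(struct model_fbuf *bp)
{
	assert(!bp->counted_free);   /* double count would trip here */
	bp->counted_free = true;
	model_numfreebuffers++;
	assert(model_numfreebuffers <= model_nbuf);
}

void model_numfree_dec(struct model_fbuf *bp)
{
	assert(bp->counted_free);    /* never counted, or already removed */
	bp->counted_free = false;
	model_numfreebuffers--;
	assert(model_numfreebuffers >= 0);
}
```

Keeping the marker in the buffer itself lets the check identify which buffer was mishandled, rather than only noticing later that the total is off.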
* Reorganize the code in bdwrite() that handles moving dirtiness from the buffer pages to the buffer. Combine the code that sets the buffer dirty range (previously in vfs_setdirty()) and the code that cleans the pages (vfs_clean_pages()) into a new function, vfs_clean_pages_dirty_buf(). Now the vm object lock is acquired only once.
  Drain the VPO_BUSY bit of the buffer pages before setting the valid and clean bits in vfs_clean_pages_dirty_buf(), using the new helper vfs_drain_busy_pages(). pmap_clear_modify() asserts that the page is not busy.
  In vfs_busy_pages(), move the wait for draining of VPO_BUSY before the dirtiness handling, to follow the structure of vfs_clean_pages_dirty_buf().
  Reported and tested by: pho
  Suggested and reviewed by: alc
  MFC after: 2 weeks
  (kib, 2010-06-08; 1 file, -70/+65)
* Minimize the use of the page queues lock for synchronizing access to the page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.
  (alc, 2010-06-02; 1 file, -2/+0)
* Eliminate the acquisition and release of the page queues lock from vfs_busy_pages(). It is no longer needed.
  Submitted by: kib
  (alc, 2010-05-25; 1 file, -5/+0)
* Roughly half of a typical pmap_mincore() implementation is machine-independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore().
  Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page.
  Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED, because only the pmap can provide this information.
  Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock.
  Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed().
  Reviewed by: kib (an earlier version)
  (alc, 2010-05-24; 1 file, -2/+0)
* The page queues lock is no longer required by vm_page_set_invalid(), so eliminate it.
  Assert that the object containing the page is locked in vm_page_test_dirty(). Perform some style cleanup while I'm here.
  Reviewed by: kib
  (alc, 2010-05-18; 1 file, -2/+0)
* Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write().
  Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.)
  Switch to a per-processor counter for the total number of pages cached.
  (alc, 2010-05-08; 1 file, -2/+0)
* Push down the acquisition of the page queues lock into vm_page_unwire().
  Update the comment describing which lock should be held on entry to vm_page_wire().
  Reviewed by: kib
  (alc, 2010-05-05; 1 file, -4/+2)
* Add page locking to the vm_page_cow* functions.
  Push down the acquisition and release of the page queues lock into vm_page_wire().
  Reviewed by: kib
  (alc, 2010-05-04; 1 file, -2/+0)
* Acquire the page lock around vm_page_unwire() and vm_page_wire().
  Reviewed by: kib
  (alc, 2010-05-03; 1 file, -4/+9)
* Properly synchronize access to the page's hold_count in vfs_vmio_release().
  Reviewed by: kib
  (alc, 2010-05-02; 1 file, -6/+6)
* It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly OK for the page to disappear, i.e., be reclaimed by the page daemon, before the caller of vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist.
  (alc, 2010-05-02; 1 file, -1/+10)
* On Alan's advice, rather than do a wholesale conversion on a single architecture from the page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under the page queue mutex to the page lock. This changes pmap_extract_and_hold on all pmaps.
  Supported by: Bitgravity Inc.
  Discussed with: alc, jeffr, and kib
  (kmacy, 2010-04-30; 1 file, -6/+7)
* Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown.
  Sponsored by: iXsystems, Yahoo!, and Juniper.
  With help from: McKusick and Peter Holm
  (jeff, 2010-04-24; 1 file, -3/+25)
* bo_bsize: revert r205860 and take an alternative approach in getblk.
  In r205860 I missed the fact that there is code that strongly assumes that devvp bo_bsize is equal to the underlying provider's sectorsize. In those places it is hard to obtain the sectorsize in an alternative way if devvp bo_bsize is set to something else. So, I am reverting the bo_bsize assignment in g_vfs_open.
  Instead, in getblk I use the DEV_BSIZE block size for the b_offset calculation if vp is a disk vp as reported by vn_isdisk. This should coincide with vp being a devvp.
  Reported by: Mykola Dzham <i@levsha.me>
  Tested by: Mykola Dzham <i@levsha.me>
  Pointyhat to: avg
  MFC after: 2 weeks
  X-ToDo: convert bread(devvp) in all fs to use bo_bsize-d blocks
  (avg, 2010-04-02; 1 file, -1/+1)
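The offset rule described here reduces to one multiplication with a choice of block size. The helper below is a hypothetical simplification, not getblk() itself: it takes the vn_isdisk() result as a boolean and ignores any other cases the real function may handle. DEV_BSIZE is 512 bytes in FreeBSD.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_DEV_BSIZE 512   /* FreeBSD's DEV_BSIZE */

/* Hypothetical helper: byte offset of logical block 'blkno'. For a disk
 * vnode, block numbers are counted in DEV_BSIZE units regardless of the
 * bufobj's bo_bsize; otherwise bo_bsize is the unit. */
long long model_getblk_offset(long long blkno, bool is_disk_vp,
    long bo_bsize)
{
	long bsize = is_disk_vp ? MODEL_DEV_BSIZE : bo_bsize;

	return blkno * bsize;
}
```

For example, block 8 of a disk vnode maps to byte offset 8 × 512 = 4096 even if bo_bsize is 4096, while block 8 of a regular vnode with a 16384-byte bo_bsize maps to 131072.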
* When a buffer write fails, it is wrong for brelse() to invalidate the portion of the page that was written. Among other problems, this page might be picked up by the pagedaemon, resulting in a failed assertion in vm_pageout_flush() about the validity of the page.
  Reported and tested by: pho
  Approved by: re (kensmith)
  MFC after: 3 weeks
  (kib, 2009-07-19; 1 file, -1/+2)
* Eliminate an unused variable from allocbuf().
  Eliminate the unnecessary setting of page valid bits from a non-VMIO buffer in vm_hold_load_pages().
  (alc, 2009-06-07; 1 file, -3/+0)
* Eliminate a comment describing code that was deleted over eight years ago. Move another comment to its proper place. Fix a typo in a third comment.
  (alc, 2009-06-01; 1 file, -14/+6)
* nfs_write() can use the recently introduced vfs_bio_set_valid() instead of vfs_bio_set_validclean(), thereby avoiding the page queues lock.
  Garbage collect vfs_bio_set_validclean(). Nothing uses it any longer.
  (alc, 2009-05-31; 1 file, -41/+0)
* Modify vm_hold_load_pages() to allocate pages using VM_ALLOC_NOOBJ rather than using the kernel object. This allows the elimination of page queues locking from vm_hold_free_pages().
  (alc, 2009-05-29; 1 file, -13/+5)
* fail(9) support:
  Add support for kernel fault injection using KFAIL_POINT_* macros and fail_point_* infrastructure. Add an example fail point in vfs_bio.c to simulate VM buf pressure.
  Approved by: dfr (mentor)
  (zml, 2009-05-27; 1 file, -3/+13)
* Only use the ABI compat shim for vfs.bufspace if the old buffer is smaller than a long.
  PR: amd64/134786
  Submitted by: Emil Mikulic emikulic| gmail
  MFC after: 3 days
  (jhb, 2009-05-21; 1 file, -1/+1)
* Several changes to vfs_bio_clrbuf():
  Provide a more descriptive comment.
  Eliminate dead code. The page cannot possibly have PG_ZERO set.
  Eliminate unnecessary blank lines.
  Reviewed by: tegge
  (alc, 2009-05-17; 1 file, -13/+11)
* Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg().
  In collaboration with: tegge
  (alc, 2009-05-17; 1 file, -0/+38)
* Eliminate page queues locking from bufdone_finish() through the following changes:
  Rename vfs_page_set_valid() to vfs_page_set_validclean() to reflect what this function actually does.
  Suggested by: tegge
  Introduce a new version of vfs_page_set_valid() that does no more than what the function's name implies. Specifically, it does not update the page's dirty mask, and thus it does not require the page queues lock to be held.
  Update two of the three callers of the old vfs_page_set_valid() to call vfs_page_set_validclean() instead, because they actually require the page's dirty mask to be cleared.
  Introduce vm_page_set_valid().
  Reviewed by: tegge
  (alc, 2009-05-13; 1 file, -11/+36)
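The difference between "set valid" and "set valid and clean" can be sketched with per-chunk bitmasks. The model assumes a 4096-byte page tracked in DEV_BSIZE (512-byte) chunks, one valid bit and one dirty bit per chunk, and uses hypothetical model_* names; it is not the kernel's vm_page code. The point is that the plain variant never touches the dirty mask, which is what removed the need for the page queues lock in this commit.

```c
#include <assert.h>

#define MODEL_DEV_BSIZE 512   /* 8 chunks per 4096-byte page */

/* Hypothetical page: one valid bit and one dirty bit per chunk. */
struct model_page {
	unsigned char valid;
	unsigned char dirty;
};

/* Bits for the chunks covering [base, base + size). Assumes the range
 * is non-empty and lies inside a 4096-byte page. */
unsigned char model_bits(int base, int size)
{
	int first = base / MODEL_DEV_BSIZE;
	int last = (base + size - 1) / MODEL_DEV_BSIZE;
	unsigned char mask = 0;

	for (int i = first; i <= last; i++)
		mask |= (unsigned char)(1u << i);
	return mask;
}

/* Mark the range valid; the dirty mask is left untouched. */
void model_set_valid(struct model_page *m, int base, int size)
{
	m->valid |= model_bits(base, size);
}

/* Mark the range valid and clean: also clear its dirty bits. */
void model_set_validclean(struct model_page *m, int base, int size)
{
	unsigned char bits = model_bits(base, size);

	m->valid |= bits;
	m->dirty &= (unsigned char)~bits;
}
```

Starting from a fully dirty page (dirty = 0xFF), marking bytes 0 to 1023 valid sets valid to 0x03 and leaves dirty at 0xFF; marking bytes 1024 to 2047 valid and clean then gives valid 0x0F and dirty 0xF3.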
* Revert CVS revision 1.94 (svn r16840). Current pmap implementations don't suffer from the race condition that motivated revision 1.94. Consequently, the work-around that was implemented by revision 1.94 is no longer needed. Moreover, reverting this work-around eliminates the need for vfs_busy_pages() to acquire the page queues lock when preparing a buffer for read.
  Reviewed by: tegge
  (alc, 2009-05-11; 1 file, -5/+7)
* Undo private changes that should never have been committed.
  (kan, 2009-04-17; 1 file, -70/+0)
* More fallout from negative dotdot caching. Negative entries should be removed from and reinserted into the proper ncneg list.
  Reported by: pho
  Submitted by: kib
  (kan, 2009-04-17; 1 file, -0/+70)
* In flushbufqueues(), do not allocate the sentinel buffer on the stack, since struct buf is large. Use a sleeping malloc(9) call, and zero the allocated buf as a debugging feature.
  (kib, 2009-04-16; 1 file, -7/+9)
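A sentinel (marker) element lets a queue scan remember its position while the current element is unlinked or the lock is dropped. The userspace sketch below assumes the TAILQ macros from sys/queue.h, hypothetical model_* names, and calloc() in place of a zeroing malloc(9); "flushing" is modeled as simply unlinking dirty elements. It mirrors the commit's point that the marker is heap-allocated and zeroed rather than placed on the stack, and it skips other markers in case several scans share a queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical queue element; is_marker flags sentinels. */
struct model_qbuf {
	TAILQ_ENTRY(model_qbuf) link;
	bool is_marker;
	bool dirty;
};
TAILQ_HEAD(model_queue, model_qbuf);

/* Walk q and unlink every dirty element. The marker is moved past each
 * element before that element is processed, so the scan can continue
 * from the marker even after the element is removed. Returns the number
 * of elements removed, or -1 if the marker cannot be allocated. */
int model_flush_queue(struct model_queue *q)
{
	struct model_qbuf *marker = calloc(1, sizeof(*marker));
	struct model_qbuf *bp;
	int flushed = 0;

	if (marker == NULL)
		return -1;
	marker->is_marker = true;
	TAILQ_INSERT_HEAD(q, marker, link);
	while ((bp = TAILQ_NEXT(marker, link)) != NULL) {
		TAILQ_REMOVE(q, marker, link);
		TAILQ_INSERT_AFTER(q, bp, marker, link);
		if (bp->is_marker || !bp->dirty)
			continue;
		TAILQ_REMOVE(q, bp, link);   /* the kernel would write it */
		flushed++;
	}
	TAILQ_REMOVE(q, marker, link);
	free(marker);
	return flushed;
}
```

Given a queue b0 (dirty), b1, b2 (dirty), b3, the scan removes b0 and b2 and leaves b1 followed by b3.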
* Export the number of times the bufdaemon got help from normal threads.
  (kib, 2009-04-16; 1 file, -0/+4)
* Improve the description of a few sysctls.
  Submitted by: bde (partially)
  MFC after: 3 days
  (jhb, 2009-03-23; 1 file, -2/+2)
* Fix a long-standing bug that crept in over several revisions: B_DELWRI cleanup and vnode disassociation should happen just before the buffer is assigned to a queue.
  Reported by: miwi, Volker <volker at vwsoft dot com>, Ben Kaduk <minimarmot at gmail dot com>, Christopher Mallon <christoph dot mallon at gmx dot de>
  Tested by: lulf, miwi
  (attilio, 2009-03-17; 1 file, -15/+15)
* Fix two issues with the bufdaemon, often causing processes to hang in the "nbufkv" sleep.
  First, the ffs background cg block write requests a new buffer for the shadow copy. When ffs_bufwrite() is called from the bufdaemon due to a buffer shortage, requesting the buffer deadlocks the bufdaemon. Introduce a new flag for getnewbuf(), GB_NOWAIT_BD, to request that getblk not block while allocating the buffer and return failure instead. Add a flags argument to geteblk() so that flags can be passed to getblk(). Do not repeat the getnewbuf() call from geteblk() if buffer allocation failed and either GB_NOWAIT_BD is specified or geteblk() is called from the bufdaemon (or its helper, see below). In ffs_bufwrite(), fall back to a synchronous cg block write if shadow block allocation failed.
  Since r107847, buffer write assumes that the vnode owning the buffer is locked. The second problem is that the buffer cache may accumulate many buffers belonging to a limited number of vnodes. With such a workload, the threads that own those vnodes' locks quite often try to read another block from the vnodes and, due to buffer cache exhaustion, ask the bufdaemon for help. The bufdaemon is unable to make any substantial progress because the vnodes are locked.
  Allow the threads owning vnode locks to help the bufdaemon by doing the flush pass over the buffer cache before getnewbuf() goes into an uninterruptible sleep. Move the flushing code from buf_daemon() to a new helper function, buf_do_flush(), that is called from getnewbuf(). The number of buffers flushed by a single call to buf_do_flush() from getnewbuf() is limited by the new sysctl vfs.flushbufqtarget. Prevent recursive calls to buf_do_flush() by marking the bufdaemon and threads that temporarily help the bufdaemon with the TDP_BUFNEED flag.
  In collaboration with: pho
  Reviewed by: tegge (previous version)
  Tested by: glebius, yandex ...
  MFC after: 3 weeks
  (kib, 2009-03-16; 1 file, -45/+117)
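The first fix follows a common deadlock-avoidance pattern: when the caller may be the very thread that frees resources, allocate without blocking and take a slower path that needs no extra resource if the allocation fails. The sketch models that pattern with hypothetical model_* names; a counter stands in for free buffers, model_try_getbuf() plays the role of getnewbuf() with GB_NOWAIT_BD, and the two enum values stand for the background (shadow-copy) and synchronous cg writes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical count of buffers available right now. */
int model_free_bufs;

enum model_write_kind { MODEL_WRITE_BACKGROUND, MODEL_WRITE_SYNC };

/* Non-blocking allocation: fail immediately instead of sleeping until
 * a buffer is freed, since the caller may be the one that frees them. */
bool model_try_getbuf(void)
{
	if (model_free_bufs == 0)
		return false;
	model_free_bufs--;
	return true;
}

/* Cylinder-group write: use a shadow copy for a background write if a
 * buffer is available without blocking; otherwise write synchronously,
 * which needs no extra buffer. */
enum model_write_kind model_cg_write(void)
{
	if (model_try_getbuf())
		return MODEL_WRITE_BACKGROUND;
	return MODEL_WRITE_SYNC;
}
```

With one free buffer, the first cg write goes to the background and the second falls back to a synchronous write instead of waiting.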
* In the ABI shim for vfs.bufspace, rather than truncating values larger than INT_MAX to INT_MAX, just go ahead and write out the full long to give an error of ENOMEM to the user process.
  Requested by: bde
  (jhb, 2009-03-10; 1 file, -1/+4)
* Add an ABI compat shim for the vfs.bufspace sysctl for sysctl requests that try to fetch it as an int rather than a long. If the current value is greater than INT_MAX, it reports a value of INT_MAX.
  (jhb, 2009-03-10; 1 file, -0/+27)
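The shim's decision logic, including the two refinements described in later entries of this log (use the shim only when the caller's buffer is smaller than a long, and return ENOMEM rather than a clamped INT_MAX), can be modeled in userspace. This is a hedged sketch with hypothetical model_* names: model_copyout() stands in for SYSCTL_OUT and, unlike it, copies nothing when the buffer is too small.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>

/* Simplified stand-in for SYSCTL_OUT: copy len bytes into a caller
 * buffer of oldlen bytes, or fail with ENOMEM if it does not fit. */
static int model_copyout(void *oldp, size_t oldlen, const void *val,
    size_t len)
{
	if (oldlen < len)
		return ENOMEM;
	memcpy(oldp, val, len);
	return 0;
}

/* Hypothetical model of the vfs.bufspace handler. */
int model_sysctl_bufspace(long value, void *oldp, size_t oldlen)
{
	int ivalue;

	/* Callers with room for a long get the native type. */
	if (sizeof(int) == sizeof(long) || oldlen >= sizeof(long))
		return model_copyout(oldp, oldlen, &value, sizeof(value));
	/* Too big for an int: copy out the long anyway, so the undersized
	 * buffer yields ENOMEM instead of a silently clamped value. */
	if (value > INT_MAX)
		return model_copyout(oldp, oldlen, &value, sizeof(value));
	/* Fits in an int: serve the old int-sized ABI. */
	ivalue = (int)value;
	return model_copyout(oldp, oldlen, &ivalue, sizeof(ivalue));
}
```

On an LP64 system, an int-sized request for a value above INT_MAX returns ENOMEM, while smaller values are still returned as an int.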
* Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the following values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~44000) of buffers set in kern.nbuf would result in integer overflows, resulting either in hangs or in bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for an nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf, but warn if a user-supplied tunable would cause overflow.
  Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require some gross shims to allow for that.
  MFC after: 1 month
  (jhb, 2009-03-09; 1 file, -33/+40)
* Tweak the output of VOP_PRINT/vn_printf() some.
  - Align the fifo output in fifo_print() with other vn_printf() output.
  - Remove the leading space from lockmgr_printinfo() so its output lines up in vn_printf().
  - lockmgr_printinfo() now ends with a newline, so remove an extra newline from vn_printf().
  (jhb, 2009-02-06; 1 file, -0/+1)
* Remove the unneeded struct thread argument from the bufobj interface. In particular, the KPI of the following functions is modified:
  - bufobj_invalbuf()
  - bufsync() and the BO_SYNC() "virtual method" of the buffer objects set.
  The main consumers of bufobj functions are affected by this change too; in particular, the functions whose KPI changed are:
  - vinvalbuf()
  - g_vfs_close()
  Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit.
  As a side note, please consider the 'curthread' argument passed to VOP_SYNC() (in bufsync()) temporary, as it will be axed out ASAP.
  Reviewed by: kib
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
  (attilio, 2008-10-10; 1 file, -2/+2)
* Add the ffs structures introspection functions for ddb. Show the b_dep value for the buffer in the show buffer command. Add a command to dump the dirty/clean buffer list for a vnode.
  Reviewed by: tegge
  Tested and used by: pho
  MFC after: 1 month
  (kib, 2008-09-16; 1 file, -2/+25)