path: root/sys/vm/vm_pageout.c
Commit log (most recent first). Each entry is tagged [author, date, files changed, lines removed/added].
* [alc, 2010-07-02, 1 file, -3/+2] Push down the acquisition of the page queues
  lock into vm_pageout_page_stats(). In particular, avoid acquiring the page
  queues lock unless iterating over the active queue.

* [alc, 2010-07-02, 1 file, -5/+4] With the demise of page coloring, the page
  queue macros no longer serve any useful purpose. Eliminate them.

  Reviewed by: kib

* [alc, 2010-06-30, 1 file, -8/+4] Simplify entry to vm_pageout_clean(). Expect
  the page to be locked. Previously, the caller unlocked the page, and
  vm_pageout_clean() immediately reacquired the page lock. Also, assert rather
  than test that the page is neither busy nor held. Since vm_pageout_clean() is
  called with the object and page locked, the page can't have changed state
  since the caller verified that the page is neither busy nor held.
* [alc, 2010-06-21, 1 file, -13/+8] Introduce vm_page_next() and
  vm_page_prev(), and use them in vm_pageout_clean(). When iterating over a
  range of pages, these functions can be cheaper than vm_page_lookup() because
  their implementation takes advantage of the vm_object's memq being ordered.

  Reviewed by: kib@
  MFC after: 3 weeks
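The entry above relies on the object's memq being kept sorted by pindex. A
minimal userland sketch of that idea follows, using the <sys/queue.h> macros
as found on FreeBSD; the struct and function names are invented for
illustration and are not the kernel's vm_page code. The point is that, given
the sorted list, the neighboring page is found from the list linkage plus one
pindex check, with no tree lookup.

    /* Illustrative analogue, not the kernel source. */
    #include <sys/queue.h>
    #include <stddef.h>

    struct page {
        unsigned long pindex;           /* offset within the object, in pages */
        TAILQ_ENTRY(page) listq;        /* object's page list, sorted by pindex */
    };
    TAILQ_HEAD(pglist, page);

    /* Return the resident page at m->pindex + 1, or NULL if there is a hole. */
    static struct page *
    page_next(struct page *m)
    {
        struct page *next;

        next = TAILQ_NEXT(m, listq);
        if (next != NULL && next->pindex != m->pindex + 1)
            next = NULL;                /* a hole follows m */
        return (next);
    }

    /* Return the resident page at m->pindex - 1, or NULL if there is a hole. */
    static struct page *
    page_prev(struct page *m)
    {
        struct page *prev;

        prev = TAILQ_PREV(m, pglist, listq);
        if (prev != NULL && prev->pindex != m->pindex - 1)
            prev = NULL;
        return (prev);
    }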
* [alc, 2010-06-14, 1 file, -6/+6] Eliminate checks for a page having a NULL
  object in vm_pageout_scan() and vm_pageout_page_stats(). These checks were
  recently introduced by the first page locking commit, r207410, but they are
  not needed. At the same time, eliminate some redundant accesses to the
  page's object field. (These accesses should have been eliminated by r207410.)

  Make the assertion in vm_page_flag_set() stricter. Specifically, only
  managed pages should have PG_WRITEABLE set.

  Add a comment documenting an assertion to vm_page_flag_clear().

  It has long been the case that fictitious pages have their wire count
  permanently set to one. Add comments to vm_page_wire() and vm_page_unwire()
  documenting this. Add assertions to these functions as well.

  Update the comment describing vm_page_unwire(). Much of the old comment had
  little to do with vm_page_unwire(), but a lot to do with
  _vm_page_deactivate(). Move relevant parts of the old comment to
  _vm_page_deactivate().

  Only pages that belong to an object can be paged out. Therefore, it is
  pointless for vm_page_unwire() to acquire the page queues lock and enqueue
  such pages in one of the paging queues. Generally speaking, such pages are
  immediately freed after the call to vm_page_unwire(). Previously, it was the
  call to vm_page_free() that reacquired the page queues lock and removed
  these pages from the paging queues. Now, we will never acquire the page
  queues lock for this case. (It is also worth noting that since both
  vm_page_unwire() and vm_page_free() occurred with the page locked, the page
  daemon never saw the page with its object field set to NULL.)

  Change the panic in vm_page_unwire() to provide a more precise message.

  Reviewed by: kib@

* [alc, 2010-06-10, 1 file, -38/+30] Reduce the scope of the page queues lock
  and the number of PG_REFERENCED changes in
  vm_pageout_object_deactivate_pages(). Simplify this function's inner loop
  using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a
  stale comment.

  Assert that PG_REFERENCED may be cleared only if the object containing the
  page is locked. Add a comment documenting this.

  Assert that a caller to vm_page_requeue() holds the page queues lock, and
  assert that the page is on a page queue.

  Push down the page queues lock into pmap_ts_referenced() and
  pmap_page_exists_quick(). (As of now, there are no longer any pmap functions
  that expect to be called with the page queues lock held.)

  Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be
  passed an unmanaged page. Assert this rather than returning "0" and "FALSE"
  respectively.

  ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().
  Push down the page queues lock inside of pmap_clearbit(), simplifying
  pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
  Additionally, this allows for avoiding the acquisition of the page queues
  lock in some cases.

  PowerPC/AIM: moea*_page_exists_quick() and moea*_page_wired_mappings() will
  never be called before pmap initialization is complete. Therefore, the check
  for moea_initialized can be eliminated. Push down the page queues lock
  inside of moea*_clear_bit(), simplifying moea*_clear_modify() and
  moea*_clear_reference(). The last parameter to moea*_clear_bit() is never
  used. Eliminate it.

  PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow.

  Reviewed by: kib@

* [alc, 2010-05-24, 1 file, -6/+0] Roughly half of a typical pmap_mincore()
  implementation is machine-independent code. Move this code into mincore(),
  and eliminate the page queues lock from pmap_mincore().

  Push down the page queues lock into pmap_clear_modify(),
  pmap_clear_reference(), and pmap_is_modified(). Assert that these functions
  are never passed an unmanaged page.

  Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to
  what the comment says, pmap_mincore() is not simply an optimization. Without
  a complete pmap_mincore() implementation, mincore() cannot return either
  MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide
  this information.

  Eliminate the page queues lock from vfs_setdirty_locked_object(),
  vm_pageout_clean(), vm_object_page_collect_flush(), and
  vm_object_page_clean(). Generally speaking, these are all accesses to the
  page's dirty field, which are synchronized by the containing vm object's
  lock.

  Reduce the scope of the page queues lock in vm_object_madvise() and
  vm_page_dontneed().

  Reviewed by: kib (an earlier version)
* [alc, 2010-05-08, 1 file, -10/+7] Push down the page queues lock into
  vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free().
  Consequently, push down the page queues lock into pmap_enter_quick(),
  pmap_page_wired_mappings(), pmap_remove_all(), and pmap_remove_write().

  Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
  overlooked the Xen pmap in r207702.)

  Switch to a per-processor counter for the total number of pages cached.
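The per-processor counter mentioned in the last sentence is a standard
technique: each CPU updates its own slot, and readers sum the slots when an
approximate total is wanted. A self-contained sketch with invented names (not
the kernel's vmmeter/PCPU code):

    #include <stdatomic.h>

    #define MAXCPU  64

    struct pcpu_counter {
        _Alignas(64) atomic_long count;     /* one cache line per CPU slot */
    };

    static struct pcpu_counter cache_count[MAXCPU];

    static void
    cache_count_add(int cpu, long delta)
    {
        /* Relaxed ordering is enough: the total is only a statistic. */
        atomic_fetch_add_explicit(&cache_count[cpu].count, delta,
            memory_order_relaxed);
    }

    static long
    cache_count_total(void)
    {
        long total = 0;

        for (int cpu = 0; cpu < MAXCPU; cpu++)
            total += atomic_load_explicit(&cache_count[cpu].count,
                memory_order_relaxed);
        return (total);
    }

Updates never contend on a shared cache line or lock; only the rarely-needed
total pays the cost of walking every slot.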
* [jkim, 2010-05-07, 1 file, -1/+1] Fix a typo in the previous commit.

* [kib, 2010-05-07, 1 file, -8/+1] One more use for vm_pageout_init_marker().

  Reviewed by: alc
* [kib, 2010-05-06, 1 file, -12/+60] Add a helper function
  vm_pageout_page_lock(), similar to tegge's
  vm_pageout_fallback_object_lock(), to obtain the page lock while the page
  queue lock is held, and still maintain the page's position in the queue.

  Use the helper to lock the page in the pageout daemon and contig launder
  iterators instead of skipping the page if its lock is contested. Skipping
  locked pages easily causes the pagedaemon or launder to not make progress
  with page cleaning.

  Proposed and reviewed by: alc
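The helper's trick is a marker entry that preserves the scan position while
the queue lock is dropped to sleep on the page lock. A userland analogue using
pthread mutexes and <sys/queue.h> (hypothetical names, not the kernel code):

    #include <sys/queue.h>
    #include <pthread.h>

    struct page {
        pthread_mutex_t lock;
        int is_marker;                  /* real code flags markers so other scans skip them */
        TAILQ_ENTRY(page) pageq;
    };
    TAILQ_HEAD(pagequeue, page);

    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

    /*
     * Called with queue_lock held; returns with queue_lock and m->lock held.
     * *next is updated so the caller resumes its scan after m even if the
     * queue changed while we slept.
     */
    static void
    pageout_page_lock(struct pagequeue *pq, struct page *m,
        struct page *marker, struct page **next)
    {
        if (pthread_mutex_trylock(&m->lock) == 0)
            return;                     /* fast path: no contention */

        /* Keep our place: the marker stays in the queue while we sleep. */
        TAILQ_INSERT_AFTER(pq, m, marker, pageq);
        pthread_mutex_unlock(&queue_lock);

        pthread_mutex_lock(&m->lock);   /* may block */

        pthread_mutex_lock(&queue_lock);
        *next = TAILQ_NEXT(marker, pageq);
        TAILQ_REMOVE(pq, marker, pageq);
    }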
* [alc, 2010-05-02, 1 file, -2/+0] Eliminate an assignment that was made
  redundant by r207410.

* [alc, 2010-05-02, 1 file, -8/+8] Defer the acquisition of the page and page
  queues locks in vm_pageout_object_deactivate_pages().

* [kmacy, 2010-04-30, 1 file, -0/+2] Push up dropping of the page queue lock
  to avoid holding it in vm_pageout_flush.

* [kmacy, 2010-04-30, 1 file, -6/+7]
  - Don't check hold_count without the page lock held.
  - Don't leak the page lock if m->object is NULL (assuming that that check
    will in fact even be valid when m->object is protected by the page lock).
* [kmacy, 2010-04-30, 1 file, -11/+75] On Alan's advice, rather than do a
  wholesale conversion on a single architecture from page queue lock to a
  hashed array of page locks (based on a patch by Jeff Roberson), I've
  implemented page lock support in the MI code and have only moved vm_page's
  hold_count out from under the page queue mutex to the page lock. This
  changes pmap_extract_and_hold on all pmaps.

  Supported by: Bitgravity Inc.
  Discussed with: alc, jeffr, and kib
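A "hashed array of page locks" amounts to mapping each page to one of a fixed
pool of mutexes by hashing its address, so per-page fields can be protected
without one lock per page and without a single global mutex. A minimal sketch
with invented names and constants (the kernel's real scheme and hash differ):

    #include <pthread.h>
    #include <stdint.h>

    #define PAGE_LOCK_COUNT 256         /* power of two, chosen for the example */

    static pthread_mutex_t page_locks[PAGE_LOCK_COUNT];

    static void
    page_locks_init(void)
    {
        for (int i = 0; i < PAGE_LOCK_COUNT; i++)
            pthread_mutex_init(&page_locks[i], NULL);
    }

    static pthread_mutex_t *
    page_lockptr(const void *page)
    {
        uintptr_t key = (uintptr_t)page >> 12;  /* assumes 4 KB pages */

        key ^= key >> 7;                        /* spread neighboring pages */
        return (&page_locks[key & (PAGE_LOCK_COUNT - 1)]);
    }

    static void
    page_lock(const void *page)
    {
        pthread_mutex_lock(page_lockptr(page));
    }

    static void
    page_unlock(const void *page)
    {
        pthread_mutex_unlock(page_lockptr(page));
    }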
* [alc, 2010-04-29, 1 file, -7/+4] Simplify the inner loop of
  vm_pageout_object_deactivate_pages(). Rather than checking each page for
  PG_UNMANAGED, check the vm object's type. Only OBJT_PHYS can have unmanaged
  pages. Eliminate a pointless counter. The vm object is locked, that lock is
  never released by the inner loop, and the set of pages contained by the vm
  object is not changed by the inner loop. Therefore, the counter serves no
  purpose.

* [alc, 2010-04-20, 1 file, -1/+2] Eliminate an unnecessary call to
  pmap_remove_all(). If a page belongs to an object whose reference count is
  zero, then that page cannot possibly be mapped.

* [alc, 2010-04-18, 1 file, -2/+0] Remove a nonsensical test from
  vm_pageout_clean(). A page can't be in the inactive queue and have a
  non-zero wire count.

  Reviewed by: kib
  MFC after: 3 weeks
* [kib, 2010-04-06, 1 file, -2/+2] When OOM searches for a process to kill,
  ignore the processes already killed by OOM. When a killed process waits for
  a page allocation, try to satisfy the request as fast as possible.

  This removes the often-encountered deadlock, where OOM continuously selects
  the same victim process, which sleeps uninterruptibly waiting for a page.
  The killed process may still sleep if a page cannot be obtained immediately,
  but testing has shown that the system has a much higher chance of surviving
  an OOM situation with the patch.

  In collaboration with: pho
  Reviewed by: alc
  MFC after: 4 weeks
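The victim-selection change boils down to skipping candidates that have
already been shot. A toy sketch of that loop, with hypothetical fields and
flag names (the kernel's scan weighs candidates quite differently):

    #include <stddef.h>
    #include <stdbool.h>

    struct process {
        bool killed_by_oom;         /* already killed once, still exiting */
        size_t resident_pages;      /* rough measure of what killing would free */
        struct process *next;
    };

    static struct process *
    oom_select_victim(struct process *plist)
    {
        struct process *p, *victim = NULL;

        for (p = plist; p != NULL; p = p->next) {
            if (p->killed_by_oom)
                continue;           /* don't pick the same victim twice */
            if (victim == NULL || p->resident_pages > victim->resident_pages)
                victim = p;
        }
        return (victim);
    }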
* [kib, 2010-01-17, 1 file, -0/+2] When a vnode-backed vm object is
  referenced, it increments the vnode reference count, and decrements it on
  dereference. If a referenced object is deallocated, its type is reset to
  OBJT_DEAD. Consequently, all vnode references that are owned by object
  references are never released. vunref() the vnode an appropriate number of
  times in the vm object deallocation code for OBJT_VNODE to prevent the leak.

  Add an assertion to vm_pageout() to make sure that we never take a reference
  on the vnode but then fail to execute the code that releases it.

  In collaboration with: pho
  Reviewed by: alc
  MFC after: 3 weeks
* [jhb, 2009-07-24, 1 file, -1/+3] Add a new type of VM object: OBJT_SG. An
  OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in
  that it uses fictitious pages to provide aliases to other memory addresses.
  The primary difference is that it uses an sglist(9) to determine the
  physical addresses for a given offset into the object instead of invoking
  the d_mmap() method in a device driver.

  Reviewed by: alc
  Approved by: re (kensmith)
  MFC after: 2 weeks
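Translating an object offset to a physical address through a scatter/gather
list is just a walk over (physical address, length) segments until the running
length covers the offset. A self-contained sketch with illustrative types (not
the sglist(9) implementation):

    #include <stddef.h>
    #include <stdint.h>

    struct sg_segment {
        uint64_t ss_paddr;          /* physical start of the segment */
        size_t   ss_len;            /* length of the segment in bytes */
    };

    /* Fill *paddr and return 0 on success, -1 if offset is out of range. */
    static int
    sg_offset_to_paddr(const struct sg_segment *segs, int nsegs, size_t offset,
        uint64_t *paddr)
    {
        for (int i = 0; i < nsegs; i++) {
            if (offset < segs[i].ss_len) {
                *paddr = segs[i].ss_paddr + offset;
                return (0);
            }
            offset -= segs[i].ss_len;
        }
        return (-1);
    }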
* [alc, 2009-06-24, 1 file, -4/+2] The bits set in a page's dirty mask are a
  subset of the bits set in its valid mask. Consequently, there is no need to
  perform a bit-wise and of the page's dirty and valid masks in order to
  determine which parts of a page are dirty and valid.

  Eliminate an unnecessary #include.
* [alc, 2009-05-28, 1 file, -8/+9] Revise vm_pageout_scan()'s handling of
  partially dirty pages. Specifically, rather than unconditionally making
  partially dirty pages fully dirty, only make partially dirty pages fully
  dirty if the pmap says that the page has been modified. (This change is
  also a small optimization. It eliminates an unnecessary call to
  pmap_is_modified() on pages that are mapped read only.)

  Suggested by: tegge
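A compact sketch of that decision, with stand-in types and a boolean standing
in for what pmap_is_modified() would report (so it compiles on its own; this
is not the kernel code):

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_BITS_ALL  0xffu    /* all chunks of the page dirty */

    struct page {
        uint8_t valid;
        uint8_t dirty;
    };

    static void
    scan_fixup_dirty(struct page *m, bool mmu_modified)
    {
        /* Promote to fully dirty only if some mapping actually wrote it. */
        if (m->dirty != 0 && m->dirty != PAGE_BITS_ALL && mmu_modified)
            m->dirty = PAGE_BITS_ALL;
    }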
* [alc, 2009-05-17, 1 file, -1/+2] Eliminate a pointless call to
  pmap_clear_reference() from vm_pageout_scan(). If the page belongs to an
  object with a reference count of zero, then it can't have any managed
  mappings on which to clear a reference bit.

* [kib, 2009-04-28, 1 file, -1/+1] Use the acquired reference to the vmspace
  instead of directly dereferencing p->p_vmspace in a place where it was
  missed in r191277.

  Noted by: pluknet gmail com
* [kib, 2009-04-19, 1 file, -8/+21] In both the pageout OOM handler and
  vm_daemon, acquire a reference to the vmspace of the examined process
  instead of directly accessing its vmspace, which may change. Also, as an
  optimization, check for the P_INEXEC flag before examining the process.

  Reported and tested by: pho (previous version)
  Reviewed by: alc
  MFC after: 3 weeks
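The pattern in the two entries above is: take a counted reference to the
address-space object while the process lock is held, use it after dropping the
lock, then release it (in the kernel this is done with helpers along the lines
of vmspace_acquire_ref()/vmspace_free()). A userland model with stand-in
names:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stddef.h>

    struct vmspace_model {
        atomic_int refcnt;
        /* ... address-space data ... */
    };

    struct proc_model {
        pthread_mutex_t lock;
        struct vmspace_model *vm;   /* may be replaced while unlocked */
    };

    static struct vmspace_model *
    vmspace_acquire(struct proc_model *p)
    {
        struct vmspace_model *vm;

        pthread_mutex_lock(&p->lock);
        vm = p->vm;
        if (vm != NULL)
            atomic_fetch_add(&vm->refcnt, 1);
        pthread_mutex_unlock(&p->lock);
        return (vm);                /* stays valid even if p->vm changes */
    }

    static void
    vmspace_release(struct vmspace_model *vm)
    {
        if (atomic_fetch_sub(&vm->refcnt, 1) == 1) {
            /* last reference: the real code would destroy it here */
        }
    }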
* [alc, 2009-04-19, 1 file, -1/+0] Calling pmap_clear_modify() after calling
  pmap_remove_write() is pointless. The latter function already clears the
  modified status from each of the page's mappings.

* [kib, 2008-11-16, 1 file, -2/+1] Instead of forcing vn_start_write() to
  reset mp back to NULL for the failed calls with non-NULL vp, explicitly
  clear mp after failure.

  Tested by: stass
  Reviewed by: tegge
  PR: 123768
  MFC after: 1 week

* [kib, 2008-09-29, 1 file, -68/+77] Move the code for doing out-of-memory
  handling from vm_pageout_scan() into the separate function vm_pageout_oom().
  Supply a parameter for vm_pageout_oom() describing a reason for the call.

  Call vm_pageout_oom() from swp_pager_meta_build() when the swap zone is
  exhausted.

  Reviewed by: alc
  Tested by: pho, jhb
  MFC after: 2 weeks
* [alc, 2008-09-21, 1 file, -1/+2] Prevent an integer overflow in
  vm_pageout_page_stats() on machines with a large number of physical pages.

  PR: 126158
  Submitted by: Dmitry Tejblum
  MFC after: 3 days

* [trhodes, 2008-08-03, 1 file, -2/+2] Fill in a few sysctl descriptions.

  Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com>
  Approved by: alc

* [alc, 2008-03-19, 1 file, -12/+12] Rename vm_pageq_requeue() to
  vm_page_requeue() on account of its recent migration to vm/vm_page.c.

* [jeff, 2008-03-19, 1 file, -6/+0]
  - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
    requiring the per-process spinlock to only requiring the process lock.
  - Reflect these changes in the proc.h documentation and consumers throughout
    the kernel. This is a substantial reduction in locking cost for these
    fields and was made possible by recent changes to threading support.

* [rwatson, 2008-03-16, 1 file, -2/+3] In keeping with style(9)'s
  recommendations on macros, use a ';' after each SYSINIT() macro invocation.
  This makes a number of lightweight C parsers much happier with the FreeBSD
  kernel source, including cflow's prcc and lxr.

  MFC after: 1 month
  Discussed with: imp, rink

* [alc, 2007-11-25, 1 file, -1/+1] Make contigmalloc(9)'s page laundering more
  robust. Specifically, use vm_pageout_fallback_object_lock() in
  vm_contig_launder_page() to better handle a lock-ordering problem.
  Consequently, trylock's failure on the page's containing object no longer
  implies that the page cannot be laundered.

  MFC after: 6 weeks
* [alc, 2007-11-23, 1 file, -0/+2] Add a read/write sysctl for reconfiguring
  the maximum number of physical pages that can be wired.

  Submitted by: Eugene Grosbein
  PR: 114654
  MFC after: 6 weeks
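From userland, a knob like this is exercised with sysctlbyname(3). A small
sketch follows; the sysctl name vm.max_wired, its unit (pages), and the new
value are assumptions based on the commit's description and era, and writing
the value requires root:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        int maxwired, newval = 1 << 20;     /* arbitrary example value */
        size_t len = sizeof(maxwired);

        /* Read the current limit. */
        if (sysctlbyname("vm.max_wired", &maxwired, &len, NULL, 0) == 0)
            printf("vm.max_wired = %d pages\n", maxwired);

        /* Attempt to raise it; shown only to illustrate a read/write sysctl. */
        if (sysctlbyname("vm.max_wired", NULL, NULL, &newval,
            sizeof(newval)) != 0)
            perror("sysctlbyname(write)");
        return (0);
    }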
* [alc, 2007-09-25, 1 file, -35/+2] Change the management of cached pages
  (PQ_CACHE) in two fundamental ways:

  (1) Cached pages are no longer kept in the object's resident page splay tree
  and memq. Instead, they are kept in a separate per-object splay tree of
  cached pages. However, access to this new per-object splay tree is
  synchronized by the _free_ page queues lock, not to be confused with the
  heavily contended page queues lock. Consequently, a cached page can be
  reclaimed by vm_page_alloc(9) without acquiring the object's lock or the
  page queues lock.

  This solves a problem independently reported by tegge@ and Isilon.
  Specifically, they observed the page daemon consuming a great deal of CPU
  time because of pages bouncing back and forth between the cache queue
  (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem
  turned out to be a deadlock avoidance strategy employed when selecting a
  cached page to reclaim in vm_page_select_cache(). However, the root cause
  was really that reclaiming a cached page required the acquisition of an
  object lock while the page queues lock was already held. Thus, this change
  addresses the problem at its root, by eliminating the need to acquire the
  object's lock.

  Moreover, keeping cached pages in the object's primary splay tree and memq
  was, in effect, optimizing for the uncommon case. Cached pages are reclaimed
  far, far more often than they are reactivated. Instead, this change makes
  reclamation cheaper, especially in terms of synchronization overhead, and
  reactivation more expensive, because reactivated pages will have to be
  reentered into the object's primary splay tree and memq.

  (2) Cached pages are now stored alongside free pages in the physical memory
  allocator's buddy queues, increasing the likelihood that large allocations
  of contiguous physical memory (i.e., superpages) will succeed.

  Finally, as a result of this change, long-standing restrictions on when and
  where a cached page can be reclaimed and returned by vm_page_alloc(9) are
  eliminated. Specifically, calls to vm_page_alloc(9) specifying
  VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page.
  Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to
  fail.

  Discussed with: many over the course of the summer, including jeff@,
  Justin Husted @ Isilon, peter@, tegge@
  Tested by: an earlier version by kris@
  Approved by: re (kensmith)

* [jeff, 2007-09-17, 1 file, -1/+1]
  - Move all of the PS_ flags into either p_flag or td_flags.
  - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
    previously the sched_lock. These bugs have existed for some time.
  - Allow swapout to try each thread in a process individually and then swapin
    the whole process if any of these fail. This allows us to move most
    scheduler related swap flags into td_flags.
  - Keep ki_sflag for backwards compat but change all in source tools to use
    the new and more correct location of P_INMEM.

  Reported by: pho
  Reviewed by: attilio, kib
  Approved by: re (kensmith)

* [alc, 2007-09-15, 1 file, -1/+2] Correct an assertion in vm_pageout_flush().
  Specifically, if a page's status after vm_pager_put_pages() is
  VM_PAGER_PEND, then it could have already been recycled, i.e., freed and
  reallocated to a new purpose; thus, asserting that such pages cannot be
  written is inappropriate.

  Reported by: kris
  Submitted by: tegge
  Approved by: re (kensmith)
  MFC after: 1 week
* [alc, 2007-07-02, 1 file, -11/+10] In the previous revision, when I replaced
  the unconditional acquisition of Giant in vm_pageout_scan() with
  VFS_LOCK_GIANT(), I had to eliminate the acquisition of the vnode interlock
  before releasing the vm object's lock because the vnode interlock cannot be
  held when VFS_LOCK_GIANT() is performed. Unfortunately, this allows the
  vnode to be recycled between the release of the vm object's lock and the
  vget() on the vnode.

  In this revision, I prevent the vnode from being recycled by acquiring
  another reference to the vm object and underlying vnode before releasing the
  vm object's lock.

  This change also addresses another preexisting but trivial problem. By
  acquiring another reference to the vm object, I also prevent the vm object
  from being recycled. Previously, the "vnodes skipped" counter could be wrong
  because it examined a recycled vm object.

  Reported by: kib
  Reviewed by: kib
  Approved by: re (kensmith)
  MFC after: 3 weeks
* [alc, 2007-06-26, 1 file, -21/+27] Eliminate the use of Giant from
  vm_daemon(). Replace the unconditional use of Giant in vm_pageout_scan()
  with VFS_LOCK_GIANT().

  Approved by: re (kensmith)
  MFC after: 3 weeks
* [alc, 2007-06-18, 1 file, -7/+4] Eliminate unnecessary checks from
  vm_pageout_clean(): The page that is passed to vm_pageout_clean() cannot
  possibly be PG_UNMANAGED because it came from the inactive queue and
  PG_UNMANAGED pages are not in any page queue. Moreover, PG_UNMANAGED pages
  only exist in OBJT_PHYS objects, and all pages within an OBJT_PHYS object
  are PG_UNMANAGED. So, if the page that is passed to vm_pageout_clean() is
  not PG_UNMANAGED, then it cannot be from an OBJT_PHYS object and its
  neighbors from the same object cannot themselves be PG_UNMANAGED.

  Reviewed by: tegge
* [alc, 2007-06-16, 1 file, -13/+6] Enable the new physical memory allocator.

  This allocator uses a binary buddy system with a twist. First and foremost,
  this allocator is required to support the implementation of superpages. As a
  side effect, it enables a more robust implementation of contigmalloc(9).
  Moreover, this reimplementation of contigmalloc(9) eliminates the
  acquisition of Giant by contigmalloc(..., M_NOWAIT, ...).

  The twist is that this allocator tries to reduce the number of TLB misses
  incurred by accesses through a direct map to small, UMA-managed objects and
  page table pages. Roughly speaking, the physical pages that are allocated
  for such purposes are clustered together in the physical address space. The
  performance benefits vary. In the most extreme case, a uniprocessor kernel
  running on an Opteron, I measured an 18% reduction in system time during a
  buildworld.

  This allocator does not implement page coloring. The reason is that
  superpages have much the same effect. The contiguous physical memory
  allocation necessary for a superpage is inherently colored.

  Finally, the one caveat is that this allocator does not effectively support
  prezeroed pages. I hope this is temporary. On i386, this is a slight
  pessimization. However, on amd64, the beneficial effects of the direct-map
  optimization outweigh the ill effects. I speculate that this is true in
  general of machines with a direct map.

  Approved by: re
* [alc, 2007-06-13, 1 file, -3/+1] Eliminate dead code: we have not performed
  pageouts on the kernel object in this millennium.
* [attilio, 2007-06-10, 1 file, -11/+6] Optimize vmmeter locking.
  In particular:
  - Add an explanatory table for the locking of struct vmmeter members.
  - Apply new rules for some of those members.
  - Remove some useless comments.

  Heavily reviewed by: alc, bde, jeff
  Approved by: jeff (mentor)
* [jeff, 2007-06-05, 1 file, -7/+12] Commit 14/14 of sched_lock decomposition.
  - Use thread_lock() rather than sched_lock for per-thread scheduling
    synchronization.
  - Use the per-process spinlock rather than the sched_lock for per-process
    scheduling synchronization.

  Tested by: kris, current@
  Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
  Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)

* [attilio, 2007-06-04, 1 file, -6/+11] Do proper "locking" for the missing
  vmmeter parts. We now assume no sched_lock protection for some of them and
  use the distributed-loads method for vmmeter (distributed across CPUs).

  Reviewed by: alc, bde
  Approved by: jeff (mentor)

* [attilio, 2007-05-31, 1 file, -47/+47] Revert the introduction of the
  VMCNT_* operations. Probably, a general approach is not the best solution
  here, so we should solve the sched_lock protection problems separately.

  Requested by: alc
  Approved by: jeff (mentor)
* [jeff, 2007-05-18, 1 file, -47/+47] Define and use
  VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. These can be
  used to abstract away pcpu details, but this also changes all counters to
  use atomics now. This means the sched lock is no longer responsible for
  protecting counts in the switch routines.

  Contributed by: Attilio Rao <attilio@FreeBSD.org>
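What such macros amount to when built on C atomics can be sketched as follows.
The names mirror the commit message, but the bodies and the vmmeter fields are
illustrative, not the definitions that were committed (and later reverted, per
the 2007-05-31 entry above):

    #include <stdatomic.h>

    struct vmmeter_model {
        atomic_uint v_free_count;
        atomic_uint v_cache_count;
        /* ... */
    };

    static struct vmmeter_model cnt;

    /* Atomic counter accessors; no scheduler lock needed for protection. */
    #define VMCNT_ADD(name, val) \
        atomic_fetch_add_explicit(&cnt.name, (val), memory_order_relaxed)
    #define VMCNT_SUB(name, val) \
        atomic_fetch_sub_explicit(&cnt.name, (val), memory_order_relaxed)
    #define VMCNT_GET(name) \
        atomic_load_explicit(&cnt.name, memory_order_relaxed)
    #define VMCNT_SET(name, val) \
        atomic_store_explicit(&cnt.name, (val), memory_order_relaxed)
    #define VMCNT_PTR(name)     (&cnt.name)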