path: root/sys/vm/vm_fault.c
Commit message (Author, Date, Files changed, Lines changed)
* Use vm_page_prev() instead of vm_page_lookup() in the implementation of vm_fault()'s automatic delete-behind heuristic. vm_page_prev() is typically faster. (alc, 2010-07-02, 1 file, -10/+12)
* When waiting for the busy page, do not unlock the object unless unlocking cannot be avoided. Reviewed by: alc. MFC after: 1 week. (kib, 2010-05-20, 1 file, -3/+6)
* Push down the acquisition of the page queues lock into vm_pageq_remove(). (This eliminates a surprising number of page queues lock acquisitions by vm_fault() because the page's queue is PQ_NONE and thus the page queues lock is not needed to remove the page from a queue.) (alc, 2010-05-09, 1 file, -2/+0)
* Minimize the scope of the page queues lock in vm_fault(). (alc, 2010-05-08, 1 file, -1/+2)
* Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached. (alc, 2010-05-08, 1 file, -12/+3)
* Push down the page queues lock into vm_page_activate(). (alc, 2010-05-07, 1 file, -6/+1)
* Push down the page queues lock into vm_page_deactivate(). Eliminate an incorrect comment. (alc, 2010-05-07, 1 file, -2/+0)
* Eliminate page queues locking around most calls to vm_page_free(). (alc, 2010-05-06, 1 file, -11/+1)
* Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements: it now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib. (alc, 2010-05-05, 1 file, -2/+0)
* Push down the acquisition of the page queues lock into vm_page_unwire(). Update the comment describing which lock should be held on entry to vm_page_wire(). Reviewed by: kib. (alc, 2010-05-05, 1 file, -9/+5)
* Add page locking to the vm_page_cow* functions. Push down the acquisition and release of the page queues lock into vm_page_wire(). Reviewed by: kib. (alc, 2010-05-04, 1 file, -6/+0)
* Simplify vm_fault(). The introduction of the new page lock renders a bit of cleverness by vm_fault() to avoid repeatedly releasing and reacquiring the page queues lock pointless. Reviewed by: kib, kmacy. (alc, 2010-05-02, 1 file, -13/+5)
* It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(), to unconditionally set PG_REFERENCED on a page before sleeping. In many cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by the page daemon, before the caller to vm_page_sleep() is reawakened. Instead, we now explicitly set PG_REFERENCED in those cases where having the page persist until the caller is awakened is clearly desirable. Note, however, that setting PG_REFERENCED on the page is still only a hint, and not a guarantee that the page should persist. (alc, 2010-05-02, 1 file, -0/+6)
* Unlock the page lock instead of recursively locking it. (kib, 2010-04-30, 1 file, -3/+3)
* On Alan's advice, rather than do a wholesale conversion on a single architecture from the page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under the page queue mutex to the page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib. (kmacy, 2010-04-30, 1 file, -4/+58)
* Setting PG_REFERENCED on a page at the end of vm_fault() is redundant since the page table entry's accessed bit is either preset by the immediately preceding call to pmap_enter() or set by hardware (or software) upon return from vm_fault() when the faulting access is restarted. (alc, 2010-04-28, 1 file, -1/+0)
* When OOM searches for a process to kill, ignore processes already killed by OOM. When a killed process is waiting for a page allocation, try to satisfy the request as quickly as possible. This removes the often-encountered deadlock in which OOM continuously selects the same victim process, which sleeps uninterruptibly waiting for a page. The killed process may still sleep if the page cannot be obtained immediately, but testing has shown that the system has a much higher chance of surviving an OOM situation with the patch. In collaboration with: pho. Reviewed by: alc. MFC after: 4 weeks. (kib, 2010-04-06, 1 file, -6/+15)
* Properly synchronize the previous change. (alc, 2009-11-28, 1 file, -0/+2)
* Support the new VM_PROT_COPY option on wired pages. The effect is that a debugger can now set a breakpoint in a program that uses mlock(2) on its text segment or mlockall(2) on its entire address space. (alc, 2009-11-27, 1 file, -3/+6)
* Simplify the invocation of vm_fault(). Specifically, eliminate the flag VM_FAULT_DIRTY. The information provided by this flag can be trivially inferred by vm_fault(). Discussed with: kib. (alc, 2009-11-27, 1 file, -8/+11)
* Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has represented a write access that is allowed to override write protection. Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into text pages. Text pages are not just write protected; they are also copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the text page and triggers the replication of the page so that the breakpoint will be written to a private copy. However, here is where things become confused: it is the debugger, not the process being debugged, that requires write access to the copied page. Nonetheless, the copied page was being mapped into the process with write access enabled. In other words, once the debugger set a breakpoint within a text page, the program could write to its private copy of that text page, whereas prior to setting the breakpoint, a SIGSEGV would have occurred upon a write access. VM_PROT_COPY addresses this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the replication of a copy-on-write page even though the access is only for read. Moreover, the replicated page is only mapped into the process with read access, and not write access. Reviewed by: kib. MFC after: 4 weeks. (alc, 2009-11-26, 1 file, -1/+1)
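Editor's note: the protection-bit combination described in the entry above is easier to see in code. The following is a minimal, hypothetical kernel-side sketch, not the committed vm_fault() or ptrace(2) code; the helper name and surrounding details are invented for illustration, and only vm_fault() and the VM_PROT_* / VM_FAULT_NORMAL constants are real interfaces.

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/pmap.h>
    #include <vm/vm_map.h>
    #include <vm/vm_extern.h>

    /*
     * Hypothetical helper: fault in the page backing 'va' in the target
     * process's map, requesting VM_PROT_READ | VM_PROT_COPY.  VM_PROT_COPY
     * forces copy-on-write replication even though the access is only a
     * read, and the replicated page is mapped into the target with read
     * access only; the debugger then writes the breakpoint into the copy
     * through its own access path, not through the target's mapping.
     */
    static int
    force_cow_copy(vm_map_t map, vm_offset_t va)
    {
            return (vm_fault(map, trunc_page(va),
                VM_PROT_READ | VM_PROT_COPY, VM_FAULT_NORMAL));
    }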
* Simplify both the invocation and the implementation of vm_fault() for wiring pages. (Note: Claims made in the comments about the handling of breakpoints in wired pages have been false for roughly a decade. This and another bug involving breakpoints will be fixed in coming changes.) Reviewed by: kib. (alc, 2009-11-18, 1 file, -32/+13)
* Eliminate an unnecessary #include. (This #include should have been removed in r188331 when vnode_pager_lock() was eliminated.) (alc, 2009-11-04, 1 file, -1/+0)
* Eliminate a bit of hackery from vm_fault(). The operations that this hackery sought to prevent are now properly supported by vm_map_protect(). (See r198505.) Reviewed by: kib. (alc, 2009-11-03, 1 file, -11/+0)
* Correct an error in vm_fault_copy_entry() that has existed since the first version of this file. When a process forks, any wired pages are immediately copied because copy-on-write is not supported for wired pages. In other words, the child process is given its own private copy of each wired page from its parent's address space. Unfortunately, to date, these copied pages have been mapped into the child's address space with the wrong permissions, typically VM_PROT_ALL. This change corrects the permissions. Reviewed by: kib. (alc, 2009-10-31, 1 file, -1/+1)
* When the protection of a wired read-only mapping is changed to read-write, install a new shadow object behind the map entry and copy the pages from the underlying objects into it. This makes the mprotect(2) call actually perform the requested operation instead of silently doing nothing and returning success, which caused SIGSEGV on later write access to the mapping. Reuse vm_fault_copy_entry() to do the copying, modifying it to behave correctly when src_entry == dst_entry. Reviewed by: alc. MFC after: 3 weeks. (kib, 2009-10-27, 1 file, -16/+46)
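Editor's note: as a concrete illustration of the user-visible behavior fixed by the entry above, the stand-alone program below (an illustrative test, not part of the commit) wires a read-only anonymous mapping, upgrades it to read-write with mprotect(2), and then writes to it. Before the change, the write could deliver SIGSEGV even though mprotect(2) had returned success.

    #include <sys/mman.h>
    #include <err.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            size_t len = getpagesize();
            char *p;

            /* Read-only, private, anonymous mapping. */
            p = mmap(NULL, len, PROT_READ, MAP_ANON | MAP_PRIVATE, -1, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");
            /* Wire the mapping so the pages stay resident. */
            if (mlock(p, len) != 0)
                    err(1, "mlock");
            /* Upgrade the wired read-only mapping to read-write. */
            if (mprotect(p, len, PROT_READ | PROT_WRITE) != 0)
                    err(1, "mprotect");
            /* With the fix, this write succeeds instead of faulting. */
            memset(p, 0xa5, len);
            printf("write after mprotect(PROT_READ|PROT_WRITE) succeeded\n");
            return (0);
    }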
* Simplify the inner loop of vm_fault_copy_entry(). Reviewed by: kib. (alc, 2009-10-26, 1 file, -13/+12)
* Eliminate an unnecessary check from vm_fault_prefault(). (alc, 2009-10-25, 1 file, -2/+2)
* Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver. Reviewed by: alc. Approved by: re (kensmith). MFC after: 2 weeks. (jhb, 2009-07-24, 1 file, -1/+2)
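Editor's note: the sglist(9) detail is the heart of the new object type. Below is a small, hedged sketch of how a driver might build an sglist describing two kernel buffers; the helper name is hypothetical, and the step of creating the OBJT_SG object over the list is omitted because the exact pager-allocation call is not shown in this log.

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <sys/sglist.h>

    /*
     * Hypothetical helper: describe two kernel virtual buffers with an
     * sglist(9).  An OBJT_SG object built over this list resolves object
     * offsets to these physical segments instead of calling d_mmap().
     */
    static struct sglist *
    build_two_segment_sglist(void *buf0, size_t len0, void *buf1, size_t len1)
    {
            struct sglist *sg;

            sg = sglist_alloc(2, M_WAITOK);        /* room for two segments */
            (void)sglist_append(sg, buf0, len0);   /* records phys segments */
            (void)sglist_append(sg, buf1, len1);
            return (sg);
    }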
* When forking a vm space that has wired map entries, do not forget to charge the objects created by vm_fault_copy_entry(). The object charge was set, but the reserve was not incremented. Reported by: Greg Rivers <gcr+freebsd-current tharned org>. Reviewed by: alc (previous version). Approved by: re (kensmith). (kib, 2009-07-03, 1 file, -10/+11)
* Implement global and per-uid accounting of anonymous memory. Add the rlimit RLIMIT_SWAP, which limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either the map entry or the vm object backing the entry, assuming the object is the first one in the shadow chain and the entry does not require COW. The charge is moved from the entry to the object on allocation of the object, e.g. during mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during their lifetime, and decrements the charge for the proper uid when a region is unmapped. The interface of vm_pager_allocate(9) is extended by adding a struct ucred *, which is used to charge the appropriate uid when the allocation is performed by the kernel, e.g. md(4). Several syscalls, among them fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho. Reviewed by: alc. Approved by: re (kensmith). (kib, 2009-06-23, 1 file, -1/+5)
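Editor's note: a short user-space illustration of the new resource limit follows. It is illustrative only; whether reservations beyond the limit actually fail with ENOMEM depends on how swap accounting and overcommit are configured on the system.

    #include <sys/types.h>
    #include <sys/resource.h>
    #include <err.h>
    #include <stdint.h>
    #include <stdio.h>

    /*
     * Illustrative use of RLIMIT_SWAP: cap the swap this uid may reserve
     * at 64 MB.  Once the limit is reached, further reservations made on
     * this uid's behalf (anonymous mmap, fork, etc.) can fail with ENOMEM.
     */
    int
    main(void)
    {
            struct rlimit rl;

            rl.rlim_cur = rl.rlim_max = 64UL * 1024 * 1024;
            if (setrlimit(RLIMIT_SWAP, &rl) != 0)
                    err(1, "setrlimit(RLIMIT_SWAP)");
            printf("swap reservation limited to %ju bytes\n",
                (uintmax_t)rl.rlim_cur);
            return (0);
    }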
* Eliminate unnecessary obfuscation when testing a page's valid bits. (alc, 2009-06-07, 1 file, -4/+3)
* Eliminate an incorrect comment. (alc, 2009-05-07, 1 file, -2/+0)
* Eliminate an archaic band-aid. The immediately preceding comment already explains why the band-aid is unnecessary. Suggested by: tegge. (alc, 2009-04-26, 1 file, -5/+3)
* Allow valid pages to be mapped for read access when they have a non-zero busy count. Only mappings that allow write access should be prevented by a non-zero busy count. (The prohibition on mapping pages for read access when they have a non-zero busy count originated in revision 1.202 of i386/i386/pmap.c when this code was a part of the pmap.) Reviewed by: tegge. (alc, 2009-04-19, 1 file, -1/+0)
* Prior to r188331, a map entry's last read offset was only updated by a hard fault. In r188331 this update was relocated, because of synchronization changes, to a place where it would occur on both hard and soft faults. This change again restricts the update to hard faults. (alc, 2009-02-25, 1 file, -3/+7)
* Avoid some cases of unnecessary page queues locking by vm_fault's delete-behind heuristic. (alc, 2009-02-09, 1 file, -5/+11)
* Eliminate OBJ_NEEDGIANT. After r188331, OBJ_NEEDGIANT's only use is by a redundant assertion in vm_fault(). Reviewed by: kib. (alc, 2009-02-08, 1 file, -3/+0)
* Remove a no-longer-valid comment. Submitted by: alc. (kib, 2009-02-08, 1 file, -3/+0)
* Do not sleep for the vnode lock while holding the map lock in vm_fault(). Try to acquire the vnode lock for an OBJT_VNODE object after the map lock is dropped. Because we have the busy page(s) in the object, sleeping there would result in a deadlock with vnode resize. Try to get the lock without sleeping and, if the attempt fails, drop the state, lock the vnode, and restart the fault handler from the start with the vnode already locked. Because the vnode_pager_lock() function is inlined in vm_fault(), axe it. Based on a suggestion by: alc. Reviewed by: tegge, alc. Tested by: pho. (kib, 2009-02-08, 1 file, -43/+80)
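Editor's note: the "try without sleeping, otherwise back out and retry" pattern described above can be sketched as follows. This is a hedged illustration, not the committed vm_fault() code: the helper, its parameter, and the back-out convention are invented; only vget(9) and the lockmgr flags correspond to real kernel interfaces of that era.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/lockmgr.h>
    #include <sys/proc.h>
    #include <sys/vnode.h>

    /*
     * Hypothetical sketch of the retry pattern: with a busy page held we
     * must not sleep for the vnode lock, so first try a non-sleeping
     * acquisition (can_sleep == 0).  If that fails, the caller backs out
     * its fault state (unlocks the map, unbusies the page, releases the
     * object) and calls again with can_sleep set, after which the fault
     * is restarted with the vnode already locked.
     */
    static int
    fault_lock_vnode(struct vnode *vp, int can_sleep)
    {
            int flags;

            flags = LK_EXCLUSIVE | (can_sleep ? 0 : LK_NOWAIT);
            return (vget(vp, flags, curthread));
    }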
* Style. (kib, 2009-02-08, 1 file, -0/+2)
* Simplify the inner loop of vm_fault()'s delete-behind heuristic. Instead of checking each page for PG_UNMANAGED, perform a one-time check whether the object is OBJT_PHYS. (PG_UNMANAGED pages only belong to OBJT_PHYS objects.) (alc, 2008-03-16, 1 file, -2/+2)
* Eliminate an unnecessary test from vm_fault's delete-behind heuristic. Specifically, since the delete-behind heuristic is never applied to a device-backed object, there is no point in checking whether each of the object's pages is fictitious. (Only device-backed objects have fictitious pages.) (alc, 2008-03-09, 1 file, -1/+1)
* Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion. Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean. (alc, 2008-01-03, 1 file, -3/+4)
* Add the superpage reservation system. This is "part 2 of 2" of the machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpage support is machine-dependent and will be added to the various pmap implementations at a later date. Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.) (alc, 2007-12-29, 1 file, -0/+13)
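Editor's note: a minimal illustration of the emergency opt-out mentioned in the entry above, either placed in amd64/include/vmparam.h or supplied through the kernel configuration as the commit message describes.

    /*
     * Defining zero reservation levels disables the superpage
     * reservation system, per the commit message above.
     */
    #define VM_NRESERVLEVELS        0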
* Do not dereference a NULL pointer. Reported by: Peter Holm. Reviewed by: alc. Approved by: re (kensmith). (kib, 2007-10-08, 1 file, -3/+2)
* Change the management of cached pages (PQ_CACHE) in two fundamental ways: (alc, 2007-09-25, 1 file, -21/+8)
  (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq.
  (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed.
  Finally, as a result of this change, long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@. Tested by: an earlier version by kris@. Approved by: re (kensmith).
* Two changes to vm_fault_additional_pages(): (alc, 2007-07-20, 1 file, -19/+11)
  1. Rewrite the backward scan. Specifically, reverse the order in which pages are allocated so that upon failure it is never necessary to free pages that were just allocated. Moreover, any allocated pages can be put to use. This makes the backward scan behave just like the forward scan.
  2. Eliminate an explicit, unsynchronized check for low memory before calling vm_page_alloc(). It serves no useful purpose. It is, in effect, optimizing the uncommon case at the expense of the common case.
  Approved by: re (hrs). MFC after: 3 weeks.
* Eliminate the special-case handling of OBJT_DEVICE objects in vm_fault_additional_pages() that was introduced in revision 1.47. Then as now, it is unnecessary because dev_pager_haspage() returns zero for both the number of pages to read ahead and read behind, producing exactly the same behavior by vm_fault_additional_pages() as the special-case handling. Approved by: re (rwatson). (alc, 2007-07-08, 1 file, -10/+0)
* When a cached page is reactivated in vm_fault(), update the counter that tracks the total number of reactivated pages. (We have not been counting reactivations by vm_fault() since revision 1.46.) Correct a comment in vm_fault_additional_pages(). Approved by: re (kensmith). MFC after: 1 week. (alc, 2007-07-06, 1 file, -8/+10)