path: root/sys/vm/vm_fault.c
* Use the trick of performing the atomic operation on the contained aligned
  word to handle the dirty mask updates in vm_page_clear_dirty_mask().
  Remove the vm page queue lock around the vm_page_dirty() call in
  vm_fault_hold(), the sole purpose of which was to protect the dirty field
  on architectures that do not provide short or byte-wide atomics.
  (kib, 2011-09-28; 1 file, -10/+2)
  Reviewed by: alc, attilio
  Tested by: flo (sparc64)
  MFC after: 2 weeks
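  A minimal sketch of the containing-word trick, assuming little-endian
  byte order and FreeBSD's atomic_clear_32(); the struct layout and helper
  name are hypothetical, not the kernel's actual code:

      #include <sys/types.h>
      #include <sys/param.h>          /* NBBY */
      #include <machine/atomic.h>     /* atomic_clear_32() */

      struct page {                   /* hypothetical layout */
              uint8_t flags;
              uint8_t dirty;          /* sub-word field to update */
      };

      static void
      page_clear_dirty_bits(struct page *m, uint8_t pagebits)
      {
              uintptr_t addr;
              uint32_t shift;

              /*
               * No byte-wide atomics: apply the op to the aligned
               * 32-bit word containing the dirty byte, shifting the
               * mask to the byte's position within that word.
               */
              addr = (uintptr_t)&m->dirty;
              shift = (addr & 3) * NBBY;      /* little-endian */
              addr &= ~(uintptr_t)3;
              atomic_clear_32((volatile uint32_t *)addr,
                  (uint32_t)pagebits << shift);
      }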
* Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic flags
  field. Updates to the atomic flags are performed using the atomic ops on
  the containing word, do not require any vm lock to be held, and are
  non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
  functions are provided to modify aflags.
  Document the changes to flags field to only require the page lock.
  Introduce vm_page_reference(9) function to provide a stable KPI and KBI
  for filesystems like tmpfs and zfs which need to mark a page as
  referenced. (kib, 2011-09-06; 1 file, -3/+1)
  Reviewed by: alc, attilio
  Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
  Approved by: re (bz)
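  A hedged usage sketch of the KPI described above; the wrapper function is
  hypothetical:

      #include <vm/vm.h>
      #include <vm/vm_page.h>

      static void
      mark_referenced(vm_page_t m)
      {
              /*
               * Sets the referenced aflag via an atomic op on the
               * containing word; no VM lock needs to be held.
               */
              vm_page_reference(m);
      }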
* Add a facility to disable processing page faults. When activated, uiomove
  generates EFAULT if any accessed address is not mapped, as opposed to
  handling the fault. (kib, 2011-07-09; 1 file, -0/+16)
  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc (previous version)
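  A sketch of how a caller might use the facility, assuming the
  vm_fault_disable_pagefaults()/vm_fault_enable_pagefaults() pair this
  change introduced; the wrapper function is hypothetical:

      #include <sys/types.h>
      #include <sys/systm.h>          /* copyin() */
      #include <vm/vm_extern.h>

      static int
      copyin_nofault(const void *uaddr, void *kaddr, size_t len)
      {
              int error, save;

              save = vm_fault_disable_pagefaults();
              /* Returns EFAULT instead of faulting the page in. */
              error = copyin(uaddr, kaddr, len);
              vm_fault_enable_pagefaults(save);
              return (error);
      }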
* Revert to using the page queues lock in vm_page_clear_dirty_mask() on
  MIPS. (At present, although atomic_clear_char() is defined by atomic.h on
  MIPS, it is not actually implemented by support.S.)
  (alc, 2011-06-23; 1 file, -2/+1)
* Precisely document the synchronization rules for the page's dirty field.
  (Saying that the lock on the object that the page belongs to must be held
  only represents one aspect of the rules.)
  Eliminate the use of the page queues lock for atomically performing
  read-modify-write operations on the dirty field when the underlying
  architecture supports atomic operations on char and short types.
  Document the fact that 32KB pages aren't really supported.
  (alc, 2011-06-19; 1 file, -0/+10)
  Reviewed by: attilio, kib
* Handle the corner case in vm_fault_quick_hold_pages().
  If the supplied length is zero and the user address is invalid, the
  function might return -1 due to the truncation and rounding of the
  address. The callers interpret the situation as EFAULT. Instead of
  handling the zero length in the caller, filter it in
  vm_fault_quick_hold_pages(). (kib, 2011-03-25; 1 file, -0/+2)
  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc
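  An illustrative sketch of the arithmetic behind the corner case; the
  constants and variables are hypothetical:

      /*
       * With len == 0 and an out-of-range address, page rounding still
       * yields a start/end pair outside the map, so the range check
       * failed and -1 (read as EFAULT) was returned even though no
       * bytes were requested.
       */
      vm_offset_t addr = VM_MAXUSER_ADDRESS + PAGE_SIZE;  /* invalid */
      vm_size_t len = 0;
      vm_offset_t start = trunc_page(addr);
      vm_offset_t end = round_page(addr + len);
      /* The fix: return 0 immediately when len == 0. */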
* For some time now, the kernel and kmem objects have been ordinary
  OBJT_PHYS objects. Thus, there is no need for handling them specially in
  vm_fault(). In fact, this special case handling would have led to an
  assertion failure just before the call to pmap_enter().
  (alc, 2011-01-15; 1 file, -4/+1)
  Reviewed by: kib@
  MFC after: 6 weeks
* Correct a typo in vm_fault_quick_hold_pages(). (alc, 2010-12-28; 1 file,
  -1/+1)
  Reported by: Bartosz Stec
* Retire vm_fault_quick(). It's no longer used. (alc, 2010-12-25; 1 file,
  -18/+0)
  Reviewed by: kib@
* Introduce and use a new VM interface for temporarily pinning pages. This
  new interface replaces the combined use of vm_fault_quick() and
  pmap_extract_and_hold() throughout the kernel.
  (alc, 2010-12-25; 1 file, -0/+75)
  In collaboration with: kib@
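  A hedged usage sketch of the interface, pairing
  vm_fault_quick_hold_pages() with vm_page_unhold_pages(); the wrapper and
  buffer size are hypothetical:

      #include <sys/errno.h>
      #include <vm/vm.h>
      #include <vm/vm_extern.h>
      #include <vm/vm_map.h>
      #include <vm/vm_page.h>

      static int
      pin_user_range(vm_map_t map, vm_offset_t uaddr, vm_size_t len)
      {
              vm_page_t ma[16];
              int count;

              /* Fault in and hold the pages backing [uaddr, uaddr+len). */
              count = vm_fault_quick_hold_pages(map, uaddr, len,
                  VM_PROT_READ, ma, sizeof(ma) / sizeof(ma[0]));
              if (count == -1)
                      return (EFAULT);
              /* ... perform I/O on the held pages ... */
              vm_page_unhold_pages(ma, count);
              return (0);
      }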
* Introduce vm_fault_hold() and use it to (1) eliminate a long-standing
  race condition in proc_rwmem() and to (2) simplify the implementation of
  the cxgb driver's vm_fault_hold_user_pages(). Specifically, in
  proc_rwmem() the requested read or write could fail because the targeted
  page could be reclaimed between the calls to vm_fault() and
  vm_page_hold(). (alc, 2010-12-20; 1 file, -5/+18)
  In collaboration with: kib@
  MFC after: 6 weeks
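  A hedged sketch of the fixed pattern: vm_fault_hold() returns with the
  page already held, closing the window in which the page daemon could
  reclaim it (the variable names are illustrative):

      vm_page_t m;
      int rv;

      /*
       * Old, racy: vm_fault() followed by a separate lookup and
       * vm_page_hold(); the page could be reclaimed in between.
       * New: the hold is taken inside the fault handler itself.
       */
      rv = vm_fault_hold(map, va, VM_PROT_READ, VM_FAULT_NORMAL, &m);
      if (rv != KERN_SUCCESS)
              return (EFAULT);
      /* ... safely access the held page ... */
      vm_page_lock(m);
      vm_page_unhold(m);
      vm_page_unlock(m);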
* Replace pointer to "struct uidinfo" with pointer to "struct ucred" in
  "struct vm_object". This is required to make it possible to account for
  per-jail swap usage. (trasz, 2010-12-02; 1 file, -5/+5)
  Reviewed by: kib@
  Tested by: pho@
  Sponsored by: FreeBSD Foundation
* Use vm_page_prev() instead of vm_page_lookup() in the implementation of
  vm_fault()'s automatic delete-behind heuristic. vm_page_prev() is
  typically faster. (alc, 2010-07-02; 1 file, -10/+12)
* When waiting for the busy page, do not unlock the object unless unlock
  cannot be avoided. (kib, 2010-05-20; 1 file, -3/+6)
  Reviewed by: alc
  MFC after: 1 week
* Push down the acquisition of the page queues lock into vm_pageq_remove().
  (This eliminates a surprising number of page queues lock acquisitions by
  vm_fault() because the page's queue is PQ_NONE and thus the page queues
  lock is not needed to remove the page from a queue.)
  (alc, 2010-05-09; 1 file, -2/+0)
* Minimize the scope of the page queues lock in vm_fault().
  (alc, 2010-05-08; 1 file, -1/+2)
* Push down the page queues lock into vm_page_cache(),
  vm_page_try_to_cache(), and vm_page_try_to_free(). Consequently, push
  down the page queues lock into pmap_enter_quick(),
  pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write().
  Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
  overlooked the Xen pmap in r207702.)
  Switch to a per-processor counter for the total number of pages cached.
  (alc, 2010-05-08; 1 file, -12/+3)
* Push down the page queues lock into vm_page_activate().
  (alc, 2010-05-07; 1 file, -6/+1)
* Push down the page queues lock into vm_page_deactivate(). Eliminate an
  incorrect comment. (alc, 2010-05-07; 1 file, -2/+0)
* Eliminate page queues locking around most calls to vm_page_free().
  (alc, 2010-05-06; 1 file, -11/+1)
* Acquire the page lock around all remaining calls to vm_page_free() on
  managed pages that didn't already have that lock held. (Freeing an
  unmanaged page, such as the various pmaps use, doesn't require the page
  lock.)
  This allows a change in vm_page_remove()'s locking requirements. It now
  expects the page lock to be held instead of the page queues lock.
  Consequently, the page queues lock is no longer required at all by
  callers to vm_page_rename(). (alc, 2010-05-05; 1 file, -2/+0)
  Discussed with: kib
* Push down the acquisition of the page queues lock into vm_page_unwire().
  Update the comment describing which lock should be held on entry to
  vm_page_wire(). (alc, 2010-05-05; 1 file, -9/+5)
  Reviewed by: kib
* Add page locking to the vm_page_cow* functions.
  Push down the acquisition and release of the page queues lock into
  vm_page_wire(). (alc, 2010-05-04; 1 file, -6/+0)
  Reviewed by: kib
* Simplify vm_fault(). The introduction of the new page lock renders a bit
  of cleverness by vm_fault() to avoid repeatedly releasing and reacquiring
  the page queues lock pointless. (alc, 2010-05-02; 1 file, -13/+5)
  Reviewed by: kib, kmacy
* It makes no sense for vm_page_sleep_if_busy()'s helper, vm_page_sleep(),
  to unconditionally set PG_REFERENCED on a page before sleeping. In many
  cases, it's perfectly ok for the page to disappear, i.e., be reclaimed by
  the page daemon, before the caller to vm_page_sleep() is reawakened.
  Instead, we now explicitly set PG_REFERENCED in those cases where having
  the page persist until the caller is awakened is clearly desirable. Note,
  however, that setting PG_REFERENCED on the page is still only a hint, and
  not a guarantee that the page should persist.
  (alc, 2010-05-02; 1 file, -0/+6)
* Unlock page lock instead of recursively locking it.
  (kib, 2010-04-30; 1 file, -3/+3)
* On Alan's advice, rather than do a wholesale conversion on a single
  architecture from page queue lock to a hashed array of page locks (based
  on a patch by Jeff Roberson), I've implemented page lock support in the
  MI code and have only moved vm_page's hold_count out from under page
  queue mutex to page lock. This changes pmap_extract_and_hold on all
  pmaps. (kmacy, 2010-04-30; 1 file, -4/+58)
  Supported by: Bitgravity Inc.
  Discussed with: alc, jeffr, and kib
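  A hypothetical sketch of a hashed array of page locks keyed on a page's
  physical address; the table size and names are invented for illustration,
  not the kernel's actual implementation:

      #include <sys/param.h>
      #include <sys/lock.h>
      #include <sys/mutex.h>

      #define PA_LOCK_COUNT   32      /* hypothetical table size */
      static struct mtx pa_locks[PA_LOCK_COUNT];

      /* Map a physical address to its lock bucket. */
      static struct mtx *
      page_lock_ptr(vm_paddr_t pa)
      {
              return (&pa_locks[(pa >> PAGE_SHIFT) % PA_LOCK_COUNT]);
      }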
* Setting PG_REFERENCED on a page at the end of vm_fault() is redundant
  since the page table entry's accessed bit is either preset by the
  immediately preceding call to pmap_enter() or by hardware (or software)
  upon return from vm_fault() when the faulting access is restarted.
  (alc, 2010-04-28; 1 file, -1/+0)
* When OOM searches for a process to kill, ignore the processes already
  killed by OOM. When a killed process waits for a page allocation, try to
  satisfy the request as fast as possible.
  This removes the often encountered deadlock, where OOM continuously
  selects the same victim process that sleeps uninterruptibly waiting for a
  page. The killed process may still sleep if the page cannot be obtained
  immediately, but testing has shown that the system has a much higher
  chance to survive an OOM situation with the patch.
  (kib, 2010-04-06; 1 file, -6/+15)
  In collaboration with: pho
  Reviewed by: alc
  MFC after: 4 weeks
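  A hypothetical sketch of the selection loop; the P_OOM_KILLED flag and
  proc_resident_size() helper are invented for illustration and are not
  the kernel's actual names:

      struct proc *p, *bigproc = NULL;
      unsigned long size, bigsize = 0;

      FOREACH_PROC_IN_SYSTEM(p) {
              /* Skip system processes and prior OOM victims. */
              if (p->p_flag & (P_SYSTEM | P_OOM_KILLED))
                      continue;
              size = proc_resident_size(p);   /* hypothetical helper */
              if (size > bigsize) {
                      bigproc = p;
                      bigsize = size;
              }
      }
      if (bigproc != NULL) {
              bigproc->p_flag |= P_OOM_KILLED;
              killproc(bigproc, "out of swap space");
      }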
* Properly synchronize the previous change. (alc, 2009-11-28; 1 file,
  -0/+2)
* Support the new VM_PROT_COPY option on wired pages, the effect of which
  is that a debugger can now set a breakpoint in a program that uses
  mlock(2) on its text segment or mlockall(2) on its entire address space.
  (alc, 2009-11-27; 1 file, -3/+6)
* Simplify the invocation of vm_fault(). Specifically, eliminate the flag
  VM_FAULT_DIRTY. The information provided by this flag can be trivially
  inferred by vm_fault(). (alc, 2009-11-27; 1 file, -8/+11)
  Discussed with: kib
* Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE
  has represented a write access that is allowed to override write
  protection. Until now, VM_PROT_OVERRIDE_WRITE has been used to write
  breakpoints into text pages. Text pages are not just write protected but
  they are also copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write
  protection on the text page and triggers the replication of the page so
  that the breakpoint will be written to a private copy. However, here is
  where things become confused. It is the debugger, not the process being
  debugged, that requires write access to the copied page. Nonetheless, the
  copied page is being mapped into the process with write access enabled.
  In other words, once the debugger sets a breakpoint within a text page,
  the program can write to its private copy of that text page, whereas
  prior to setting the breakpoint, a SIGSEGV would have occurred upon a
  write access. VM_PROT_COPY addresses this problem. The combination of
  VM_PROT_READ and VM_PROT_COPY forces the replication of a copy-on-write
  page even though the access is only for read. Moreover, the replicated
  page is only mapped into the process with read access, and not write
  access. (alc, 2009-11-26; 1 file, -1/+1)
  Reviewed by: kib
  MFC after: 4 weeks
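  A hedged sketch of how a debugger-style write path (e.g. proc_rwmem())
  can use the new flag; the variable names are illustrative:

      vm_prot_t reqprot;
      int rv;

      /*
       * Request read plus copy: a COW text page is replicated before
       * access, and the replica stays mapped read-only for the
       * debuggee.
       */
      reqprot = writing ? VM_PROT_COPY | VM_PROT_READ : VM_PROT_READ;
      rv = vm_fault(map, trunc_page(uva), reqprot, VM_FAULT_NORMAL);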
* Simplify both the invocation and the implementation of vm_fault() for
  wiring pages. (Note: Claims made in the comments about the handling of
  breakpoints in wired pages have been false for roughly a decade. This and
  another bug involving breakpoints will be fixed in coming changes.)
  (alc, 2009-11-18; 1 file, -32/+13)
  Reviewed by: kib
* Eliminate an unnecessary #include. (This #include should have been
  removed in r188331 when vnode_pager_lock() was eliminated.)
  (alc, 2009-11-04; 1 file, -1/+0)
* Eliminate a bit of hackery from vm_fault(). The operations that this
  hackery sought to prevent are now properly supported by vm_map_protect().
  (See r198505.) (alc, 2009-11-03; 1 file, -11/+0)
  Reviewed by: kib
* Correct an error in vm_fault_copy_entry() that has existed since the
  first version of this file. When a process forks, any wired pages are
  immediately copied because copy-on-write is not supported for wired
  pages. In other words, the child process is given its own private copy of
  each wired page from its parent's address space. Unfortunately, to date,
  these copied pages have been mapped into the child's address space with
  the wrong permissions, typically VM_PROT_ALL. This change corrects the
  permissions. (alc, 2009-10-31; 1 file, -1/+1)
  Reviewed by: kib
* When the protection of a wired read-only mapping is changed to
  read-write, install a new shadow object behind the map entry and copy the
  pages from the underlying objects to it. This makes the mprotect(2) call
  actually perform the requested operation instead of silently doing
  nothing and returning success, which caused SIGSEGV on later write access
  to the mapping.
  Reuse vm_fault_copy_entry() to do the copying, modifying it to behave
  correctly when src_entry == dst_entry. (kib, 2009-10-27; 1 file, -16/+46)
  Reviewed by: alc
  MFC after: 3 weeks
* Simplify the inner loop of vm_fault_copy_entry(). (alc, 2009-10-26;
  1 file, -13/+12)
  Reviewed by: kib
* Eliminate an unnecessary check from vm_fault_prefault().
  (alc, 2009-10-25; 1 file, -2/+2)
* Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar
  to a device pager (OBJT_DEVICE) object in that it uses fictitious pages
  to provide aliases to other memory addresses. The primary difference is
  that it uses an sglist(9) to determine the physical addresses for a given
  offset into the object instead of invoking the d_mmap() method in a
  device driver. (jhb, 2009-07-24; 1 file, -1/+2)
  Reviewed by: alc
  Approved by: re (kensmith)
  MFC after: 2 weeks
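  A hedged sketch of constructing such an object from two kernel buffers
  via sglist(9) and vm_pager_allocate(); the helper name is hypothetical
  and error handling is omitted:

      #include <sys/param.h>
      #include <sys/malloc.h>
      #include <sys/sglist.h>
      #include <vm/vm.h>
      #include <vm/vm_object.h>
      #include <vm/vm_pager.h>

      static vm_object_t
      make_sg_object(void *buf1, void *buf2, size_t len)
      {
              struct sglist *sg;

              /* Describe the two buffers as physical segments. */
              sg = sglist_alloc(2, M_WAITOK);
              sglist_append(sg, buf1, len);
              sglist_append(sg, buf2, len);
              /* The pager walks the sglist to translate offsets. */
              return (vm_pager_allocate(OBJT_SG, sg, 2 * len,
                  VM_PROT_DEFAULT, 0, NULL));
      }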
* When forking a vm space that has wired map entries, do not forget to
  charge the objects created by vm_fault_copy_entry(). The object charge
  was set, but the reserve was not incremented.
  (kib, 2009-07-03; 1 file, -10/+11)
  Reported by: Greg Rivers <gcr+freebsd-current tharned org>
  Reviewed by: alc (previous version)
  Approved by: re (kensmith)
* Implement global and per-uid accounting of the anonymous memory. Add the
  rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
  for the uid.
  The accounting information (charge) is associated with either the map
  entry, or the vm object backing the entry, assuming the object is the
  first one in the shadow chain and the entry does not require COW. The
  charge is moved from the entry to the object on allocation of the object,
  e.g. during mmap, assuming the object is allocated, or on the first page
  fault on the entry. It moves back to the entry on forks due to COW setup.
  The per-entry granularity of accounting makes the charge process fair for
  processes that change uid during their lifetime, and decrements the
  charge for the proper uid when a region is unmapped.
  The interface of vm_pager_allocate(9) is extended by adding struct ucred
  *, which is used to charge the appropriate uid when the allocation is
  performed by the kernel, e.g. md(4).
  Several syscalls, among them fork(2), may now return ENOMEM when global
  or per-uid limits are enforced. (kib, 2009-06-23; 1 file, -1/+5)
  In collaboration with: pho
  Reviewed by: alc
  Approved by: re (kensmith)
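  A hedged sketch of the reserve/release pattern this work introduces,
  using the swap_reserve()/swap_release() entry points; the surrounding
  logic is illustrative only:

      vm_ooffset_t charge = size;

      /*
       * Charge the current credential; fails if the global or
       * RLIMIT_SWAP limit would be exceeded.
       */
      if (!swap_reserve(charge))
              return (ENOMEM);
      /* ... create the anonymous object or map entry, recording the
       * charge against it ... */
      /* On unmap (or on failure), return the reservation. */
      swap_release(charge);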
* Eliminate unnecessary obfuscation when testing a page's valid bits.
  (alc, 2009-06-07; 1 file, -4/+3)
* Eliminate an incorrect comment. (alc, 2009-05-07; 1 file, -2/+0)
* Eliminate an archaic band-aid. The immediately preceding comment already
  explains why the band-aid is unnecessary. (alc, 2009-04-26; 1 file,
  -5/+3)
  Suggested by: tegge
* Allow valid pages to be mapped for read access when they have a non-zero
  busy count. Only mappings that allow write access should be prevented by
  a non-zero busy count.
  (The prohibition on mapping pages for read access when they have a
  non-zero busy count originated in revision 1.202 of i386/i386/pmap.c when
  this code was a part of the pmap.) (alc, 2009-04-19; 1 file, -1/+0)
  Reviewed by: tegge
* Prior to r188331 a map entry's last read offset was only updated by a
  hard fault. In r188331 this update was relocated because of
  synchronization changes to a place where it would occur on both hard and
  soft faults. This change again restricts the update to hard faults.
  (alc, 2009-02-25; 1 file, -3/+7)
* Avoid some cases of unnecessary page queues locking by vm_fault's
  delete-behind heuristic. (alc, 2009-02-09; 1 file, -5/+11)
* Eliminate OBJ_NEEDGIANT. After r188331, OBJ_NEEDGIANT's only use is by a
  redundant assertion in vm_fault(). (alc, 2009-02-08; 1 file, -3/+0)
  Reviewed by: kib