path: root/sys/vm/vm_fault.c
Commit message  (author, date, files changed, lines deleted/added)
* MFC r266491:  (kib, 2014-05-24, 1 file, -12/+10)
  Remove redundant loop.
* MFC r265843:  (kib, 2014-05-17, 1 file, -40/+68)
  For the upgrade case in vm_fault_copy_entry(), when the entry does not
  need COW and is writeable, do not create a new backing object for the
  entry.
  MFC r265887: Fix locking.
* MFC r265002:  (kib, 2014-05-04, 1 file, -8/+15)
  Fix vm_fault_copy_entry() operation on upgrade; allow it to find the
  pages in the shadowed objects.
* MFC r263475:  (kib, 2014-03-28, 1 file, -0/+4)
  Fix two issues with /dev/mem access on amd64, both causing kernel page
  faults.  First, accesses to the direct map region should check against
  the limit up to which the direct map is instantiated.  Second, for
  accesses to the kernel map, use a new thread-private flag TDP_DEVMEMIO,
  which instructs vm_fault() to return an error when the fault happens on
  a MAP_ENTRY_NOFAULT entry, instead of panicking.
  MFC r263498: Add change forgotten in r263475.  Make dmaplimit
  accessible outside amd64/pmap.c.
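  A minimal sketch of the check this entry describes, assuming only the
  MAP_ENTRY_NOFAULT entry flag and the TDP_DEVMEMIO thread-private flag
  named above; it is illustrative, not the verbatim vm_fault() code:

      /* After the map lookup has resolved the faulting entry. */
      if ((entry->eflags & MAP_ENTRY_NOFAULT) != 0) {
          if ((curthread->td_pflags & TDP_DEVMEMIO) != 0)
              return (KERN_FAILURE);    /* /dev/mem access: fail, do not panic */
          panic("vm_fault: fault on nofault entry");
      }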
* MFC r258039:  (kib, 2013-12-17, 1 file, -3/+3)
  Avoid overflow for the page counts.
  MFC r258365: Revert back to using int for the page counts.  Rearrange
  the checks to correctly handle overflowing address arithmetic.
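  One way to rearrange such a check so the address arithmetic cannot
  wrap; a generic sketch using VM_MAXUSER_ADDRESS as a hypothetical upper
  bound, not the code from this revision:

      /* Compare the length against the room left below the limit instead
       * of testing a sum that may already have wrapped around. */
      if (len > VM_MAXUSER_ADDRESS - addr)
          return (EINVAL);
      npages = atop(round_page(addr + len) - trunc_page(addr));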
* Remove zero-copy sockets code.  (kib, 2013-09-16, 1 file, -19/+1)
  It only worked for anonymous memory, and the equivalent functionality
  is now provided by sendfile(2) over a POSIX shared memory file
  descriptor.
  Remove the cow member of struct vm_page, and rearrange the remaining
  members.  While there, make hold_count unsigned.
  Requested and reviewed by: alc
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Approved by: re (delphij)
* On all architectures, avoid preallocating the physical memory for nodes
  used in vm_radix.  (attilio, 2013-08-09, 1 file, -3/+5)
  On architectures supporting direct mapping, also avoid preallocating
  the KVA for such nodes.
  In order to do so, make the operations derived from vm_radix_insert()
  able to fail, and handle all the resulting failures.
  On the vm_radix side, introduce a new function called
  vm_radix_replace(), which can replace an already present leaf node with
  a new one, and take into account the possibility that allocation during
  vm_radix_insert() can recurse on the radix trie.  This means that if
  the operations in vm_radix_insert() recursed, vm_radix_insert() will
  start from scratch again.
  Sponsored by: EMC / Isilon storage division
  Reviewed by: alc (older version)
  Reviewed by: jeff
  Tested by: pho, scottl
* The soft and hard busy mechanisms rely on the vm object lock to
  work.  (attilio, 2013-08-09, 1 file, -26/+20)
  Unify the two concepts into a real, minimal sxlock where the shared
  acquisition represents the soft busy and the exclusive acquisition
  represents the hard busy.
  The old VPO_WANTED mechanism becomes the hard path for this new lock,
  and it becomes per-page rather than per-object.  The vm_object lock
  becomes an interlock for this functionality: it can be held in either
  read or write mode.  However, if the vm_object lock is held in read
  mode while acquiring or releasing the busy state, the thread owner
  cannot make any assumption about the busy state unless it is also
  busying the page.
  Also:
  - Add a new flag to directly share-busy pages while vm_page_alloc and
    vm_page_grab are being executed.  This will be very helpful once
    these functions happen under a read object lock.
  - Move the swapping sleep into its own per-object flag.
  The KPI is heavily changed, which is why the version is bumped.  It is
  very likely that some VM ports users will need to change their own
  code.
  Sponsored by: EMC / Isilon storage division
  Discussed with: alc
  Reviewed by: jeff, kib
  Tested by: gavin, bapt (older version)
  Tested by: pho, scottl
* Revert r253939:  (attilio, 2013-08-05, 1 file, -8/+5)
  We cannot busy a page before doing pagefaults.  In fact, it can
  deadlock against the vnode lock, as it tries to vget().  Other
  functions, right now, have the opposite lock ordering, like
  vm_object_sync(), which acquires the vnode lock first and then sleeps
  on the busy mechanism.
  Before this patch is reinserted we need to break this ordering.
  Sponsored by: EMC / Isilon storage division
  Reported by: kib
* The page hold mechanism is fast but it has a couple of fallouts:
  (attilio, 2013-08-04, 1 file, -5/+8)
  - It does not let pages respect the LRU policy.
  - It bloats the active/inactive queues with a few pages.
  Try to avoid it as much as possible, with the long-term target of
  completely removing it.
  Use the soft-busy mechanism to protect page content accesses during
  short-term operations (like uiomove_fromphys()).
  After this change only vm_fault_quick_hold_pages() is still using the
  hold mechanism for page content access.  There is an additional
  complexity there, as the quick path cannot immediately access the page
  object to busy the page, and the slow path cannot busy more than one
  page at a time (to avoid deadlocks).
  Fixing that primitive can lead to complete removal of the page hold
  mechanism.
  Sponsored by: EMC / Isilon storage division
  Discussed with: alc
  Reviewed by: jeff
  Tested by: pho
* vm_fault() should not be allowed to proceed on a map entry which is
  being wired now.  (kib, 2013-07-11, 1 file, -0/+13)
  The entry wired count is changed to non-zero in advance, before the map
  lock is dropped.  This makes vm_fault() perceive the entry as wired and
  breaks the fragment which moves the wire count from the shadowed page
  to the upper page, causing the code to unwire a non-wired page.
  On the other hand, the vm_fault() calls from vm_fault_wire() should be
  allowed to proceed, so only drain MAP_ENTRY_IN_TRANSITION from
  vm_fault() when wiring_thread is not the current thread.  See the
  sketch after this entry.
  Reported and tested by: pho
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
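  A sketch of the condition this implies inside vm_fault(), assuming the
  eflags and wiring_thread members of the map entry mentioned above;
  illustrative only, not the committed diff:

      if ((fs.entry->eflags & MAP_ENTRY_IN_TRANSITION) != 0 &&
          fs.entry->wiring_thread != curthread) {
          /* Another thread is wiring or unwiring this entry: unlock,
           * wait for the transition to finish, and retry the fault. */
      }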
* Acquire the read lock on the src object for vm_fault_copy_entry().
  (attilio, 2013-05-22, 1 file, -4/+4)
  Sponsored by: EMC / Isilon storage division
  Reviewed by: alc
* Relax the object locking in vm_fault_prefault().  A read lock
  suffices.  (alc, 2013-05-17, 1 file, -5/+5)
  Reviewed by: attilio
  Sponsored by: EMC / Isilon Storage Division
* Hide the details of the assertion for VM_OBJECT_LOCK operations.
  (attilio, 2013-02-21, 1 file, -2/+2)
  Rename the current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into
  VM_OBJECT_ASSERT_WLOCKED(foo).
  Sponsored by: EMC / Isilon storage division
  Requested by: alc
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to
  their "write" versions.  (attilio, 2013-02-20, 1 file, -39/+39)
  Sponsored by: EMC / Isilon storage division
* Switch the vm_object lock to be a rwlock.  (attilio, 2013-02-20, 1 file, -3/+3)
  * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations.
  * VM_OBJECT_SLEEP() is introduced as a general purpose primitive to get
    a sleep operation using a VM_OBJECT_LOCK() as protection.
  * The approach must bear with vm_pager.h namespace pollution, so many
    files require including rwlock.h directly.
* Add a system-wide counter of page faults requiring I/O.  (zont, 2013-01-28, 1 file, -2/+3)
  Reviewed by: alc
  MFC after: 2 weeks
* In the past four years, we've added two new vm object types.  (alc, 2012-12-09, 1 file, -2/+2)
  Each time, similar changes had to be made in various places throughout
  the machine-independent virtual memory layer to support the new vm
  object type.  However, in most of these places, it's actually not the
  type of the vm object that matters to us but instead certain attributes
  of its pages.  For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG
  objects contain fictitious pages.  In other words, in most of these
  places, we were testing the vm object's type to determine if it
  contained fictitious (or unmanaged) pages.
  To both simplify the code in these places and make the addition of
  future vm object types easier, this change introduces two new vm object
  flags that describe attributes of the vm object's pages, specifically,
  whether they are fictitious or unmanaged.
  Reviewed and tested by: kib
* Replace the single, global page queues lock with per-queue locks on the
  active and inactive paging queues.  (alc, 2012-11-13, 1 file, -1/+1)
  Reviewed by: kib
* Commit the actual text provided by Alan, instead of the wrong update in
  r242011.  (kib, 2012-10-24, 1 file, -5/+7)
  MFC after: 1 week
* Dirty the newly copied anonymous pages after the wired region is
  forked.  (kib, 2012-10-24, 1 file, -3/+6)
  Otherwise, the pagedaemon might reclaim the page without saving its
  content into the swap file, resulting in the valid content being
  replaced by zeroes.
  Reported and tested by: pho
  Reviewed and comment update by: alc
  MFC after: 1 week
* Remove the support for using non-MPSAFE filesystem modules.  (kib, 2012-10-22, 1 file, -21/+0)
  In particular, do not lock Giant conditionally when calling into the
  filesystem module, and remove the VFS_LOCK_GIANT() and related macros.
  Stop handling buffers belonging to non-MPSAFE filesystems.
  The VFS_VERSION is bumped to indicate the interface change, which does
  not result in interface signature changes.
  Conducted and reviewed by: attilio
  Tested by: pho
* Calculate the count of per-process cow faults.  (kib, 2012-05-23, 1 file, -0/+1)
  Export the count to userspace using the obscure spare int field in
  struct kinfo_proc.
  Submitted by: Andrey Zonov <andrey zonov org>
  MFC after: 1 week
* Give vm_fault()'s sequential access optimization a makeover.  (alc, 2012-05-10, 1 file, -68/+98)
  There are two aspects to the sequential access optimization: (1) read
  ahead of pages that are expected to be accessed in the near future and
  (2) unmap and cache behind of pages that are not expected to be
  accessed again.  This revision changes both aspects.
  The read ahead optimization is now more effective.  It starts with the
  same initial read window as before, but arithmetically grows the window
  on sequential page faults.  This can yield increased read bandwidth.
  For example, on one of my machines, a program using mmap() to read a
  file that is several times larger than the machine's physical memory
  takes about 17% less time to complete.
  The unmap and cache behind optimization is now more selectively
  applied.  The read ahead window must grow to its maximum size before
  unmap and cache behind is performed.  This significantly reduces the
  number of times that pages are unmapped and cached only to be
  reactivated a short time later.
  The unmap and cache behind optimization now clears each page's
  referenced flag.  Previously, in the case of dirty pages, if the
  containing file was still mapped at the time that the page daemon
  examined the dirty pages, they would be reactivated.
  From a stylistic standpoint, this revision also cleanly separates the
  implementation of the read ahead and unmap/cache behind optimizations.
  Glanced at: kib
  MFC after: 2 weeks
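  A sketch of the arithmetic window growth described above; the variable
  and constant names (prev_end, prev_ahead, READ_AHEAD_STEP,
  READ_AHEAD_MAX, READ_AHEAD_INIT) are hypothetical, not the ones used in
  vm_fault():

      /* Grow the read-ahead window by a fixed step on each sequential
       * fault, cap it at a maximum, and reset it on a random access. */
      if (vaddr == prev_end) {                     /* sequential fault */
          ahead = MIN(prev_ahead + READ_AHEAD_STEP, READ_AHEAD_MAX);
      } else {
          ahead = READ_AHEAD_INIT;                 /* start over */
      }
      prev_ahead = ahead;
      prev_end = vaddr + ptoa(ahead + 1);          /* window end for next fault */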
* Add new ktrace records for the start and end of VM faults.  (jhb, 2012-04-05, 1 file, -2/+19)
  This gives a pair of records similar to syscall entry and return that a
  user can use to determine how long page faults take.  The new ktrace
  records are enabled via the 'p' trace type, and are enabled in the
  default set of trace points.
  Reviewed by: kib
  MFC after: 2 weeks
* Handle spurious page faults that may occur in no-fault sections of the
  kernel.  (alc, 2012-03-22, 1 file, -1/+7)
  When access restrictions are added to a page table entry, we flush the
  corresponding virtual address mapping from the TLB.  In contrast, when
  access restrictions are removed from a page table entry, we do not
  flush the virtual address mapping from the TLB.  This is exactly as
  recommended in AMD's documentation.  In effect, when access
  restrictions are removed from a page table entry, AMD's MMUs will
  transparently refresh a stale TLB entry.  In short, this saves us from
  having to perform potentially costly TLB flushes.  In contrast, Intel's
  MMUs are allowed to generate a spurious page fault based upon the stale
  TLB entry.  Usually, such spurious page faults are handled by
  vm_fault() without incident.  However, when we are executing no-fault
  sections of the kernel, we are not allowed to execute vm_fault().  This
  change introduces special-case handling for spurious page faults that
  occur in no-fault sections of the kernel.
  In collaboration with: kib
  Tested by: gibbs (an earlier version)
  I would also like to acknowledge Hiroki Sato's assistance in diagnosing
  this problem.
  MFC after: 1 week
* Use the trick of performing the atomic operation on the containing
  aligned word to handle the dirty mask updates in
  vm_page_clear_dirty_mask().  (kib, 2011-09-28, 1 file, -10/+2)
  Remove the vm page queue lock around the vm_page_dirty() call in
  vm_fault_hold(), the sole purpose of which was to protect dirty on
  architectures which do not provide short or byte-wide atomics.
  Reviewed by: alc, attilio
  Tested by: flo (sparc64)
  MFC after: 2 weeks
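  An illustrative version of the trick (not the actual
  vm_page_clear_dirty_mask()): clear bits of a 16-bit dirty mask by
  applying the atomic op to the aligned 32-bit word that contains it, so
  no short- or byte-wide atomics are required:

      static void
      dirty_mask_clear(volatile uint16_t *dirty, uint16_t pagebits)
      {
          uintptr_t addr;
          int shift;

          addr = (uintptr_t)dirty;            /* assumed 2-byte aligned */
          shift = (addr & 2) != 0 ? 16 : 0;   /* offset inside the word */
      #if BYTE_ORDER == BIG_ENDIAN
          shift = 16 - shift;
      #endif
          addr &= ~(uintptr_t)3;              /* containing aligned word */
          atomic_clear_32((volatile uint32_t *)addr,
              (uint32_t)pagebits << shift);
      }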
* Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into an atomic
  flags field.  (kib, 2011-09-06, 1 file, -3/+1)
  Updates to the atomic flags are performed using the atomic ops on the
  containing word, do not require any vm lock to be held, and are
  non-blocking.  The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
  functions are provided to modify aflags.
  Document the changes to the flags field to only require the page lock.
  Introduce the vm_page_reference(9) function to provide a stable KPI and
  KBI for filesystems like tmpfs and zfs which need to mark a page as
  referenced.
  Reviewed by: alc, attilio
  Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
  Approved by: re (bz)
* Add a facility to disable processing page faults.  (kib, 2011-07-09, 1 file, -0/+16)
  When activated, uiomove generates EFAULT if any accessed address is not
  mapped, as opposed to handling the fault.
  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc (previous version)
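  A hedged usage sketch, assuming the vm_fault_disable_pagefaults() /
  vm_fault_enable_pagefaults() pair this facility exposes; the uiomove()
  arguments are placeholders:

      int error, save;

      save = vm_fault_disable_pagefaults();
      /* Any unmapped address touched here makes uiomove() return EFAULT
       * instead of the kernel handling the page fault. */
      error = uiomove(kbuf, len, uio);
      vm_fault_enable_pagefaults(save);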
* Revert to using the page queues lock in vm_page_clear_dirty_mask() on
  MIPS.  (alc, 2011-06-23, 1 file, -2/+1)
  (At present, although atomic_clear_char() is defined by atomic.h on
  MIPS, it is not actually implemented by support.S.)
* Precisely document the synchronization rules for the page's dirty
  field.  (alc, 2011-06-19, 1 file, -0/+10)
  (Saying that the lock on the object that the page belongs to must be
  held only represents one aspect of the rules.)
  Eliminate the use of the page queues lock for atomically performing
  read-modify-write operations on the dirty field when the underlying
  architecture supports atomic operations on char and short types.
  Document the fact that 32KB pages aren't really supported.
  Reviewed by: attilio, kib
* Handle the corner case in vm_fault_quick_hold_pages().  (kib, 2011-03-25, 1 file, -0/+2)
  If the supplied length is zero and the user address is invalid, the
  function might return -1 due to the truncation and rounding of the
  address.  The callers interpret the situation as EFAULT.  Instead of
  handling the zero length in the callers, filter it in
  vm_fault_quick_hold_pages().
  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc
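  A sketch of the filter described above (not the committed diff), placed
  before the truncation/rounding of the range:

      /* With len == 0 and an unaligned, invalid addr, round_page() and
       * trunc_page() could yield end < start and a page count of -1,
       * so filter the zero-length case up front. */
      if (len == 0)
          return (0);
      end = round_page(addr + len);
      start = trunc_page(addr);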
* For some time now, the kernel and kmem objects have been ordinary
  OBJT_PHYS objects.  (alc, 2011-01-15, 1 file, -4/+1)
  Thus, there is no need for handling them specially in vm_fault().  In
  fact, this special case handling would have led to an assertion failure
  just before the call to pmap_enter().
  Reviewed by: kib@
  MFC after: 6 weeks
* Correct a typo in vm_fault_quick_hold_pages().  (alc, 2010-12-28, 1 file, -1/+1)
  Reported by: Bartosz Stec
* Retire vm_fault_quick().  It's no longer used.  (alc, 2010-12-25, 1 file, -18/+0)
  Reviewed by: kib@
* Introduce and use a new VM interface for temporarily pinning pages.
  (alc, 2010-12-25, 1 file, -0/+75)
  This new interface replaces the combined use of vm_fault_quick() and
  pmap_extract_and_hold() throughout the kernel.
  In collaboration with: kib@
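  A hedged usage sketch of the new interface, assuming the
  vm_fault_quick_hold_pages()/vm_page_unhold_pages() prototypes declared
  in vm_extern.h and vm_page.h; the map, address, and length shown are
  placeholders:

      vm_page_t pages[4];
      int count;

      count = vm_fault_quick_hold_pages(&curproc->p_vmspace->vm_map,
          uaddr, len, VM_PROT_READ, pages, nitems(pages));
      if (count == -1)
          return (EFAULT);
      /* ... access the held pages, e.g. via sf_buf or the direct map ... */
      vm_page_unhold_pages(pages, count);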
* Introduce vm_fault_hold() and use it to (1) eliminate a long-standing
  race condition in proc_rwmem() and to (2) simplify the implementation
  of the cxgb driver's vm_fault_hold_user_pages().  (alc, 2010-12-20, 1 file, -5/+18)
  Specifically, in proc_rwmem() the requested read or write could fail
  because the targeted page could be reclaimed between the calls to
  vm_fault() and vm_page_hold().
  In collaboration with: kib@
  MFC after: 6 weeks
* Replace the pointer to "struct uidinfo" with a pointer to "struct
  ucred" in "struct vm_object".  (trasz, 2010-12-02, 1 file, -5/+5)
  This is required to make it possible to account for per-jail swap
  usage.
  Reviewed by: kib@
  Tested by: pho@
  Sponsored by: FreeBSD Foundation
* Use vm_page_prev() instead of vm_page_lookup() in the implementation of
  vm_fault()'s automatic delete-behind heuristic.  (alc, 2010-07-02, 1 file, -10/+12)
  vm_page_prev() is typically faster.
* When waiting for the busy page, do not unlock the object unless the
  unlock cannot be avoided.  (kib, 2010-05-20, 1 file, -3/+6)
  Reviewed by: alc
  MFC after: 1 week
* Push down the acquisition of the page queues lock into vm_pageq_remove().
  (alc, 2010-05-09, 1 file, -2/+0)
  (This eliminates a surprising number of page queues lock acquisitions
  by vm_fault() because the page's queue is PQ_NONE and thus the page
  queues lock is not needed to remove the page from a queue.)
* Minimize the scope of the page queues lock in vm_fault().  (alc, 2010-05-08, 1 file, -1/+2)
* Push down the page queues lock into vm_page_cache(),
  vm_page_try_to_cache(), and vm_page_try_to_free().  (alc, 2010-05-08, 1 file, -12/+3)
  Consequently, push down the page queues lock into pmap_enter_quick(),
  pmap_page_wired_mappings(), pmap_remove_all(), and pmap_remove_write().
  Push down the page queues lock into Xen's pmap_page_is_mapped().  (I
  overlooked the Xen pmap in r207702.)
  Switch to a per-processor counter for the total number of pages cached.
* Push down the page queues lock into vm_page_activate().  (alc, 2010-05-07, 1 file, -6/+1)
* Push down the page queues lock into vm_page_deactivate().  Eliminate an
  incorrect comment.  (alc, 2010-05-07, 1 file, -2/+0)
* Eliminate page queues locking around most calls to vm_page_free().  (alc, 2010-05-06, 1 file, -11/+1)
* Acquire the page lock around all remaining calls to vm_page_free() on
  managed pages that didn't already have that lock held.  (alc, 2010-05-05, 1 file, -2/+0)
  (Freeing an unmanaged page, such as the various pmaps use, doesn't
  require the page lock.)
  This allows a change in vm_page_remove()'s locking requirements.  It
  now expects the page lock to be held instead of the page queues lock.
  Consequently, the page queues lock is no longer required at all by
  callers to vm_page_rename().
  Discussed with: kib
* Push down the acquisition of the page queues lock into vm_page_unwire().
  (alc, 2010-05-05, 1 file, -9/+5)
  Update the comment describing which lock should be held on entry to
  vm_page_wire().
  Reviewed by: kib
* Add page locking to the vm_page_cow* functions.  (alc, 2010-05-04, 1 file, -6/+0)
  Push down the acquisition and release of the page queues lock into
  vm_page_wire().
  Reviewed by: kib
* Simplify vm_fault().  (alc, 2010-05-02, 1 file, -13/+5)
  The introduction of the new page lock renders a bit of cleverness by
  vm_fault() to avoid repeatedly releasing and reacquiring the page
  queues lock pointless.
  Reviewed by: kib, kmacy