path: root/sys/vm/vm_fault.c
Commit message (Author, Age, Files, Lines -/+)
* MFC r286086 (kib, 2015-08-06, 1 file, -22/+17)
  Do not pretend that vm_fault(9) supports unwiring the address.
* MFC r282128 (kib, 2015-05-05, 1 file, -0/+4)
  Do not sleep waiting for the MAP_ENTRY_IN_TRANSITION state to end while the vnode is locked.
* MFC r280238 (alc, 2015-04-02, 1 file, -0/+9)
  Fix the root cause of the "vm_reserv_populate: reserv <address> is already promoted" panics.
  PR: 198163
* MFC r277828 (kib, 2015-02-11, 1 file, -2/+4)
  Update mtime for tmpfs files modified through memory mapping.
  MFC r277969: Update both ctime and mtime for writes to tmpfs files.
  MFC r277972: Remove single-use boolean.
  MFC r278151: Remove duplicated assignment.
* MFC r277055 (kib, 2015-01-19, 1 file, -4/+0)
  Revert r263475: TDP_DEVMEMIO no longer needed.
* MFC r272907 (kib, 2014-10-13, 1 file, -41/+67)
  Make MAP_NOSYNC handling in the vm_fault() read-locked object path compatible with the write-locked path.
* Fix a leak of wired pages when unwiring a PROT_NONE-mapped wired region. (kib, 2014-09-01, 1 file, -62/+0)
  Rework the handling of unwire to do it in batch, both at the pmap and object level. All commits below are by alc.
  MFC r268327: Introduce pmap_unwire().
  MFC r268591: Implement pmap_unwire() for powerpc.
  MFC r268776: Implement pmap_unwire() for arm.
  MFC r268806: pmap_unwire(9) man page.
  MFC r269134: When unwiring a region of an address space, do not assume that the underlying physical pages are mapped by the pmap. This fixes a leak of the wired pages on the unwiring of a region mapped with no access allowed.
  MFC r269339: In the implementation of the new function pmap_unwire(), the call to MOEA64_PVO_TO_PTE() must be performed before any changes are made to the PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic.
  MFC r269365: Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed by the combination of r268591 and r269134: when we attempt to add the wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing. (They only set the wired attribute on newly created mappings.)
  MFC r269433: Handle wiring failures in vm_map_wire() with the new functions pmap_unwire() and vm_object_unwire(). Retire vm_fault_{un,}wire(), since they are no longer used.
  MFC r269438: Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable "rv" is uninitialized.
  MFC r269485: Retire pmap_change_wiring().
  Reviewed by: alc
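  Editor's note: a minimal sketch of how an unwire caller might combine the two new interfaces named above, assuming the documented pmap_unwire(9) signature and a vm_object_unwire(object, offset, length, queue) form; the wrapper name and locking details are illustrative only, not the actual vm_map code.

      /*
       * Hedged sketch: unwire a map entry's range at both the pmap and
       * the object level, roughly mirroring the split described in
       * r269433. Illustrative only.
       */
      static void
      example_unwire_entry(vm_map_t map, vm_map_entry_t entry)
      {
              /* Drop the wired attribute from every mapping in the range. */
              pmap_unwire(vm_map_pmap(map), entry->start, entry->end);

              /* Release the wiring reference on the backing pages. */
              vm_object_unwire(entry->object.vm_object, entry->offset,
                  entry->end - entry->start, PQ_ACTIVE);
      }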
* MFC r270011 (kib, 2014-08-25, 1 file, -4/+51)
  Implement a 'fast path' for the vm page fault handler.
  MFC r270387 (by alc): Relax one of the conditions for mapping a page on the fast path.
  Approved by: re (gjb)
* MFC r261647 (by alc) (kib, 2014-08-25, 1 file, -1/+4)
  Don't call vm_fault_prefault() on zero-fill faults.
* MFC r261412 (by alc) (kib, 2014-08-25, 1 file, -24/+32)
  Make prefaulting more aggressive on hard faults.
* MFC r269978 (by alc) (kib, 2014-08-25, 1 file, -2/+3)
  Avoid pointless (but harmless) actions on unmanaged pages.
* Merge the changes to pmap_enter(9) for sleep-less operation (requested by flag). (kib, 2014-08-24, 1 file, -2/+4)
  The ia64 pmap.c changes are a direct commit, since ia64 is removed on head.
  MFC r269368 (by alc): Retire PVO_EXECUTABLE.
  MFC r269728: Change the pmap_enter(9) interface to take a flags parameter and the superpage mapping size (currently unused).
  MFC r269759 (by alc): Update the text of a KASSERT() to reflect the changes in r269728.
  MFC r269822 (by alc): Change {_,}pmap_allocpte() so that they look for the flag PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when deciding whether to sleep on page table page allocation.
  MFC r270151 (by alc): Replace a KASSERT that no PV list locks are held with a conditional unlock.
  Reviewed by: alc
  Approved by: re (gjb)
  Sponsored by: The FreeBSD Foundation
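  Editor's note: a hedged sketch of a caller of the changed interface, assuming the flags/psind form described above, with PMAP_ENTER_NOSLEEP in the flags word and KERN_RESOURCE_SHORTAGE as the sleepless failure return; the wrapper, the flag composition, and the retry policy are illustrative, not taken from vm_fault.c.

      /*
       * Hedged sketch: create a mapping without sleeping on page table
       * page allocation, per the r269728 interface change. Illustrative.
       */
      static int
      example_enter_nosleep(pmap_t pmap, vm_offset_t va, vm_page_t m,
          vm_prot_t prot)
      {
              int rv;

              /* psind 0: base page size, no superpage mapping requested. */
              rv = pmap_enter(pmap, va, m, prot, prot | PMAP_ENTER_NOSLEEP, 0);
              if (rv == KERN_RESOURCE_SHORTAGE) {
                      /* Allocation would have slept; let the caller retry. */
                      return (ENOMEM);
              }
              return (0);
      }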
* MFC r266491 (kib, 2014-05-24, 1 file, -12/+10)
  Remove redundant loop.
* MFC r265843 (kib, 2014-05-17, 1 file, -40/+68)
  For the upgrade case in vm_fault_copy_entry(), when the entry does not need COW and is writeable, do not create a new backing object for the entry.
  MFC r265887: Fix locking.
* MFC r265002 (kib, 2014-05-04, 1 file, -8/+15)
  Fix vm_fault_copy_entry() operation on upgrade; allow it to find the pages in the shadowed objects.
* MFC r263475 (kib, 2014-03-28, 1 file, -0/+4)
  Fix two issues with /dev/mem access on amd64, both causing kernel page faults.
  First, accesses to the direct map region should check against the limit up to which the direct map is instantiated.
  Second, for accesses to the kernel map, use a new thread-private flag TDP_DEVMEMIO, which instructs vm_fault() to return an error when a fault happens on a MAP_ENTRY_NOFAULT entry, instead of panicking.
  MFC r263498: Add a change forgotten in r263475. Make dmaplimit accessible outside amd64/pmap.c.
* MFC r258039 (kib, 2013-12-17, 1 file, -3/+3)
  Avoid overflow for the page counts.
  MFC r258365: Revert back to using int for the page counts. Rearrange the checks to correctly handle overflowing address arithmetic.
* Remove zero-copy sockets code. (kib, 2013-09-16, 1 file, -19/+1)
  It only worked for anonymous memory, and the equivalent functionality is now provided by sendfile(2) over a POSIX shared memory file descriptor.
  Remove the cow member of struct vm_page, and rearrange the remaining members. While there, make hold_count unsigned.
  Requested and reviewed by: alc
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Approved by: re (delphij)
* On all architectures, avoid preallocating the physical memory for nodes used in vm_radix. (attilio, 2013-08-09, 1 file, -3/+5)
  On architectures supporting direct mapping, also avoid preallocating the KVA for such nodes.
  In order to do so, make the operations derived from vm_radix_insert() able to fail, and handle all the resulting failures.
  On the vm_radix side, introduce a new function called vm_radix_replace(), which can replace an already present leaf node with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if operations in vm_radix_insert() recursed, vm_radix_insert() will start from scratch again.
  Sponsored by: EMC / Isilon storage division
  Reviewed by: alc (older version)
  Reviewed by: jeff
  Tested by: pho, scottl
* The soft and hard busy mechanisms rely on the vm object lock to work. (attilio, 2013-08-09, 1 file, -26/+20)
  Unify the two concepts into a real, minimal, sx-style lock where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy.
  The old VPO_WANTED mechanism becomes the hard path for this new lock, and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it.
  Also:
  - Add a new flag to directly shared-busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock.
  - Move the swapping sleep into its own per-object flag.
  The KPI is heavily changed; this is why the version is bumped. It is very likely that some VM ports users will need to change their own code.
  Sponsored by: EMC / Isilon storage division
  Discussed with: alc
  Reviewed by: jeff, kib
  Tested by: gavin, bapt (older version)
  Tested by: pho, scottl
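  Editor's note: an illustrative model of the shared/exclusive busy word described above, written with C11 atomics. This is a self-contained sketch of the idea (shared holders counted in the low bits, an exclusive bit on top); the real per-page busy lock in FreeBSD has a different layout, waiter accounting, and sleep integration.

      #include <stdatomic.h>
      #include <stdbool.h>

      #define BUSY_XBIT   0x80000000u   /* exclusive (hard) busy holder */
      #define BUSY_SMASK  0x7fffffffu   /* count of shared (soft) busy holders */

      struct busy_word {
              _Atomic unsigned state;
      };

      static bool
      busy_try_shared(struct busy_word *b)
      {
              unsigned old = atomic_load(&b->state);

              /* Shared acquisition fails while an exclusive holder exists. */
              while ((old & BUSY_XBIT) == 0) {
                      if (atomic_compare_exchange_weak(&b->state, &old, old + 1))
                              return (true);
              }
              return (false);
      }

      static bool
      busy_try_exclusive(struct busy_word *b)
      {
              unsigned expected = 0;

              /* Exclusive acquisition requires no holders of either kind. */
              return (atomic_compare_exchange_strong(&b->state, &expected,
                  BUSY_XBIT));
      }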
* Revert r253939. (attilio, 2013-08-05, 1 file, -8/+5)
  We cannot busy a page before doing page faults. In fact, it can deadlock against the vnode lock, as it tries to vget(). Other functions, right now, have the opposite lock ordering, like vm_object_sync(), which acquires the vnode lock first and then sleeps on the busy mechanism. Before this patch is reinserted we need to break this ordering.
  Sponsored by: EMC / Isilon storage division
  Reported by: kib
* The page hold mechanism is fast, but it has a couple of fallouts: (attilio, 2013-08-04, 1 file, -5/+8)
  - It does not let pages respect the LRU policy.
  - It bloats the active/inactive queues of few pages.
  Try to avoid it as much as possible, with the long-term target to completely remove it.
  Use the soft-busy mechanism to protect page content accesses during short-term operations (like uiomove_fromphys()).
  After this change only vm_fault_quick_hold_pages() is still using the hold mechanism for page content access. There is an additional complexity there, as the quick path cannot immediately access the page object to busy the page, and the slow path cannot, however, busy more than one page at a time (to avoid deadlocks). Fixing such a primitive can lead to the complete removal of the page hold mechanism.
  Sponsored by: EMC / Isilon storage division
  Discussed with: alc
  Reviewed by: jeff
  Tested by: pho
* vm_fault() should not be allowed to proceed on a map entry which is being wired now. (kib, 2013-07-11, 1 file, -0/+13)
  The entry's wired count is changed to non-zero in advance, before the map lock is dropped. This makes vm_fault() perceive the entry as wired, and breaks the fragment which moves the wire count from the shadowed page to the upper page, making the code unwire a non-wired page.
  On the other hand, the vm_fault() calls from vm_fault_wire() should be allowed to proceed, so only drain MAP_ENTRY_IN_TRANSITION from vm_fault() when wiring_thread is not current.
  Reported and tested by: pho
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
* Acquire the read lock on the src object for vm_fault_copy_entry(). (attilio, 2013-05-22, 1 file, -4/+4)
  Sponsored by: EMC / Isilon storage division
  Reviewed by: alc
* Relax the object locking in vm_fault_prefault(). A read lock suffices. (alc, 2013-05-17, 1 file, -5/+5)
  Reviewed by: attilio
  Sponsored by: EMC / Isilon Storage Division
* Hide the details of the assertion for VM_OBJECT_LOCK operations. (attilio, 2013-02-21, 1 file, -2/+2)
  Rename the current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into VM_OBJECT_ASSERT_WLOCKED(foo).
  Sponsored by: EMC / Isilon storage division
  Requested by: alc
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to their "write" versions. (attilio, 2013-02-20, 1 file, -39/+39)
  Sponsored by: EMC / Isilon storage division
* Switch the vm_object lock to be a rwlock. (attilio, 2013-02-20, 1 file, -3/+3)
  * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations.
  * VM_OBJECT_SLEEP() is introduced as a general purpose primitive to get a sleep operation using a VM_OBJECT_LOCK() as protection.
  * The approach must bear with vm_pager.h namespace pollution, so many files require including rwlock.h directly.
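  Editor's note: a hedged sketch of the locking idiom after the rwlock switch and the rename in the entry above: read-lock the object for lookups, write-lock it when the object is modified. The macro names follow the renamed VM_OBJECT_* interface; the surrounding helpers are illustrative, not actual vm_fault.c code.

      /* Shared lock is enough to look a page up without changing the object. */
      static vm_page_t
      example_lookup_page(vm_object_t object, vm_pindex_t pindex)
      {
              vm_page_t m;

              VM_OBJECT_RLOCK(object);
              m = vm_page_lookup(object, pindex);
              VM_OBJECT_RUNLOCK(object);
              return (m);
      }

      /* Exclusive lock is required to change the object's page collection. */
      static void
      example_insert_page(vm_object_t object, vm_page_t m, vm_pindex_t pindex)
      {
              VM_OBJECT_WLOCK(object);
              vm_page_insert(m, object, pindex);
              VM_OBJECT_WUNLOCK(object);
      }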
* Add a system-wide counter of page faults requiring I/O. (zont, 2013-01-28, 1 file, -2/+3)
  Reviewed by: alc
  MFC after: 2 weeks
* In the past four years, we've added two new vm object types. (alc, 2012-12-09, 1 file, -2/+2)
  Each time, similar changes had to be made in various places throughout the machine-independent virtual memory layer to support the new vm object type. However, in most of these places, it's actually not the type of the vm object that matters to us but instead certain attributes of its pages. For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain fictitious pages. In other words, in most of these places, we were testing the vm object's type to determine if it contained fictitious (or unmanaged) pages.
  To both simplify the code in these places and make the addition of future vm object types easier, this change introduces two new vm object flags that describe attributes of the vm object's pages, specifically, whether they are fictitious or unmanaged.
  Reviewed and tested by: kib
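  Editor's note: a hedged illustration of the pattern this change replaces, assuming the flag name OBJ_FICTITIOUS for the new per-object attribute; the helper and the exact condition are simplified examples, not literal excerpts from vm_fault.c.

      static bool
      object_has_fictitious_pages(vm_object_t object)
      {
              /*
               * Before this change: infer the page attribute from the
               * object type, which must be updated for every new type.
               *
               * return (object->type == OBJT_DEVICE ||
               *     object->type == OBJT_MGTDEVICE ||
               *     object->type == OBJT_SG);
               */

              /* After: the object flag states the attribute directly. */
              return ((object->flags & OBJ_FICTITIOUS) != 0);
      }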
* Replace the single, global page queues lock with per-queue locks on the active and inactive paging queues. (alc, 2012-11-13, 1 file, -1/+1)
  Reviewed by: kib
* Commit the actual text provided by Alan, instead of the wrong update in r242011. (kib, 2012-10-24, 1 file, -5/+7)
  MFC after: 1 week
* Dirty the newly copied anonymous pages after the wired region is forked. (kib, 2012-10-24, 1 file, -3/+6)
  Otherwise, the pagedaemon might reclaim a page without saving its content into the swap file, resulting in the valid content being replaced by zeroes.
  Reported and tested by: pho
  Reviewed and comment update by: alc
  MFC after: 1 week
* Remove the support for using non-MPSAFE filesystem modules. (kib, 2012-10-22, 1 file, -21/+0)
  In particular, do not lock Giant conditionally when calling into the filesystem module; remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-MPSAFE filesystems.
  The VFS_VERSION is bumped to indicate the interface change, which does not result in interface signature changes.
  Conducted and reviewed by: attilio
  Tested by: pho
* Calculate the count of per-process cow faults. (kib, 2012-05-23, 1 file, -0/+1)
  Export the count to userspace using the obscure spare int field in struct kinfo_proc.
  Submitted by: Andrey Zonov <andrey zonov org>
  MFC after: 1 week
* Give vm_fault()'s sequential access optimization a makeover. (alc, 2012-05-10, 1 file, -68/+98)
  There are two aspects to the sequential access optimization: (1) read ahead of pages that are expected to be accessed in the near future and (2) unmap and cache behind of pages that are not expected to be accessed again. This revision changes both aspects.
  The read ahead optimization is now more effective. It starts with the same initial read window as before, but arithmetically grows the window on sequential page faults. This can yield increased read bandwidth. For example, on one of my machines, a program using mmap() to read a file that is several times larger than the machine's physical memory takes about 17% less time to complete.
  The unmap and cache behind optimization is now more selectively applied. The read ahead window must grow to its maximum size before unmap and cache behind is performed. This significantly reduces the number of times that pages are unmapped and cached only to be reactivated a short time later.
  The unmap and cache behind optimization now clears each page's referenced flag. Previously, in the case of dirty pages, if the containing file was still mapped at the time that the page daemon examined the dirty pages, they would be reactivated.
  From a stylistic standpoint, this revision also cleanly separates the implementation of the read ahead and unmap/cache behind optimizations.
  Glanced at: kib
  MFC after: 2 weeks
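  Editor's note: a hedged sketch of an arithmetically growing read-ahead window of the kind the entry above describes. The constants, the helper name, and the reset policy are hypothetical; the real policy lives inside vm_fault() and keeps its sequential-access history in the map entry.

      #include <stdbool.h>

      #define RA_INITIAL  8    /* pages in the first window (assumed) */
      #define RA_STEP     8    /* arithmetic growth per sequential fault (assumed) */
      #define RA_MAXIMUM  64   /* cap on the window size (assumed) */

      static int
      readahead_next_window(int prev_window, bool sequential)
      {
              if (!sequential)
                      return (RA_INITIAL);     /* reset on a non-sequential fault */
              if (prev_window + RA_STEP > RA_MAXIMUM)
                      return (RA_MAXIMUM);     /* window has reached its cap */
              return (prev_window + RA_STEP);  /* grow arithmetically */
      }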
* Add new ktrace records for the start and end of VM faults. (jhb, 2012-04-05, 1 file, -2/+19)
  This gives a pair of records similar to syscall entry and return that a user can use to determine how long page faults take. The new ktrace records are enabled via the 'p' trace type, and are enabled in the default set of trace points.
  Reviewed by: kib
  MFC after: 2 weeks
* Handle spurious page faults that may occur in no-fault sections of the kernel. (alc, 2012-03-22, 1 file, -1/+7)
  When access restrictions are added to a page table entry, we flush the corresponding virtual address mapping from the TLB. In contrast, when access restrictions are removed from a page table entry, we do not flush the virtual address mapping from the TLB. This is exactly as recommended in AMD's documentation. In effect, when access restrictions are removed from a page table entry, AMD's MMUs will transparently refresh a stale TLB entry. In short, this saves us from having to perform potentially costly TLB flushes. In contrast, Intel's MMUs are allowed to generate a spurious page fault based upon the stale TLB entry. Usually, such spurious page faults are handled by vm_fault() without incident. However, when we are executing no-fault sections of the kernel, we are not allowed to execute vm_fault(). This change introduces special-case handling for spurious page faults that occur in no-fault sections of the kernel.
  In collaboration with: kib
  Tested by: gibbs (an earlier version)
  I would also like to acknowledge Hiroki Sato's assistance in diagnosing this problem.
  MFC after: 1 week
* Use the trick of performing the atomic operation on the containing aligned word to handle the dirty mask updates in vm_page_clear_dirty_mask(). (kib, 2011-09-28, 1 file, -10/+2)
  Remove the vm page queue lock around the vm_page_dirty() call in vm_fault_hold(); its sole purpose was to protect the dirty field on architectures which do not provide short or byte-wide atomics.
  Reviewed by: alc, attilio
  Tested by: flo (sparc64)
  MFC after: 2 weeks
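  Editor's note: an illustrative, self-contained model of the "containing word" trick named above: clear bits of a byte-sized field with an atomic operation on the aligned 32-bit word that contains it. This sketch assumes a little-endian layout and is not the vm_page_clear_dirty_mask() code.

      #include <stdatomic.h>
      #include <stdint.h>

      static void
      clear_subword_bits(uint8_t *field, uint8_t bits)
      {
              uintptr_t addr = (uintptr_t)field;
              /* Aligned 32-bit word that contains the byte-sized field. */
              _Atomic uint32_t *word =
                  (_Atomic uint32_t *)(addr & ~(uintptr_t)3);
              /* Bit offset of the field within that word (little-endian). */
              unsigned shift = (unsigned)(addr & 3) * 8;

              /* One atomic AND on the word clears only the requested bits. */
              atomic_fetch_and(word, ~((uint32_t)bits << shift));
      }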
* Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into an atomic flags field. (kib, 2011-09-06, 1 file, -3/+1)
  Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify aflags.
  Document the changes to the flags field to only require the page lock.
  Introduce the vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced.
  Reviewed by: alc, attilio
  Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
  Approved by: re (bz)
* Add a facility to disable processing page faults. (kib, 2011-07-09, 1 file, -0/+16)
  When activated, uiomove generates EFAULT if any accessed address is not mapped, as opposed to handling the fault.
  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc (previous version)
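  Editor's note: a hedged sketch of how a caller might bracket a copy with this facility, assuming a vm_fault_disable_pagefaults()/vm_fault_enable_pagefaults(save) pair of the kind this change adds; the wrapper and its error handling are illustrative.

      static int
      example_uiomove_nofault(void *kbuf, int n, struct uio *uio)
      {
              int error, save;

              save = vm_fault_disable_pagefaults();
              /* With faults disabled, an unmapped address yields EFAULT. */
              error = uiomove(kbuf, n, uio);
              vm_fault_enable_pagefaults(save);
              return (error);
      }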
* Revert to using the page queues lock in vm_page_clear_dirty_mask() on MIPS. (alc, 2011-06-23, 1 file, -2/+1)
  (At present, although atomic_clear_char() is defined by atomic.h on MIPS, it is not actually implemented by support.S.)
* Precisely document the synchronization rules for the page's dirty field. (alc, 2011-06-19, 1 file, -0/+10)
  (Saying that the lock on the object that the page belongs to must be held only represents one aspect of the rules.)
  Eliminate the use of the page queues lock for atomically performing read-modify-write operations on the dirty field when the underlying architecture supports atomic operations on char and short types.
  Document the fact that 32KB pages aren't really supported.
  Reviewed by: attilio, kib
* Handle the corner case in vm_fault_quick_hold_pages(). (kib, 2011-03-25, 1 file, -0/+2)
  If the supplied length is zero and the user address is invalid, the function might return -1, due to the truncation and rounding of the address. The callers interpret the situation as EFAULT. Instead of handling the zero length in the callers, filter it in vm_fault_quick_hold_pages().
  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc
* For some time now, the kernel and kmem objects have been ordinary OBJT_PHYS objects. (alc, 2011-01-15, 1 file, -4/+1)
  Thus, there is no need for handling them specially in vm_fault(). In fact, this special case handling would have led to an assertion failure just before the call to pmap_enter().
  Reviewed by: kib@
  MFC after: 6 weeks
* Correct a typo in vm_fault_quick_hold_pages(). (alc, 2010-12-28, 1 file, -1/+1)
  Reported by: Bartosz Stec
* Retire vm_fault_quick(). It's no longer used. (alc, 2010-12-25, 1 file, -18/+0)
  Reviewed by: kib@
* Introduce and use a new VM interface for temporarily pinning pages. (alc, 2010-12-25, 1 file, -0/+75)
  This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel.
  In collaboration with: kib@
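  Editor's note: a hedged usage sketch of the pinning interface introduced here, assuming the vm_fault_quick_hold_pages(9) signature (map, address, length, protection, page array, array size; returns the number of pages held or -1) and vm_page_unhold_pages() as the release path; the buffer bound and the wrapper are illustrative.

      #define EXAMPLE_MAXPAGES 8   /* arbitrary bound for the sketch */

      static int
      example_with_user_buffer(vm_map_t map, vm_offset_t uaddr, vm_size_t len)
      {
              vm_page_t ma[EXAMPLE_MAXPAGES];
              int count;

              /* Fault in (if needed) and hold the pages backing [uaddr, uaddr + len). */
              count = vm_fault_quick_hold_pages(map, uaddr, len, VM_PROT_READ,
                  ma, EXAMPLE_MAXPAGES);
              if (count == -1)
                      return (EFAULT);

              /* ... access the held pages, e.g. via sf_buf or the direct map ... */

              vm_page_unhold_pages(ma, count);
              return (0);
      }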
* Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race condition in proc_rwmem() and to (2) simplify the implementation of the cxgb driver's vm_fault_hold_user_pages(). (alc, 2010-12-20, 1 file, -5/+18)
  Specifically, in proc_rwmem() the requested read or write could fail because the targeted page could be reclaimed between the calls to vm_fault() and vm_page_hold().
  In collaboration with: kib@
  MFC after: 6 weeks
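  Editor's note: a hedged sketch of the race this interface closes, assuming a vm_fault_hold(map, address, fault type, fault flags, page out-parameter) form that returns a KERN_* status; the wrapper and its error mapping are illustrative.

      static int
      example_fault_and_hold(vm_map_t map, vm_offset_t va, vm_prot_t prot,
          vm_page_t *mp)
      {
              int rv;

              /*
               * Before vm_fault_hold(), callers issued vm_fault() followed by
               * a separate vm_page_hold(); the page could be reclaimed in the
               * window between the two. Passing the out-parameter makes the
               * fault and the hold a single step.
               */
              rv = vm_fault_hold(map, va, prot, VM_FAULT_NORMAL, mp);
              return (rv == KERN_SUCCESS ? 0 : EFAULT);
      }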
* Replace the pointer to "struct uidinfo" with a pointer to "struct ucred" in "struct vm_object". (trasz, 2010-12-02, 1 file, -5/+5)
  This is required to make it possible to account for per-jail swap usage.
  Reviewed by: kib@
  Tested by: pho@
  Sponsored by: FreeBSD Foundation