path: root/sys/vm
Commit message | Author | Age | Files | Lines
* MFC r308733 (kib, 2016-11-23, 1 file, -30/+50)
  Move the fast fault path into a separate function.

* MFC r308288 (kib, 2016-11-18, 1 file, -0/+2)
  Do not sleep in vm_wait() if the pagedaemon has not yet started.
  Panic instead.
* MFC r308174, r308261 (alc, 2016-11-06, 1 file, -8/+26)
  Move and revise a comment about the relation between the object's
  paging-in-progress count and the vnode.  Prior to r188331, we always
  acquired the vnode lock before incrementing the object's
  paging-in-progress count.  Now, we increment it before attempting to
  acquire the vnode lock with LK_NOWAIT, but we never sleep acquiring
  the vnode lock while we have the count incremented.

  In vm_fault()'s loop over the shadow chain, move a comment describing
  our invariants to a better place.  Also, add two comments concerning
  the relationship between the map and vnode locks.
* MFC r308114 (kib, 2016-11-06, 1 file, -4/+4)
  Change remaining internal uses of boolean_t to bool in vm/vm_fault.c.
* MFC r308113 (kib, 2016-11-06, 1 file, -1/+0)
  Remove vm_pager_has_page() declaration.

* MFC r308109 (kib, 2016-11-06, 1 file, -5/+2)
  Remove vnode_locked label and goto.

* MFC r308108 (kib, 2016-11-06, 1 file, -1/+4)
  Split long line instead of unindenting it.  Add KASSERT() verifying
  that a device object with the same handle has the same ops vector.
* MFC r308096, r308098, r308112 (alc, 2016-11-06, 1 file, -14/+13)
  With one exception, "hardfault" is used like a "bool".  Change that
  exception and make it a "bool".

  The "lookup_still_valid" field is used as a "bool".  Make it one.

  Convert vm_fault_hold()'s Boolean variables that are only used
  internally to "bool".  Add a comment describing why the one remaining
  "boolean_t" was not converted.

  Merge and sort vm_fault_hold()'s "int" variable definitions.
* MFC r308094 (kib, 2016-11-05, 1 file, -15/+15)
  Add unlock_vp() helper.

  MFC r308095 (by markj):
  Add one more use of unlock_vp().
* MFC r306712 (alc, 2016-10-30, 3 files, -12/+9)
  Make the page daemon's notion of what kind of pass is being performed
  by vm_pageout_scan() local to vm_pageout_worker().  There is no
  reason to store the pass in the NUMA domain structure.
* MFC r306706 (alc, 2016-10-30, 1 file, -16/+24)
  Change vm_pageout_scan() to return a value indicating whether the
  free page target was met.

  Previously, vm_pageout_worker() itself checked the length of the free
  page queues to determine whether vm_pageout_scan(pass >= 1)'s
  inactive queue scan freed enough pages to meet the free page target.
  Specifically, vm_pageout_worker() used vm_paging_needed().  The
  trouble with vm_paging_needed() is that it compares the length of the
  free page queues to the wakeup threshold for the page daemon, which
  is much lower than the free page target.  Consequently,
  vm_pageout_worker() could conclude that the inactive queue scan
  succeeded in meeting its free page target when in fact it did not;
  and rather than immediately triggering an all-out laundering pass
  over the inactive queue, vm_pageout_worker() would go back to sleep
  waiting for the free page count to fall below the page daemon wakeup
  threshold again, at which point it would perform another limited
  (pass == 1) scan over the inactive queue.

  Changing vm_pageout_worker() to use vm_page_count_target() instead of
  vm_paging_needed() won't work, because any page allocations that
  happen concurrently with the inactive queue scan will result in the
  free page count being below the target at the end of a successful
  scan.  Instead, having vm_pageout_scan() return a value indicating
  success or failure is the most straightforward fix.
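  The threshold-vs-target gap described above can be made concrete with
  a small user-space sketch (hypothetical names and numbers, not the
  kernel code): re-checking the low wakeup threshold after a scan can
  report success even when the much higher free-page target was missed,
  whereas a success value returned by the scan itself cannot.

  ```c
  #include <stdbool.h>
  #include <stdio.h>

  static int v_free_count;                /* simulated free-page count */
  static const int v_free_target = 1000;  /* target the scan must reach */
  static const int wakeup_thresh = 200;   /* page daemon wakeup threshold */

  /* The scan frees up to 'reclaimable' pages and reports whether the
   * free-page target itself was met. */
  static bool pageout_scan(int reclaimable)
  {
      v_free_count += reclaimable;
      return v_free_count >= v_free_target;
  }

  int main(void)
  {
      v_free_count = 150;                  /* below wakeup_thresh: daemon runs */
      bool target_met = pageout_scan(300); /* frees 300 pages -> 450 free */

      /* Old logic: "is paging still needed?" -- compares against the
       * wakeup threshold and wrongly concludes the scan succeeded. */
      bool old_conclusion = v_free_count >= wakeup_thresh;
      /* New logic: trust the scan's own report. */
      printf("old=%d new=%d\n", old_conclusion, target_met);
      return 0;
  }
  ```

  With 450 free pages the old check sees "above wakeup threshold" and
  sleeps, while the returned value correctly reports the missed target.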
* MFC r307501 (kib, 2016-10-24, 1 file, -2/+8)
  If vm_fault_hold(9) finds that fs.m is wired, do not free it after a
  pager error; leave the page to the wire owner.
* MFC r307499 (kib, 2016-10-24, 2 files, -1/+2)
  Export vm_page_xunbusy_maybelocked().

* MFC r307236 (markj, 2016-10-21, 1 file, -0/+2)
  Plug a vnode lock leak in vm_fault_hold().

* MFC r307218 (kib, 2016-10-20, 3 files, -16/+16)
  Fix a race in vm_page_busy_sleep(9).
* MFC r307064 (kib, 2016-10-18, 1 file, -1/+12)
  When downgrading an exclusively busied page to the shared-busy state,
  wake up waiters.
* MFC r305056, r305367 (markj, 2016-10-02, 2 files, -81/+106)
  Restore swap pager readahead.

* MFC r305129 (kib, 2016-09-14, 1 file, -9/+22)
  Make swapoff reliable.

* MFC r304102 (alc, 2016-09-05, 1 file, -2/+3)
  Eliminate unneeded vm_page_xbusy() and vm_page_xunbusy() operations
  when neither vm_pager_has_page() nor vm_pager_get_pages() is called.
* MFC r304053, r304054 (markj, 2016-08-31, 2 files, -3/+3)
  Initialize busy lock state and strengthen busy lock assertions.

* MFC r303924 (by trasz) (kib, 2016-08-29, 1 file, -1/+1)
  Eliminate vprint().
* MFC r303747, r303982 (alc, 2016-08-27, 1 file, -60/+35)
  Correct errors and clean up the comments on the active queue scan.
  Eliminate some unnecessary blank lines.

  Clean up the comments and code style in and around
  vm_pageout_cluster().  In particular, fix factual, grammatical, and
  spelling errors in various comments, and remove comments that are out
  of place in this function.
* MFC r303243 (markj, 2016-08-14, 1 file, -3/+5)
  Update a comment in vm_page_advise() to match behaviour after r290529.

* MFC r303059 (markj, 2016-08-14, 1 file, -3/+2)
  Release the second critical section in uma_zfree_arg() slightly
  earlier.

* MFC r303516 (markj, 2016-08-14, 1 file, -1/+1)
  Use vm_page_undirty() instead of manually setting a page field.

* MFC r303244, r303399 (markj, 2016-08-14, 1 file, -10/+10)
  De-pluralize "queues" in the pagedaemon code.
* MFC r303773 (alc, 2016-08-10, 1 file, -1/+1)
  Correct a spelling error.

  Approved by: re (kib)
* MFC r303448 (kib, 2016-08-10, 1 file, -40/+30)
  Do not delegate work to the geom event thread when it can be done
  inline.

  MFC r303703:
  Explain why swapgeom_close_ev() is delegated.

  Approved by: re (gjb)
* MFC r303492 (alc, 2016-08-05, 1 file, -1/+0)
  Remove a probe declaration that has been unused since r292469, when
  vm_pageout_grow_cache() was replaced.

  Approved by: re (gjb)

* MFC r303446 (kib, 2016-08-04, 1 file, -4/+4)
  Fix style and typo.

  Approved by: re (gjb)
* MFC r303356 and r303465 (alc, 2016-08-01, 1 file, -15/+12)
  Remove any mention of cache (PG_CACHE) pages from the comments in
  vm_pageout_scan().  That function has not cached pages since r284376.

  Approved by: re (kib)
* MFC r302567 (kib, 2016-07-25, 1 file, -0/+4)
  In vgonel(), postpone setting BO_DEAD until VOP_RECLAIM() is called,
  if the vnode is VMIO.  For VMIO vnodes, set BO_DEAD in
  vm_object_terminate().

  MFC r302580:
  Fix grammar.

  Approved by: re (gjb)
* MFC r303101 (alc, 2016-07-23, 1 file, -0/+9)
  Add a comment describing the 'fast path' that was introduced in
  r270011.

  Approved by: re (gjb)
* MFC r302980 (alc, 2016-07-21, 1 file, -39/+76)
  Break up vm_fault()'s implementation of the read-ahead and
  delete-behind optimizations into two distinct pieces.  The first
  piece consists of the code that should only be performed once per
  page fault and requires the map to be locked.  The second piece
  consists of the code that should be performed each time a pager is
  called on an object in the shadow chain.  (This second piece expects
  the map to be unlocked.)

  Previously, the entire implementation could be executed multiple
  times.  Moreover, the second and subsequent executions would occur
  with the map unlocked.  Usually, the ensuing unsynchronized accesses
  to the map were harmless because the map was not changing.
  Nonetheless, it was possible for a use-after-free error to occur,
  where vm_fault() wrote to a freed map entry.  This change corrects
  that problem.

  Approved by: re (gjb)
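  The shape of that split can be sketched in miniature (hypothetical
  user-space code, not the actual vm_fault() implementation; all names
  are illustrative): the once-per-fault piece runs under the map lock,
  and its result is reused by each per-object pager call made with the
  map unlocked.

  ```c
  #include <stdio.h>

  static int once_runs;    /* how often the once-per-fault piece ran */
  static int pager_calls;  /* how often the per-object piece ran */

  /* Piece 1: needs the (simulated) map lock; runs once per fault and
   * computes the read-ahead window. */
  static int fault_readahead_once(void)
  {
      once_runs++;
      return 8;            /* illustrative read-ahead page count */
  }

  /* Piece 2: runs for each object in the shadow chain, map unlocked;
   * it only consumes the value computed by piece 1. */
  static void pager_getpages(int readahead)
  {
      (void)readahead;
      pager_calls++;
  }

  static void fault(int shadow_chain_len)
  {
      int ra = fault_readahead_once();   /* map locked here */
      for (int i = 0; i < shadow_chain_len; i++)
          pager_getpages(ra);            /* map unlocked here */
  }

  int main(void)
  {
      fault(3);                          /* shadow chain of three objects */
      printf("once=%d per_object=%d\n", once_runs, pager_calls);
      return 0;
  }
  ```

  The point of the restructuring is exactly this invariant: the
  map-dependent computation happens once while the map is locked, so no
  later iteration touches the map without synchronization.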
* Change the type of the map entry's next_read field from a vm_pindex_t
  to a vm_offset_t.  (alc, 2016-07-07, 3 files, -10/+10)

  (This field is used to detect sequential access to the virtual
  address range represented by the map entry.)

  There are three reasons to make this change.  First, a vm_offset_t is
  smaller on 32-bit architectures.  Consequently, a struct vm_map_entry
  is now smaller on 32-bit architectures.  Second, a vm_offset_t can be
  written atomically, whereas it may not be possible to write a
  vm_pindex_t atomically on a 32-bit architecture.  Third, using a
  vm_pindex_t makes the next_read field dependent on which object in
  the shadow chain is being read from.

  Replace an "XXX" comment.

  Reviewed by: kib
  Approved by: re (gjb)
  Sponsored by: EMC / Isilon Storage Division
* Autotune the number of pages set aside for UMA startup based on the
  number of CPUs present.  (cperciva, 2016-07-07, 2 files, -0/+17)

  On amd64 this unbreaks the boot for systems with 92 or more CPUs; the
  limit will vary on other systems depending on the size of their
  uma_zone and uma_cache structures.

  The major consumer of pages during UMA startup is the 19 zone
  structures which are set up before UMA has bootstrapped itself
  sufficiently to use the rest of the available memory: UMA Slabs, UMA
  Hash, 4 / 6 / 8 / 12 / 16 / 32 / 64 / 128 / 256 Bucket, vmem btag, VM
  OBJECT, RADIX NODE, MAP, KMAP ENTRY, MAP ENTRY, VMSPACE, and fakepg.
  If the zone structures occupy more than one page, they will not share
  pages, and the number of pages currently needed for startup is
  19 * pages_per_zone + N, where N is the number of pages used for
  allocating other structures; on amd64 N = 3 at present (2 pages are
  allocated for UMA Kegs, and one page for UMA Hash).

  This patch adds a new definition UMA_BOOT_PAGES_ZONES, currently set
  to 32, and if a zone structure does not fit into a single page sets
  boot_pages to UMA_BOOT_PAGES_ZONES * pages_per_zone instead of
  UMA_BOOT_PAGES (which remains at 64).  Consequently this patch has no
  effect on systems where the zone structure fits into 2 or fewer pages
  (on amd64, 59 or fewer CPUs), but increases boot_pages sufficiently
  on systems where the large number of CPUs makes this structure
  larger.

  It seems safe to assume that systems with 60+ CPUs can afford to set
  aside an additional 128kB of memory per 32 CPUs.

  The vm.boot_pages tunable continues to override this computation, but
  is unlikely to be necessary in the future.

  Tested on: EC2 x1.32xlarge
  Relnotes: FreeBSD can now boot on 92+ CPU systems without requiring
    vm.boot_pages to be manually adjusted.
  Reviewed by: jeff, alc, adrian
  Approved by: re (kib)
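  The sizing rule reduces to plain arithmetic; here is a sketch with
  illustrative constants (`compute_boot_pages` is a hypothetical helper
  mirroring the commit message, not the kernel function):

  ```c
  #include <stddef.h>
  #include <stdio.h>

  #define PAGE_SIZE            4096
  #define UMA_BOOT_PAGES       64   /* old fixed reserve */
  #define UMA_BOOT_PAGES_ZONES 32   /* per-zone multiplier for large zones */

  static int compute_boot_pages(size_t zone_size)
  {
      /* Pages needed to hold one zone structure, rounded up; zones
       * larger than a page do not share pages. */
      int pages_per_zone = (int)((zone_size + PAGE_SIZE - 1) / PAGE_SIZE);

      if (pages_per_zone > 1)
          return UMA_BOOT_PAGES_ZONES * pages_per_zone;
      return UMA_BOOT_PAGES;
  }

  int main(void)
  {
      /* Small zone structure (few CPUs): the old reserve stands. */
      printf("%d\n", compute_boot_pages(2048));
      /* Large zone structure (many CPUs): scale with pages_per_zone. */
      printf("%d\n", compute_boot_pages(3 * PAGE_SIZE));
      return 0;
  }
  ```

  Note that a two-page zone yields 32 * 2 = 64 pages, identical to
  UMA_BOOT_PAGES, which is why the change is a no-op for systems whose
  zone structure fits in two or fewer pages.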
* Replace a number of conflations of mp_ncpus and mp_maxid with either
  mp_maxid or CPU_FOREACH() as appropriate.  (nwhitehorn, 2016-07-06,
  2 files, -3/+4)

  This fixes a number of places in the kernel that assumed CPU IDs are
  dense in [0, mp_ncpus) and would try, for example, to run tasks on
  CPUs that did not exist, or to allocate too few buffers on systems
  with sparse CPU IDs in which there are holes in the range and
  mp_maxid > mp_ncpus.  Such circumstances generally occur on systems
  with SMT, but on which SMT is disabled.  This patch restores system
  operation at least on POWER8 systems configured in this way.

  There are a number of other places in the kernel with potential
  problems in these situations, but where sparse CPU IDs are not
  currently known to occur, mostly in the ARM machine-dependent code.
  These will be fixed in a follow-up commit after the stable/11 branch.

  PR: kern/210106
  Reviewed by: jhb
  Approved by: re (glebius)
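  A minimal user-space model of this bug class (the CPU layout and all
  names are illustrative, not the kernel's data structures): with SMT
  siblings disabled, CPU IDs are sparse, so mp_maxid exceeds mp_ncpus
  and a loop over [0, mp_ncpus) both visits absent CPUs and misses real
  ones, while CPU_FOREACH-style iteration walks exactly the IDs that
  exist.

  ```c
  #include <stdbool.h>
  #include <stdio.h>

  #define MAXCPU 8
  static const bool cpu_absent[MAXCPU] = {
      /* CPUs 0,2,4,6 present; odd (SMT sibling) IDs disabled */
      false, true, false, true, false, true, false, true
  };
  static const int mp_ncpus = 4;   /* number of CPUs that exist */
  static const int mp_maxid = 6;   /* highest existing CPU ID */

  int main(void)
  {
      int wrong = 0, right = 0;

      /* Buggy pattern: assumes dense IDs in [0, mp_ncpus). */
      for (int i = 0; i < mp_ncpus; i++)
          if (!cpu_absent[i])
              wrong++;             /* visits 0,1,2,3 -> counts only 0,2 */

      /* CPU_FOREACH-style pattern: skip absent IDs up to mp_maxid. */
      for (int i = 0; i <= mp_maxid; i++)
          if (!cpu_absent[i])
              right++;             /* visits 0,2,4,6 */

      printf("wrong=%d right=%d\n", wrong, right);
      return 0;
  }
  ```

  The buggy loop would allocate half the needed per-CPU buffers here,
  which is exactly the failure mode described for POWER8 with SMT off.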
* Clarify the vnode_destroy_vobject() logic handling for already
  terminated objects.  (kib, 2016-07-05, 1 file, -3/+14)

  Assert that there are no new waiters for the already terminated
  objects.  Old waiters should have been notified by the termination
  calling vnode_pager_dealloc() (old/new are with regard to the lock
  acquisition interval).

  Only clear vp->v_object for the case of an already terminated object,
  since the other branches call vnode_pager_dealloc(), which should
  clear the pointer.  Assert this.

  Tested by: pho
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Approved by: re (gjb)
* Change type of the 'dead' variable to boolean.kib2016-07-031-2/+2
| | | | | | Requested by: alc MFC after: 1 week Approved by: re (gjb)
* If the vm_fault() handler raced with the vm_object_collapse()
  sleepable scan, iteration over the shadow chain looking for a page
  could find an OBJ_DEAD object.  (kib, 2016-06-27, 1 file, -4/+11)

  Such a state of the mapping is only transient; the dead object will
  be terminated and removed from the chain shortly.  We must not return
  KERN_PROTECTION_FAILURE unless the object type is changed to
  OBJT_DEAD in the chain, indicating that paging on this address is
  really impossible.  Returning KERN_PROTECTION_FAILURE prematurely
  causes spurious SIGSEGVs delivered to processes, or kernel accesses
  to UVA spuriously failing with EFAULT.

  If an object with the OBJ_DEAD flag is found, only return
  KERN_PROTECTION_FAILURE when the object type is already OBJT_DEAD.
  Otherwise, sleep a tick and retry the fault handling.

  Ideally, we would wait until the OBJ_DEAD flag is resolved, e.g. by
  waiting until the paging on this object is finished.  But to do so,
  we would need to reference the dead object, while
  vm_object_collapse() insists on owning the final reference on the
  collapsed object.  This could be fixed by e.g. changing the assert to
  a shared reference release between vm_fault() and
  vm_object_collapse(), but that seems to be too many complications for
  a rare boundary condition.

  PR: 204426
  Tested by: pho
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  X-Differential revision: https://reviews.freebsd.org/D6085
  MFC after: 2 weeks
  Approved by: re (gjb)
* In vm_page_xunbusy_maybelocked(), add a fast path for unbusy when no
  waiters exist, same as for vm_page_xunbusy().  (kib, 2016-06-23,
  1 file, -4/+22)

  If the previous value of busy_lock was VPB_SINGLE_EXCLUSIVER, no
  waiters existed and a wakeup is not needed.

  Move common code from vm_page_xunbusy_maybelocked() and
  vm_page_xunbusy_hard() to vm_page_xunbusy_locked().

  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Approved by: re (gjb)
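  The fast-path idea can be sketched with C11 atomics in user space
  (a single-threaded illustration; the VPB_* values here are
  placeholders, not the kernel's actual busy-word encodings): if the
  busy word still holds the "single exclusive owner, no waiters" value,
  one compare-and-set releases it and no wakeup is needed; any other
  value means waiters may be queued and the slow path must run.

  ```c
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdio.h>

  #define VPB_UNBUSIED          0u
  #define VPB_SINGLE_EXCLUSIVER 1u  /* exclusive busy, no waiters */
  #define VPB_WAITERS           2u  /* flag: someone is sleeping */

  static atomic_uint busy_lock = VPB_SINGLE_EXCLUSIVER;

  /* Returns true if the fast path sufficed (no wakeup required). */
  static bool xunbusy_fast(void)
  {
      unsigned expected = VPB_SINGLE_EXCLUSIVER;
      return atomic_compare_exchange_strong(&busy_lock, &expected,
          VPB_UNBUSIED);
  }

  int main(void)
  {
      /* No waiters: the CAS succeeds and the page is unbusied. */
      printf("fast=%d\n", xunbusy_fast());

      /* Waiters flag set: the CAS fails; the slow path (lock, unbusy,
       * wakeup) would have to run. */
      atomic_store(&busy_lock, VPB_SINGLE_EXCLUSIVER | VPB_WAITERS);
      printf("fast=%d\n", xunbusy_fast());
      return 0;
  }
  ```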
* Add a comment noting the locking regime for vm_page_xunbusy().
  (kib, 2016-06-23, 1 file, -0/+1)

  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Approved by: re (gjb)
* Fix a LOR between vnode locks and allproc_lock.  (kib, 2016-06-22,
  2 files, -15/+34)

  There is an order between the covered vnode lock and allproc_lock,
  which is established by calling mountcheckdirs() while owning the
  covered vnode lock.  mountcheckdirs() iterates over the processes,
  protected by allproc_lock.  This order is needed and seems to be
  unavoidable.

  On the other hand, various VM daemons also need to iterate over all
  processes, and they lock and unlock user maps.  Since unlock of a
  user map may trigger processing of the deferred map entries, it
  causes vnode locking to occur.  Or, when a vmspace is freed, dropping
  references on the vnode-backed object also locks vnodes.  We get the
  reverse order compared with the mount/unmount order.

  For VM daemons, there is no need to own allproc_lock while we operate
  on vmspaces.  If the process is held, it serves as the marker for the
  allproc list, which allows the iteration to continue.

  Add the _PHOLD_LITE() macro, similar to _PHOLD() but not causing
  swap-in of the kernel stacks.  It is used instead of _PHOLD() in vm
  code, since e.g. calling faultin() in OOM conditions only exaggerates
  the problem.

  Modernize the comment describing PHOLD.

  Reported by: lists@yamagi.org
  Tested by: pho (previous version)
  Reviewed by: jhb
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 weeks
  Approved by: re (gjb)
  Differential revision: https://reviews.freebsd.org/D6679
* The vmtotal sysctl handler marks active vm objects to calculate
  statistics.  (kib, 2016-06-21, 2 files, -46/+37)

  Marking is done by setting the OBJ_ACTIVE flag.  The flag change is
  locked, but the problem is that many parts of the system assume that
  vm object initialization ensures that no other code could change the
  object, and thus access it locklessly.  The end result is corrupted
  flags in vm objects; most visible is a spurious OBJ_DEAD flag,
  causing random hangs.

  Avoid the active-object marking; instead, provide an equally inexact
  but immutable is_object_alive() definition for the object mapped
  state.

  Avoid iterating over the processes' mappings altogether by using an
  arguably improved definition of a paging thread as one which sleeps
  on v_free_count.

  PR: 204764
  Diagnosed by: pho
  Tested by: pho (previous version)
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Approved by: re (gjb)
* Fix inconsistent locking of the swap pager named objects list.
  (kib, 2016-06-13, 1 file, -63/+56)

  Right now, all modifications of the list are locked by sw_alloc_mtx.
  But the initial lookup of the object by the handle in
  swap_pager_alloc() is not protected by sw_alloc_mtx, which means that
  vm_pager_object_lookup() could follow a freed pointer.

  Create a new named swap object with the OBJT_SWAP type, instead of
  OBJT_DEFAULT.  With this change, swp_pager_meta_build() never needs
  to upgrade a named OBJT_DEFAULT to OBJT_SWAP (elsewhere, we do not
  forbid client code from creating named OBJT_DEFAULT objects at all).
  That change allows removing sw_alloc_mtx and locking the list with
  the sw_alloc_sx lock.

  Update swap_pager_copy() to the new locking mode.  Create the helper
  swap_pager_alloc_init() to consolidate named and anonymous swap
  object creation, while the caller ensures that the necessary locks
  are held around the helper.

  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Approved by: re (hrs)
* Explicitly initialize sw_alloc_sx.  (kib, 2016-06-13, 1 file, -0/+1)

  Currently it is not initialized but works due to the zeroed-out bss
  on startup.

  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Approved by: re (hrs)
* Reset the page busy lock state after failing to insert into the
  object.  (markj, 2016-06-02, 1 file, -0/+2)

  Freeing a shared-busy page is not permitted.

  Reviewed by: kib
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D6670

* Don't preserve the page's object linkage in vm_page_insert_after().
  (markj, 2016-06-02, 1 file, -6/+2)

  Per the KASSERT at the beginning of the function, we expect that the
  page does not belong to any object, so its object and pindex fields
  are meaningless.  Reset them in the rare case that vm_radix_insert()
  fails.

  Reviewed by: kib
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D6669
* Fix memguard(9) in kernels with INVARIANTS enabled.  (markj,
  2016-06-01, 2 files, -9/+30)

  With r284861, UMA zones use the trash ctor and dtor by default.  This
  is incompatible with memguard, which frees the backing page when the
  item is freed.  Modify the UMA debug functions to be no-ops if the
  item was allocated from memguard.  This also fixes constructors such
  as mb_ctor_pack(), which invokes the trash ctor in addition to
  performing some initialization.

  Reviewed by: glebius
  MFC after: 3 weeks
  Differential Revision: https://reviews.freebsd.org/D6562
* If the fast path unbusy in vm_page_replace() fails, the slow path
  needs to acquire the page lock, which recurses.  (kib, 2016-06-01,
  1 file, -15/+20)

  Avoid the recursion by reusing the code from vm_page_remove() in a
  new helper, vm_page_xunbusy_maybelocked().

  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation