path: root/sys/vm
Commit message  [Author, Date, Files, -deleted/+added]
* Rename global cnt to vm_cnt to avoid shadowing.  [bdrewery, 2014-03-22, 15 files, -103/+104]
  To reduce the diff, the struct pcu.cnt field was not renamed, so
  PCPU_OP(cnt.field) is still used.  pc_cnt and pcpu are also used in
  kvm(3) and vmstat(8).  The goal was to not affect the externally used
  KPI.  Bump __FreeBSD_version in case some out-of-tree module/code
  relies on the global cnt variable.  An exp-run revealed no ports using
  it directly.
  No objection from: arch@
  Sponsored by: EMC / Isilon Storage Division
* Fix two issues with /dev/mem access on amd64, both causing kernel page faults.  [kib, 2014-03-21, 1 file, -0/+4]
  First, accesses to the direct map region should be checked against the
  limit up to which the direct map is instantiated.  Second, for accesses
  to the kernel map, success returned from kernacc(9) does not guarantee
  that a subsequent attempt to read or write the checked address will
  succeed, since another thread might invalidate the address meantime.
  Add a new thread-private flag TDP_DEVMEMIO, which instructs vm_fault()
  to return an error when a fault happens on a MAP_ENTRY_NOFAULT entry,
  instead of panicking.  The trap handler then sees a page fault from the
  access and recovers in the normal way, making /dev/mem access safer.
  Remove GIANT_REQUIRED from the amd64 memrw(), since it is not needed
  and holding Giant does not solve the issues on amd64.
  Note that at least the second issue exists on other architectures, and
  requires similar patching of the md code.
  Reported and tested by: clusteradm (gjb, sbruno)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
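  A minimal sketch of the vm_fault() behavior described above (placement
  and return value simplified, not the verbatim commit):

      /* Sketch: with TDP_DEVMEMIO set on the thread, a fault on a
       * MAP_ENTRY_NOFAULT entry returns an error instead of panicking,
       * so the trap handler can recover the /dev/mem copy operation. */
      if ((entry->eflags & MAP_ENTRY_NOFAULT) != 0) {
              if ((curthread->td_pflags & TDP_DEVMEMIO) != 0)
                      return (KERN_FAILURE);
              panic("vm_fault: fault on nofault entry");
      }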
* Initialize vm_map_entry member wiring_thread on the map entry creation.  [kib, 2014-03-21, 1 file, -0/+1]
  This was missed in r253190.
  Reported by: hps, peter
  Tested by: hps
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 days
* vm_page_grab() and vm_pager_get_pages() can drop the vm_object lock, and then threads can sleep on the pip condition.  [attilio, 2014-03-19, 1 file, -2/+2]
  Avoid deadlocking such threads by correctly awakening the sleeping ones
  after the pip is finished.  The swapoff side of the bug can likely
  result in shutdown deadlocks.
  Sponsored by: EMC / Isilon Storage Division
  Reported by: pho, pluknet
  Tested by: pho
* Update kernel inclusions of capability.h to use capsicum.h instead.  [rwatson, 2014-03-16, 1 file, -1/+1]
  Some further refinement is required, as some device drivers intended to
  be portable over FreeBSD versions rely on __FreeBSD_version to decide
  whether to include capability.h.
  MFC after: 3 weeks
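  For such portable drivers, the version-conditional include alluded to
  above looks roughly like this; the exact __FreeBSD_version cutoff below
  is an assumption, check sys/param.h on the target branches:

      #include <sys/param.h>
      #if __FreeBSD_version >= 1100000        /* assumed cutoff */
      #include <sys/capsicum.h>
      #else
      #include <sys/capability.h>
      #endif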
* Initialize paddr to handle the case of zero size.  [kib, 2014-03-12, 1 file, -0/+1]
  Reported and reviewed by: Conrad Meyer <cemeyer@uw.edu>
  MFC after: 1 week
* Do not vdrop() the tmpfs vnode until it is unlocked.  [kib, 2014-03-12, 1 file, -1/+2]
  The hold reference might be the last, and then vdrop() would free the
  vnode.
  Reported and tested by: bdrewery
  MFC after: 1 week
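  The ordering constraint, as a sketch: the hold reference must outlive
  the lock, because dropping the last hold reference frees the vnode.

      /* Buggy order: if this is the last hold reference, vdrop()
       * frees the vnode and VOP_UNLOCK() touches freed memory. */
      vdrop(vp);
      VOP_UNLOCK(vp, 0);

      /* Fixed order: release the lock while the hold reference
       * still keeps the vnode alive. */
      VOP_UNLOCK(vp, 0);
      vdrop(vp);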
* After r251709, avoid a clang 3.4 warning about an unused static const variable (uma_max_ipers), when asserts are disabled.  [dim, 2014-02-14, 1 file, -4/+1]
  Reviewed by: glebius
  MFC after: 3 days
* Fix-up r254141: in the process of making a failing vm_page_rename() possible, a call to swap_pager_freespace() was moved around, now leading to freeing the incorrect page because the pindex changes after vm_page_rename().  [attilio, 2014-02-14, 1 file, -2/+4]
  Get back to using the correct pindex when destroying the swap space.
  Sponsored by: EMC / Isilon Storage Division
  Reported by: avg
  Tested by: pho
  MFC after: 7 days
* Fix function name in KASSERT().  [glebius, 2014-02-12, 1 file, -2/+1]
  Submitted by: hiren
* Correct assertion to assert that the existing device VM object uses the same type, rather than asserting in the case where we just created a new VM object.  [jhb, 2014-02-11, 1 file, -2/+4]
  Reviewed by: kib
* Create two public UMA_ZONE_PCPU zones: 64 bit sized and pointer sized.  [glebius, 2014-02-10, 1 file, -0/+6]
  Sponsored by: Nginx, Inc.
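  Creation of such zones presumably follows the usual uma_zcreate(9)
  pattern, with UMA_ZONE_PCPU selecting the per-CPU layout; the zone
  names below are illustrative:

      pcpu_zone_64 = uma_zcreate("64 pcpu", sizeof(uint64_t),
          NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_PCPU);
      pcpu_zone_ptr = uma_zcreate("ptr pcpu", sizeof(void *),
          NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_PCPU);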
* Style.  [glebius, 2014-02-10, 1 file, -3/+3]
* Make M_ZERO flag work correctly on UMA_ZONE_PCPU zones.  [glebius, 2014-02-10, 1 file, -2/+14]
  Sponsored by: Nginx, Inc.
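  A sketch of what honoring M_ZERO means for a per-CPU zone: every CPU's
  copy of the item must be zeroed, not just the one behind the returned
  pointer.  The per-CPU stride UMA_PCPU_ALLOC_SIZE and the loop shape are
  assumptions here, not the verbatim commit:

      if ((flags & M_ZERO) != 0) {
              int cpu;

              /* Zero each CPU's slice of the item, not just slice 0. */
              CPU_FOREACH(cpu)
                      bzero((char *)item + cpu * UMA_PCPU_ALLOC_SIZE,
                          zone->uz_size);
      }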
* Don't call vm_fault_prefault() on zero-fill faults.  It's a waste of time.  [alc, 2014-02-09, 1 file, -1/+4]
  Successful prefaults after a zero-fill fault are extremely rare.
* Provide macros that make it easy to export uma(9) zone limits and current usage via sysctl(9).  [glebius, 2014-02-07, 1 file, -0/+29]
  SYSCTL_UMA_MAX()
  SYSCTL_ADD_UMA_MAX()
  SYSCTL_UMA_CUR()
  SYSCTL_ADD_UMA_CUR()
  Sponsored by: Nginx, Inc.
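  Usage presumably mirrors the existing SYSCTL_* conventions; an
  illustrative, unverified sketch for a dynamically created zone (the
  argument order is assumed to follow the standard SYSCTL_ADD_* pattern,
  check vm/uma.h for the real definition):

      /* Illustrative only: (ctx, parent, nbr, name, access, zone,
       * descr) argument order is an assumption. */
      SYSCTL_ADD_UMA_MAX(ctx, SYSCTL_CHILDREN(parent_oid), OID_AUTO,
          "zone_max", CTLFLAG_RW, my_zone, "Item limit for my_zone");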
* Make prefaulting more aggressive on hard faults.  [alc, 2014-02-02, 1 file, -24/+32]
  Previously, we would only map a fraction of the pages that were fetched
  by vm_pager_get_pages() from secondary storage.  Now, we map them all
  in order to avoid future soft faults.  This effect is most evident when
  a memory-mapped file is accessed sequentially.  Previously, there were
  6 soft faults for every hard fault.  Now, these soft faults are
  eliminated.
  Sponsored by: EMC / Isilon Storage Division
* In an effort to diagnose possible corruption of struct vm_page on some sparc64 machines, make the page queue assert in vm_page_dequeue() more precise.  [alc, 2014-01-24, 1 file, -2/+2]
  While here, switch the page lock assert to the newer style.
* Fix a couple of typos.  [jhb, 2014-01-21, 1 file, -2/+2]
* ANSIfy declarations.  [glebius, 2014-01-20, 1 file, -32/+11]
  Ok'ed by: alc
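  For reference, the sort of conversion this entails, shown on a
  hypothetical function:

      /* K&R-style definition: */
      static void
      vm_foo(map, addr)
              vm_map_t map;
              vm_offset_t addr;
      {
              /* body unchanged */
      }

      /* ANSI (C89) equivalent: */
      static void
      vm_foo(vm_map_t map, vm_offset_t addr)
      {
              /* body unchanged */
      }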
* Style changes in vm_pageout_scan():  [alc, 2014-01-18, 1 file, -12/+11]
  1. Be consistent in the style of "act_delta" manipulations between the
     inactive and active queue scans.
  2. Explicitly compare to zero.
  3. The deactivation of a page is based on its recent history and not
     just the current call to vm_pageout_scan().  The variable
     "act_delta" represents the current state of the page, and not its
     history.  Avoid possible confusion by not (ab)using "act_delta" for
     making the deactivation decision.
  Submitted by: kib [1]
  Reviewed by: kib [2,3]
* Correctly update the count of stuck pages, "addl_page_shortage", in vm_pageout_scan().  [alc, 2014-01-12, 1 file, -15/+17]
  There were missing increments in two less common cases.
  Don't conflate the count of stuck pages and the pageout deficit
  provided by vm_page_alloc{,_contig}().  (A proposed fix to the OOM
  code depends on this.)
  Handle held pages consistently in the inactive queue scan.  In the
  more common case, we did not move the page to the tail of the queue.
  Whereas, in the less common case, we did.  There's no particular
  reason to move the page in the less common case, so remove that code.
  Perform the calculation of the page shortage for the active queue
  scan a little earlier, before the active queue lock is acquired.  The
  correctness of this calculation doesn't depend on the active queue
  lock being held.
  Eliminate a redundant variable, "pcount".  Use the more descriptive
  variable, "maxscan", in its place.
  Apply a few nearby style fixes, e.g., eliminate stray whitespace and
  excess parentheses.
  Reviewed by: kib
  Sponsored by: EMC / Isilon Storage Division
* Since the introduction of the popmap to reservations in r259999, there is no longer any need for the page's PG_CACHED and PG_FREE flags to be set and cleared while the free page queues lock is held.  [alc, 2013-12-31, 3 files, -32/+14]
  Thus, vm_page_alloc(), vm_page_alloc_contig(), and
  vm_page_alloc_freelist() can wait until after the free page queues
  lock is released to clear the page's flags.  Moreover, the PG_FREE
  flag can be retired.  Now that the reservation system no longer uses
  it, its only uses are in a few assertions.  Eliminating these
  assertions is no real loss.  Other assertions catch the same types of
  misbehavior, like doubly freeing a page (see r260032) or dirtying a
  free page (free pages are invalid and only valid pages can be
  dirtied).
  Eliminate an unneeded variable from vm_page_alloc_contig().
  Sponsored by: EMC / Isilon Storage Division
* Add "popmap" assertions: The page being freed isn't already free, and thealc2013-12-291-0/+6
| | | | | | page being allocated isn't already allocated. Sponsored by: EMC / Isilon Storage Division
* MFp4 alc_popmap  [alc, 2013-12-28, 1 file, -88/+185]
  Change the way that reservations keep track of which pages are in use.
  Instead of using the page's PG_CACHED and PG_FREE flags, maintain a
  bit vector within the reservation.  This approach has a couple of
  benefits.  First, it makes breaking reservations much cheaper because
  there are fewer cache misses to identify the unused pages.  Second, it
  is a prerequisite for supporting two or more reservation sizes.
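  A condensed sketch of the approach; the real code in vm_reserv.c wraps
  the bit manipulation in helpers, and the exact names here are
  assumptions:

      #define NBPOPMAP        (NBBY * sizeof(u_long))
      #define NPOPMAP         howmany(VM_LEVEL_0_NPAGES, NBPOPMAP)

      struct vm_reserv {
              /* ... existing fields ... */
              u_long  popmap[NPOPMAP];  /* bit set => page is in use */
      };

      /* Mark page "i" of the reservation allocated, then free: */
      rv->popmap[i / NBPOPMAP] |= 1UL << (i % NBPOPMAP);
      rv->popmap[i / NBPOPMAP] &= ~(1UL << (i % NBPOPMAP));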
* Do not coalesce stack entries; vm_map_stack() asserts that the requested region is claimed by a new entry.  [kib, 2013-12-27, 1 file, -1/+1]
  Pass the MAP_STACK_GROWS_DOWN and MAP_STACK_GROWS_UP flags to
  vm_map_insert() from vm_map_stack(), to really turn off the coalescing
  code and the call to vm_map_simplify_entry() [1].
  Reported by: avg, peter, many
  Tested by: avg, peter
  Noted by: avg [1]
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* For ia64, use pmap_remove_pages() and not pmap_remove().  [marcel, 2013-12-26, 1 file, -0/+10]
  The problem is that we don't have a good way (yet) to iterate over the
  mapped pages by virtual address, so pmap_remove() simply tries each
  page within the range.  Given that we call pmap_remove() over the
  entire 2^63 bytes of address space, it takes a while for pmap_remove()
  to have tried all 2^50 pages.  By using pmap_remove_pages() we use the
  PV list to find all mappings.
  Change derived from a patch by: alc
* In sys/vm/vm_pageout.c, since vm_pageout_worker() takes a void * as argument, cast the incoming 0 argument to void *.  [dim, 2013-12-25, 1 file, -1/+1]
  This silences a warning from clang 3.4 ("expression which evaluates to
  zero treated as a null pointer constant of type 'void *'
  [-Wnon-literal-null-conversion]").
  MFC after: 3 days
* Eliminate a redundant parameter to vm_radix_replace().  [alc, 2013-12-08, 3 files, -10/+7]
  Improve the wording of the comment describing vm_radix_replace().
  Reviewed by: attilio
  MFC after: 6 weeks
  Sponsored by: EMC / Isilon Storage Division
* In keg_dtor(), print out the keg name in the "Freed UMA keg was not empty" message printed to the console.  [rodrigc, 2013-11-29, 1 file, -1/+2]
  This makes it easier to track down the source of certain memory leaks.
  Suggested by: adrian
* - Add bucket size column to `show uma` DDB command.  [mav, 2013-11-28, 1 file, -5/+34]
  - Add `show umacache` command to show similar stats for cache-only UMA
    zones.
* Make UMA not blindly force offpage slab header allocation for large (> PAGE_SIZE) zones.  [mav, 2013-11-27, 1 file, -2/+16]
  If the zone size is not a multiple of PAGE_SIZE, there may be enough
  space for the header at the last page, so we may avoid the extra
  header memory allocation and hash table update/lookup.
  ZFS creates a bunch of odd-sized UMA zones (5120, 6144, 7168, 10240,
  14336).  This change puts at least some of the otherwise lost memory
  there to good use.
  Reviewed by: avg
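  The space check, sketched: ZFS's 10240-byte zone, for example, rounds
  up to three pages (12288 bytes), leaving 2048 bytes of tail space,
  which is ample for an inline slab header.  The real keg-layout logic
  in uma_core.c is more involved (alignment, refcnt slabs); this is only
  the idea:

      size_t waste;

      waste = roundup2(keg->uk_size, PAGE_SIZE) - keg->uk_size;
      if (waste >= sizeof(struct uma_slab))
              keg->uk_flags &= ~UMA_ZONE_OFFPAGE;  /* header fits inline */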
* Don't count bucket allocation failures for UMA zones as their own failures.  [mav, 2013-11-27, 1 file, -5/+3]
  There are good reasons for these to happen, such as recursion
  prevention, etc., and they are not fatal since buckets are just an
  optimization mechanism.  Real bucket allocation failures are in any
  case counted by the bucket zones themselves, and we don't need double
  accounting there.
* Fix a bug introduced in r252226: the udata argument passed to bucket_alloc() was used without first making sure that it was really passed for us.  [mav, 2013-11-27, 1 file, -3/+4]
  On some of my systems this bug made the user argument passed by ZFS
  code to uma_zalloc_arg() unexpectedly block UMA per-CPU caches for
  those zones.
* When purging per-CPU UMA caches, do not return empty buckets into the global full-bucket cache.  [mav, 2013-11-23, 1 file, -4/+16]
  This avoids triggering an assertion if an allocation happens before
  that global cache gets purged.
* The VM map code performs clipping when a map entry covers a region larger than the operational region.  [kib, 2013-11-20, 1 file, -0/+15]
  If the op region size is zero, clipping would create a zero-sized map
  entry.  The result is that the vm map splay starts behaving
  inconsistently, sometimes returning the zero-sized entry, sometimes
  the next (or previous) entry.
  One step further, it could result in e.g. vm_map_wire() setting
  MAP_ENTRY_IN_TRANSITION on the zero-sized entry, but failing to clear
  it in the done part.  vm_map_delete() then hangs forever waiting for
  the flag removal.
  Check for zero-length requests and act as if they are always
  successful, without performing any action on the address space.
  Diagnosed by: pho
  Tested by: pho (previous version)
  Reviewed by: alc (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
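  Per the description above, the guard amounts to an early success
  return in the affected map operations; a minimal sketch:

      /* Sketch of the zero-length guard in ops such as vm_map_wire()
       * and vm_map_unwire(): never clip, never touch any entry. */
      if (start == end)
              return (KERN_SUCCESS);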
* Add assertions to cover all places in the wiring and unwiring code where MAP_ENTRY_IN_TRANSITION is set or cleared.  [kib, 2013-11-20, 1 file, -1/+9]
  Tested by: pho
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* Revert back to using int for the page counts.  [kib, 2013-11-20, 1 file, -4/+4]
  In vn_io_fault(), the i/o is chunked into pieces limited by the
  integer io_hold_cnt tunable, while vm_fault_quick_hold_pages() takes
  an integer max_count as the upper bound.  Rearrange the checks to
  correctly handle overflowing address arithmetic.
  Submitted by: bde
  Tested by: pho
  Discussed with: alc
  MFC after: 1 week
* Implement a mechanism to safely but slowly purge UMA per-CPU caches.  [mav, 2013-11-19, 1 file, -0/+77]
  This is a last resort for very low memory conditions, in case other
  measures to free memory were ineffective.  Sequentially cycle through
  all CPUs and extract per-CPU cache buckets into the zone cache, from
  where they can then be freed.
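  A sketch of the sweep described above; sched_bind(9) requires the
  thread lock, and the drain helper name here is an assumption:

      CPU_FOREACH(cpu) {
              thread_lock(curthread);
              sched_bind(curthread, cpu);
              thread_unlock(curthread);
              /* Now running on "cpu": its cache is safe to empty. */
              drain_cpu_cache(zone);          /* assumed helper */
      }
      thread_lock(curthread);
      sched_unbind(curthread);
      thread_unlock(curthread);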
* Grow UMA zone bucket size also on lock congestion during item free.  [mav, 2013-11-19, 1 file, -2/+13]
  Lock congestion is the same whether it happens on alloc or free, so
  handle it equally.  Now that we have back pressure, there is no
  problem growing buckets a bit faster.  In any case, growth is much
  slower than in 9.x.
* Add two new UMA bucket zones to store 3 and 9 items per bucket.  [mav, 2013-11-19, 1 file, -0/+2]
  These new buckets make bucket-size self-tuning softer and more
  precise.  Without them there are buckets for 1, 5, 13, 29, ... items.
  While at bigger sizes a step of about 2x is fine, at the smallest ones
  it is 5x and 2.6x respectively.  The new buckets make that line look
  like 1, 3, 5, 9, 13, 29, reducing the jumps between steps and making
  the algorithm work more smoothly, allocating and freeing memory in
  better-fitting chunks.  Otherwise there is quite a big gap between
  allocating 128K and 5x128K of RAM at once.
* Implement soft pressure on UMA cache bucket sizes.  [mav, 2013-11-19, 2 files, -1/+11]
  Every time the system detects a low memory condition, decrease the
  bucket size of each zone by one item.  As a result, higher memory
  pressure will push toward smaller bucket sizes, hence smaller per-CPU
  caches and more efficient memory use.
  Before this change there was no force opposing bucket growth caused
  by practically inevitable zone lock conflicts, and after some run
  time the per-CPU caches could consume enough RAM to kill the system.
* Avoid overflow for the page counts.  [kib, 2013-11-12, 1 file, -1/+1]
  Reported and tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* If a filesystem declares that it supports shared locking for writes, use a shared vnode lock for VOP_PUTPAGES() as well.  [kib, 2013-11-09, 1 file, -2/+8]
  The only such filesystem in the tree is ZFS, and it uses
  vnode_pager_generic_putpages(), which performs the pageout with
  VOP_WRITE().
  Reviewed by: alc
  Discussed with: avg
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
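  The lock-type selection presumably keys off the mount flag by which a
  filesystem declares shared-write support, roughly:

      /* Sketch: MNTK_SHARED_WRITES is the mount flag by which a
       * filesystem declares that writes tolerate a shared lock. */
      lkflags = (vp->v_mount->mnt_kern_flag & MNTK_SHARED_WRITES) != 0 ?
          LK_SHARED : LK_EXCLUSIVE;
      vn_lock(vp, lkflags | LK_RETRY);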
* Do not coalesce if the swap object belongs to a tmpfs vnode.  [kib, 2013-11-05, 1 file, -2/+3]
  Coalescing would extend the object to keep pages for the anonymous
  mapping created by the process.  The pages have no relation to the
  tmpfs file content which could be written into the corresponding
  range, causing aliasing between the anonymous mapping and the file
  content, with subsequent corruption.
  Another, lesser, problem created by coalescing is over-accounting on
  tmpfs node destruction, since the object size is subtracted from the
  total count of the pages owned by the tmpfs mount.
  Reported and tested by: bdrewery
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
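  The fix amounts to refusing coalescing when the previous object is
  tmpfs-backed; a sketch using the OBJ_TMPFS flag (exact placement in
  vm_object_coalesce() assumed):

      /* A tmpfs vnode's swap object must not absorb neighboring
       * anonymous mappings. */
      if ((prev_object->flags & OBJ_TMPFS) != 0)
              return (FALSE);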
* Tidy up the output of "sysctl vm.phys_free".  [alc, 2013-10-10, 1 file, -5/+3]
  Approved by: re (glebius)
  Sponsored by: EMC / Isilon Storage Division
* Both the vm_map and vmspace zones are defined as "no free".  So, there is no point in defining a fini function for these zones.  [alc, 2013-09-22, 1 file, -23/+2]
  Reviewed by: kib
  Approved by: re (glebius)
  Sponsored by: EMC / Isilon Storage Division
* Merge the following changes from projects/bhyve_npt_pmap:  [neel, 2013-09-20, 2 files, -6/+15]
  - add fields to 'struct pmap' that are required to manage nested page
    tables.
  - add a parameter to 'vmspace_alloc()' that can be used to override
    the default pmap initialization routine 'pmap_pinit()'.
  These changes are pushed ahead of the remaining changes in
  'bhyve_npt_pmap' in anticipation of the upcoming KBI freeze for 10.0.
  Reviewed by: kib@, alc@
  Approved by: re (glebius)
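  The resulting interface looks roughly as follows; passing NULL for
  pinit presumably keeps the default pmap_pinit() behavior, while bhyve
  can supply its nested-page-table initializer:

      typedef int (*pmap_pinit_t)(struct pmap *pmap);

      struct vmspace *
      vmspace_alloc(vm_offset_t min, vm_offset_t max, pmap_pinit_t pinit);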
* The pmap function pmap_clear_reference() is no longer used.  Remove it.  [alc, 2013-09-20, 1 file, -1/+0]
  pmap_clear_reference() has had exactly one caller in the kernel for
  several years, more precisely, since FreeBSD 8.  Now, that call no
  longer exists.
  Approved by: re (kib)
  Sponsored by: EMC / Isilon Storage Division
* Extend the support for exempting processes from being killed when swap is exhausted.  [jhb, 2013-09-19, 1 file, -10/+7]
  - Add a new protect(1) command that can be used to set or revoke
    protection from arbitrary processes.  Similar to ktrace, it can
    apply a change to all existing descendants of a process as well as
    future descendants.
  - Add a new procctl(2) system call that provides a generic interface
    for control operations on processes (as opposed to the
    debugger-specific operations provided by ptrace(2)).  procctl(2)
    uses a combination of idtype_t and an id to identify the set of
    processes on which to operate, similar to wait6().
  - Add a PROC_SPROTECT control operation to manage the protection
    status of a set of processes.  MADV_PROTECT still works for
    backwards compatibility.
  - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to
    kinfo_proc), the first bit of which is used to track whether
    P_PROTECT should be inherited by new child processes.
  Reviewed by: kib, jilles (earlier version)
  Approved by: re (delphij)
  MFC after: 1 month
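  From userland, the new interface can be exercised roughly as follows,
  using the PPROT_* flags from sys/procctl.h (PPROT_DESCEND covers
  existing descendants, PPROT_INHERIT covers future children):

      #include <sys/procctl.h>
      #include <sys/wait.h>
      #include <err.h>

      /* Protect "pid" and have its future children inherit it. */
      int flags = PPROT_SET | PPROT_INHERIT;

      if (procctl(P_PID, (id_t)pid, PROC_SPROTECT, &flags) == -1)
              err(1, "procctl");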