path: root/sys/vm/vm_object.c
Commit message | Author | Age | Files | Lines
* When unwiring a region of an address space, do not assume that the | alc | 2014-07-26 | 1 | -0/+72
| underlying physical pages are mapped by the pmap. If, for example, the application has performed an mprotect(..., PROT_NONE) on any part of the wired region, then those pages will no longer be mapped by the pmap. So, using the pmap to look up the wired pages in order to unwire them doesn't always work, and when it doesn't work, wired pages are leaked. To avoid the leak, introduce and use a new function vm_object_unwire() that locates the wired pages by traversing the object and its backing objects. At the same time, switch from using pmap_change_wiring() to the recently introduced function pmap_unwire() for unwiring the region's mappings. pmap_unwire() is faster, because it operates on a range of virtual addresses rather than a single virtual page at a time. Moreover, by operating on a range, it is superpage friendly. It doesn't waste time performing unnecessary demotions.
| Reported by: markj
| Reviewed by: kib
| Tested by: pho, jmg (arm)
| Sponsored by: EMC / Isilon Storage Division
* Correct assertion. The shadowing object cannot be a tmpfs vm object, | kib | 2014-07-24 | 1 | -2/+4
| and a tmpfs object cannot shadow. In other words, a tmpfs vm object is always at the bottom of the shadow chain.
| Reported and tested by: bdrewery
| Sponsored by: The FreeBSD Foundation
| MFC after: 1 week
* The OBJ_TMPFS flag of vm_object means that there is an unreclaimed tmpfs | kib | 2014-07-14 | 1 | -3/+3
| vnode for the tmpfs node owning this object. The flag is currently used for two purposes. First, it allows VV_TEXT to be handled correctly for a tmpfs vnode when the ref count on the object is decremented to 1, similar to vnode_pager_dealloc() for regular filesystems. Second, it prevents some operations which are done on OBJT_SWAP vm objects backing user anonymous memory, but are incorrect for the object owned by a tmpfs node. The second kind of use of the OBJ_TMPFS flag is incorrect, since the vnode might be reclaimed, which clears the flag, but vm object operations must still be disallowed. Introduce one more flag, OBJ_TMPFS_NODE, which is permanently set on the object for a VREG tmpfs node, and use it instead of OBJ_TMPFS to test whether vm object collapse and similar actions should be disabled.
| Tested by: pho
| Sponsored by: The FreeBSD Foundation
| MFC after: 2 weeks
* Rename global cnt to vm_cnt to avoid shadowing. | bdrewery | 2014-03-22 | 1 | -1/+1
| To reduce the diff, the struct pcpu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version in case some out-of-tree module/code relies on the global cnt variable. Exp-run revealed no ports using it directly.
| No objection from: arch@
| Sponsored by: EMC / Isilon Storage Division
* Do not vdrop() the tmpfs vnode until it is unlocked. The hold | kib | 2014-03-12 | 1 | -1/+2
| reference might be the last, and then vdrop() would free the vnode.
| Reported and tested by: bdrewery
| MFC after: 1 week
* Fix-up r254141: in the process of making vm_page_rename() able to fail, | attilio | 2014-02-14 | 1 | -2/+4
| a call of pager_swap_freespace() was moved around, now leading to freeing the incorrect page because the pindex changes after vm_page_rename(). Go back to using the correct pindex when destroying the swap space.
| Sponsored by: EMC / Isilon storage division
| Reported by: avg
| Tested by: pho
| MFC after: 7 days
* Fix function name in KASSERT(). | glebius | 2014-02-12 | 1 | -2/+1
| Submitted by: hiren
* Do not coalesce if the swap object belongs to a tmpfs vnode. The | kib | 2013-11-05 | 1 | -2/+3
| coalesce would extend the object to keep pages for the anonymous mapping created by the process. The pages have no relation to the tmpfs file content which could be written into the corresponding range, causing anonymous mapping and file content aliasing and subsequent corruption. Another, lesser problem created by coalescing is over-accounting on tmpfs node destruction, since the object size is subtracted from the total count of the pages owned by the tmpfs mount.
| Reported and tested by: bdrewery
| Sponsored by: The FreeBSD Foundation
| MFC after: 1 week
* Drain the xbusy state in two places which potentially do | kib | 2013-09-08 | 1 | -0/+6
| pmap_remove_all(). Not doing the drain allows pmap_enter() to proceed in parallel, making the effects of pmap_remove_all() void. The race results in an invalidated page being mapped wired by usermode.
| Reported and tested by: pho
| Reviewed by: alc
| Sponsored by: The FreeBSD Foundation
| Approved by: re (glebius)
* Remove the deprecated VM_ALLOC_RETRY flag for vm_page_grab(9). | kib | 2013-08-22 | 1 | -2/+1
| The flag has been mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic.
| Suggested and reviewed by: alc
| Sponsored by: The FreeBSD Foundation
* On all architectures, avoid preallocating the physical memory | attilio | 2013-08-09 | 1 | -27/+49
| for nodes used in vm_radix. On architectures supporting direct mapping, also avoid preallocating the KVA for such nodes. In order to do so, allow the operations derived from vm_radix_insert() to fail, and handle all the resulting failures. On the vm_radix side, introduce a new function called vm_radix_replace(), which can replace an already present leaf node with a new one, and take into account the possibility that, during the vm_radix_insert() allocation, the operations on the radix trie can recurse. This means that if the operations in vm_radix_insert() recursed, vm_radix_insert() will start again from scratch.
| Sponsored by: EMC / Isilon storage division
| Reviewed by: alc (older version)
| Reviewed by: jeff
| Tested by: pho, scottl
* The soft and hard busy mechanisms rely on the vm object lock to work. | attilio | 2013-08-09 | 1 | -23/+25
| Unify the two concepts into a real, minimal sxlock, where the shared acquisition represents the soft busy and the exclusive acquisition represents the hard busy. The old VPO_WANTED mechanism becomes the hard path for this new lock, and it becomes per-page rather than per-object. The vm_object lock becomes an interlock for this functionality: it can be held in either read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it.
| Also:
| - Add a new flag to directly shared-busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock.
| - Move the swapping sleep into its own per-object flag.
| The KPI is heavily changed, which is why the version is bumped. It is very likely that some VM ports users will need to change their own code.
| Sponsored by: EMC / Isilon storage division
| Discussed with: alc
| Reviewed by: jeff, kib
| Tested by: gavin, bapt (older version)
| Tested by: pho, scottl
* Replace kernel virtual address space allocation with vmem. This provides | jeff | 2013-08-07 | 1 | -1/+1
| transparent layering and better fragmentation.
| - Normalize functions that allocate memory to use kmem_*
| - Those that allocate address space are named kva_*
| - Those that operate on maps are named kmap_*
| - Implement recursive allocation handling for kmem_arena in vmem.
| Reviewed by: alc
| Tested by: pho
| Sponsored by: EMC / Isilon Storage Division
* Never remove user-wired pages from an object when doing | kib | 2013-07-11 | 1 | -9/+10
| msync(MS_INVALIDATE). vm_fault_copy_entry() requires that the object range which corresponds to the user-wired vm_map_entry is always fully populated. Add the OBJPR_NOTWIRED flag for vm_object_page_remove() to request the preserving behaviour, and use it when calling vm_object_page_remove() from vm_object_sync().
| Reported and tested by: pho
| Reviewed by: alc
| Sponsored by: The FreeBSD Foundation
| MFC after: 2 weeks
* - Add a general purpose resource allocator, vmem, from NetBSD. It was | jeff | 2013-06-28 | 1 | -6/+0
| originally inspired by the Solaris vmem detailed in the proceedings of USENIX 2001. The NetBSD version was heavily refactored for bugs and simplicity.
| - Use this resource allocator to allocate the buffer and transient maps. Buffer cache defrags are reduced by 25% when used by filesystems with mixed block sizes. Ultimately this may permit dynamic buffer cache sizing on low KVA machines.
| Discussed with: alc, kib, attilio
| Tested by: pho
| Sponsored by: EMC / Isilon Storage Division
* Revise the interface between vm_object_madvise() and vm_page_dontneed() so | alc | 2013-06-10 | 1 | -22/+2
| that pointless calls to pmap_is_modified() can be easily avoided when performing madvise(..., MADV_FREE).
| Sponsored by: EMC / Isilon Storage Division
* In vm_object_split(), busy and consequently unbusy the pages only when | attilio | 2013-06-04 | 1 | -3/+4
| swap_pager_copy() is invoked; otherwise there is no reason to do so. This will eliminate the need to busy pages most of the time.
| Sponsored by: EMC / Isilon storage division
| Reviewed by: alc
* After the object lock was dropped, the object's reference count could | kib | 2013-05-30 | 1 | -5/+5
| change. Retest the ref_count and, if it is not 1, return from the function so as not to execute the further code which assumes that ref_count == 1. Also, do not leak the vnode lock if another thread cleared the OBJ_TMPFS flag in the meantime.
| Reported by: bdrewery
| Tested by: bdrewery, pho
| Sponsored by: The FreeBSD Foundation
* Remove the capitalization in the assertion message. Print the address | kib | 2013-05-30 | 1 | -1/+1
| of the object to get useful information from optimized kernel dumps.
* Rework the handling of the tmpfs node backing swap object and tmpfs | kib | 2013-04-28 | 1 | -1/+23
| vnode v_object to avoid double-buffering. Use the same object both as the backing store for the tmpfs node and as the v_object. Besides reducing memory use by up to 2x when mapping files from tmpfs, it also halves the number of bytes copied by tmpfs read and write operations. The VM subsystem was already slightly adapted to tolerate an OBJT_SWAP object as v_object. Now vm_object_deallocate() is modified to not reinstantiate the OBJ_ONEMAPPING flag, and to help the VFS correctly handle the VV_TEXT flag on the last dereference of the tmpfs backing object.
| Reviewed by: alc
| Tested by: pho, bf
| MFC after: 1 month
* Make vm_object_page_clean() and vm_mmap_vnode() tolerate the vnode's | kib | 2013-04-28 | 1 | -1/+6
| v_object of non-OBJT_VNODE type. For vm_object_page_clean(), simply do not assert that the object type must be OBJT_VNODE, and add a comment explaining how the check for OBJ_MIGHTBEDIRTY prevents the rest of the function from operating on such objects. For vm_mmap_vnode(), if the object type is not OBJT_VNODE, require it to be for the swap pager (or default), handle the bypass filesystems, and correctly acquire the object reference in this case.
| Reviewed by: alc
| Tested by: pho, bf
| MFC after: 1 week
* Introduce vm_radix_is_empty(), and use it in place of | alc | 2013-03-10 | 1 | -1/+1
| vm_object_cache_is_empty() where the caller is aware of the page cache's implementation as a radix trie.
| Sponsored by: EMC / Isilon Storage Division
* Merge from vmcontention. | attilio | 2013-03-09 | 1 | -103/+104
|\
| * MFC | attilio | 2013-03-09 | 1 | -102/+105
| |\
| | * Switch the vm_object mutex to be a rwlock. This will enable, in the | attilio | 2013-03-09 | 1 | -103/+104
| | |\
| | | | future, further optimizations where the vm_object lock will be held in read mode most of the time the page cache's resident pool of pages is accessed for reading purposes. The change is mostly mechanical, but a few notes are reported:
| | | | * The KPI changes as follows:
| | | |   - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
| | | |   - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
| | | |   - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
| | | |   - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details)
| | | |   - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
| | | | * To avoid namespace pollution, vm/vm_pager.h does not include the lock headers itself, which forced its consumers to include sys/mutex.h directly for its inline functions using VM_OBJECT_LOCK(). As a result, all vm/vm_pager.h consumers must now also include sys/rwlock.h.
| | | | * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer, because the name clash between the FreeBSD and Solaris versions must be avoided. For this purpose, zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
| | | | The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
| | | | Sponsored by: EMC / Isilon storage division
| | | | Reviewed by: jeff
| | | | Reviewed by: pjd (ZFS specific review)
| | | | Discussed with: alc
| | | | Tested by: pho
| | | * MFC | attilio | 2013-03-08 | 1 | -5/+5
| | | |\
| | | * | Fix compiling. | attilio | 2013-02-26 | 1 | -2/+2
| | | | |
| | | * | MFC | attilio | 2013-02-26 | 1 | -4/+4
| | | | |
| | | * | MFC | attilio | 2013-02-26 | 1 | -2/+2
| | | | |
| | | * | MFC | attilio | 2013-02-26 | 1 | -1/+1
| | | | |
| | | * | As VM_OBJECT_SLEEP() is a vm_object_t specific function, make | attilio | 2013-02-26 | 1 | -3/+3
| | | | | the passed object the first argument of the function for consistency.
| | | | | Sponsored by: EMC / Isilon storage division
| | | * | Hide the details of the assertions for VM_OBJECT_LOCK operations. | attilio | 2013-02-21 | 1 | -21/+21
| | | | | Rename the current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) to VM_OBJECT_ASSERT_WLOCKED(foo).
| | | | | Sponsored by: EMC / Isilon storage division
| | | | | Requested by: alc
| | | * | Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to | attilio | 2013-02-20 | 1 | -78/+78
| | | | | their "write" versions.
| | | | | Sponsored by: EMC / Isilon storage division
| | | * | Switch vm_object lock to be a rwlock. | attilio | 2013-02-20 | 1 | -33/+32
| | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
| | | | | * VM_OBJECT_SLEEP() is introduced as a general-purpose primitive to get a sleep operation using a VM_OBJECT_LOCK() as protection
| | | | | * The approach must bear with vm_pager.h namespace pollution, so many files require including rwlock.h directly
| | * | | Merge from vmc-playground: | attilio | 2013-03-09 | 1 | -5/+6
| | | |/
| | |/|
| | | | Introduce a new KPI that verifies whether the page cache is empty for a specified vm_object. This KPI does not make assumptions about the locking, so that it can also be used for building assertions at init and destroy time. It is mostly used to hide implementation details of the page cache.
| | | | Sponsored by: EMC / Isilon storage division
| | | | Reviewed by: jeff
| | | | Reviewed by: alc (vm_radix based version)
| | | | Tested by: flo, pho, jhb, davide
| * | | MFC | attilio | 2013-03-04 | 1 | -3/+5
| |\ \ \
| | |/ /
| | * | Merge from vmcontention: | attilio | 2013-03-04 | 1 | -4/+5
| | | | As vm objects are type-stable, there is no need to initialize the resident splay tree pointer and the cache splay tree pointer in _vm_object_allocate(); this can instead be done in the UMA zone init handler. The UMA zone destructor handler will further check that the condition still holds at every destruction, to catch bugs.
| | | | Sponsored by: EMC / Isilon storage division
| | | | Submitted by: alc
* | | | Evaluations on the likelihood of an empty object cache cannot be made in | attilio | 2013-03-04 | 1 | -4/+5
| | | | a general way but must be evaluated case by case. Embed the decision in the callers themselves rather than in a general-purpose KPI.
| | | | Sponsored by: EMC / Isilon storage division
| | | | Reported by: alc
| | | | Reviewed by: alc
* | | | Remove the boot-time cache support and rely on the UMA boot-time slab cache | attilio | 2013-03-04 | 1 | -0/+2
| | | | for allocating the nodes until it is possible to carve them directly from the UMA subsystem.
| | | | Sponsored by: EMC / Isilon storage division
| | | | Reviewed by: alc
* | | | We don't need to reinitialize the root of the page cache trie on every | alc | 2013-03-03 | 1 | -1/+1
| | | | vm object allocation. We can, instead, rely on the type stability of the vm object zone. (Note that we already assert that the page cache trie is empty in the vm object zone destructor.)
| | | | Sponsored by: EMC / Isilon Storage Division
* | | | Merge from vmcontention | attilio | 2013-03-03 | 1 | -1/+0
|\ \ \ \
| |/ / /
| * | | MFC | attilio | 2013-03-03 | 1 | -1/+0
| |\ \ \
| | |/ /
| | * | The value held by the vm object's field pg_color is only considered | alc | 2013-03-02 | 1 | -1/+0
| | | | valid if the flag OBJ_COLORED is set. Since _vm_object_allocate() doesn't set this flag, it needn't initialize pg_color.
| | | | Sponsored by: EMC / Isilon Storage Division
| | * | Merge from vmc-playground branch: | attilio | 2013-02-26 | 1 | -4/+4
| | | | Replace the sub-optimal uma_zone_set_obj() primitive with the more modern uma_zone_reserve_kva(). The new primitive reserves beforehand the KVA space needed to serve the zone allocations and allocates pages with ALLOC_NOOBJ. More specifically:
| | | | - uma_zone_reserve_kva() does not need an object to serve the backend allocator.
| | | | - uma_zone_reserve_kva() can serve M_WAITOK requests, in order to serve zones which need to do uma_prealloc() too.
| | | | - When possible, uma_zone_reserve_kva() uses the direct map via uma_small_alloc() rather than relying on the KVA / offset combination.
| | | | The removal of the object attribute allows two further changes:
| | | | 1) _vm_object_allocate() becomes static within vm_object.c
| | | | 2) VM_OBJECT_LOCK_INIT() is removed. This function is replaced by direct calls to mtx_init(), as there is no need to export it anymore and the calls are no longer homogeneous: there are now small differences between the arguments passed to mtx_init().
| | | | Sponsored by: EMC / Isilon storage division
| | | | Reviewed by: alc (who also offered almost all the comments)
| | | | Tested by: pho, jhb, davide
| | * | Remove white spaces. | attilio | 2013-02-26 | 1 | -2/+2
| | | | Sponsored by: EMC / Isilon storage division
| * | | MFC | attilio | 2013-02-26 | 1 | -4/+4
| | | |
| * | | MFC | attilio | 2013-02-26 | 1 | -1/+1
| | | |
* | | | Revert white space change in the previous commit. | alc | 2013-03-02 | 1 | -1/+0
| | | | Requested by: attilio
* | | | Assert that the trie is empty when a vm object is destroyed. | alc | 2013-03-02 | 1 | -4/+6
| | | | Since vm objects are allocated from type-stable memory, we don't need to initialize the trie's root in _vm_object_allocate() on every vm object allocation. We can instead do it once in vm_object_zinit(). We don't need to call vm_radix_reclaim_allnodes() in vm_object_terminate() unless the resident page count is non-zero.
| | | | Reviewed by: attilio
| | | | Sponsored by: EMC / Isilon Storage Division
* | | | Merge from vmcontention | attilio | 2013-02-26 | 1 | -1/+1
| | | |
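The 2014-07-26 entry (alc) replaces the pmap-based lookup with vm_object_unwire(), which finds the wired pages by walking an object and its backing objects. Below is a minimal userspace sketch of that lookup under simplifying assumptions: toy_object, toy_page, toy_lookup_chain() and toy_object_unwire() are hypothetical names, each object holds a small fixed array of page slots instead of a radix trie, the backing offset is counted in pages rather than bytes, and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not FreeBSD's vm_object/vm_page structures. */
#define TOY_NPAGES 8

struct toy_page {
	int wire_count;
};

struct toy_object {
	struct toy_page *pages[TOY_NPAGES]; /* NULL: not resident here */
	struct toy_object *backing_object;  /* next object in the shadow chain */
	size_t backing_offset;              /* in pages (assumption) */
};

/* Find the page for pindex in obj or, failing that, in its backing
 * objects, translating the index by each backing offset on the way down. */
static struct toy_page *
toy_lookup_chain(struct toy_object *obj, size_t pindex)
{
	while (obj != NULL) {
		if (pindex < TOY_NPAGES && obj->pages[pindex] != NULL)
			return (obj->pages[pindex]);
		pindex += obj->backing_offset;
		obj = obj->backing_object;
	}
	return (NULL);
}

/* Unwire [start, end) by walking the objects rather than the pmap, so
 * pages whose mappings were removed (e.g. by mprotect(PROT_NONE)) are
 * still found. Returns the number of pages unwired. */
static size_t
toy_object_unwire(struct toy_object *obj, size_t start, size_t end)
{
	size_t n = 0;

	for (size_t pindex = start; pindex < end; pindex++) {
		struct toy_page *m = toy_lookup_chain(obj, pindex);

		/* A wired range is expected to be fully populated. */
		assert(m != NULL && m->wire_count > 0);
		m->wire_count--;
		n++;
	}
	return (n);
}
```

Because the walk never consults a mapping, it finds the same pages whether or not the range is still mapped; the pmap side (pmap_unwire() in the entry) is a separate step this sketch does not model.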
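The 2014-07-14 entry (kib) separates a transient flag (OBJ_TMPFS, cleared when the vnode is reclaimed) from a permanent one (OBJ_TMPFS_NODE). The sketch below illustrates why the collapse-style check has to test the permanent flag. The TOY_* flag values and the helper names are hypothetical and do not reflect the real bit assignments in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
#define TOY_OBJ_TMPFS      0x1u /* cleared when the tmpfs vnode is reclaimed */
#define TOY_OBJ_TMPFS_NODE 0x2u /* set permanently for a VREG tmpfs node */

struct toy_object {
	unsigned flags;
};

/* Collapse-style operations key off the permanent flag: testing the
 * transient TOY_OBJ_TMPFS would wrongly allow them once the vnode is gone. */
static bool
toy_collapse_allowed(const struct toy_object *obj)
{
	return ((obj->flags & TOY_OBJ_TMPFS_NODE) == 0);
}

/* Vnode reclamation clears only the transient flag. */
static void
toy_reclaim_vnode(struct toy_object *obj)
{
	obj->flags &= ~TOY_OBJ_TMPFS;
}
```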
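The 2013-08-09 busy rework (attilio) models soft and hard busy as the shared and exclusive sides of a minimal sx-style lock. The following single-threaded sketch shows only the compatibility rules, under an assumed encoding: a high bit for the exclusive holder and a counter for shared holders. The TOY_* names are hypothetical, and the atomic updates and the sleep/wakeup slow path that a real implementation needs are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding, not the kernel's busy-word layout. */
#define TOY_BUSY_EXCL   0x80000000u /* exclusive ("hard") busy */
#define TOY_BUSY_SHARED 0x00000001u /* one shared ("soft") holder */

struct toy_page {
	unsigned busy;
};

/* Shared busy succeeds unless an exclusive holder exists. */
static bool
toy_sbusy_try(struct toy_page *m)
{
	if (m->busy & TOY_BUSY_EXCL)
		return (false);
	m->busy += TOY_BUSY_SHARED;
	return (true);
}

static void
toy_sunbusy(struct toy_page *m)
{
	assert((m->busy & TOY_BUSY_EXCL) == 0 && m->busy > 0);
	m->busy -= TOY_BUSY_SHARED;
}

/* Exclusive busy requires that nobody holds the page in any mode. */
static bool
toy_xbusy_try(struct toy_page *m)
{
	if (m->busy != 0)
		return (false);
	m->busy = TOY_BUSY_EXCL;
	return (true);
}

static void
toy_xunbusy(struct toy_page *m)
{
	assert(m->busy == TOY_BUSY_EXCL);
	m->busy = 0;
}
```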
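The 2013-07-11 entry (kib) adds an OBJPR_NOTWIRED option so that vm_object_page_remove(), when called from vm_object_sync(), leaves user-wired pages in place. Here is a minimal sketch of that filter over a flat page array. The flag value, the toy_page layout and toy_object_page_remove() are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_OBJPR_NOTWIRED 0x1u /* hypothetical value for the option */

struct toy_page {
	int resident;
	int wire_count;
};

/* Remove resident pages in [start, end). With TOY_OBJPR_NOTWIRED, wired
 * pages are kept, so a user-wired range stays fully populated.
 * Returns the number of pages removed. */
static size_t
toy_object_page_remove(struct toy_page *pages, size_t start, size_t end,
    unsigned options)
{
	size_t removed = 0;

	for (size_t i = start; i < end; i++) {
		if (!pages[i].resident)
			continue;
		if ((options & TOY_OBJPR_NOTWIRED) != 0 &&
		    pages[i].wire_count > 0)
			continue;
		pages[i].resident = 0;
		removed++;
	}
	return (removed);
}
```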
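Several 2013-03 entries (alc, attilio) move trie-root initialization out of _vm_object_allocate() into the zone's init handler, relying on the type stability of the vm object zone, and assert emptiness in the destructor. The sketch below models that pattern with a tiny fixed-capacity free list: an init hook runs only when fresh memory enters the zone, and a destructor hook checks the invariant on every free. All names are hypothetical, and the zone is far simpler than UMA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ZONE_MAX 4

struct toy_object {
	void *rtree_root; /* stands in for the page trie root */
};

struct toy_zone {
	struct toy_object *freelist[TOY_ZONE_MAX];
	int nfree;
	int ninit; /* how many times the init hook has run */
};

/* Runs once, when fresh memory enters the zone. */
static void
toy_object_zinit(struct toy_zone *z, struct toy_object *o)
{
	o->rtree_root = NULL;
	z->ninit++;
}

/* Runs on every free: the trie must already be empty. */
static void
toy_object_zdtor(struct toy_object *o)
{
	assert(o->rtree_root == NULL);
}

static struct toy_object *
toy_zalloc(struct toy_zone *z)
{
	struct toy_object *o;

	if (z->nfree > 0)
		return (z->freelist[--z->nfree]); /* recycled: no re-init */
	o = malloc(sizeof(*o));
	if (o != NULL)
		toy_object_zinit(z, o);
	return (o);
}

static void
toy_zfree(struct toy_zone *z, struct toy_object *o)
{
	toy_object_zdtor(o);
	assert(z->nfree < TOY_ZONE_MAX);
	z->freelist[z->nfree++] = o;
}
```

The design choice mirrored here is that an invariant restored by every user before freeing (an empty trie) need not be re-established on every allocation; it only has to be checked at free time.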