path: root/sys/vm/vm_map.h
Commit history (each entry lists author, date, files changed, and lines -/+)
* kib, 2015-08-06 (1 file, -3/+3)
  MFC r286086: Do not pretend that vm_fault(9) supports unwiring the address.

* kib, 2014-06-26 (1 file, -0/+1)
  MFC r267630: Add MAP_EXCL flag for mmap(2).

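  A minimal userland sketch (not part of the commit) of how MAP_EXCL is meant
  to be used: combined with MAP_FIXED, mmap(2) fails instead of silently
  replacing an existing mapping at the requested address.

      #include <sys/mman.h>
      #include <errno.h>
      #include <stdio.h>

      int
      main(void)
      {
              size_t len = 4096;
              void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_ANON | MAP_PRIVATE, -1, 0);

              /* The range at p is already mapped, so this should fail. */
              void *q = mmap(p, len, PROT_READ | PROT_WRITE,
                  MAP_FIXED | MAP_EXCL | MAP_ANON | MAP_PRIVATE, -1, 0);
              if (q == MAP_FAILED)
                      printf("MAP_EXCL refused to replace: errno %d\n", errno);
              return (0);
      }
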
* jhb, 2013-09-09 (1 file, -1/+1)
  Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping
  use an address in the first 2GB of the process's address space. This flag
  should have the same semantics as the same flag on Linux.

  To facilitate this, add a new parameter to vm_map_find() that specifies an
  optional maximum virtual address. While here, fix several callers of
  vm_map_find() to use a VMFS_* constant for the findspace argument instead
  of TRUE and FALSE.

  Reviewed by: alc
  Approved by: re (kib)

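  A hedged example (not from the commit): requesting a low 2GB address,
  useful for code that must store pointers in 32-bit fields.

      #include <sys/mman.h>

      /* Returns a mapping in the first 2GB, or MAP_FAILED. */
      void *
      map_low_2g(size_t len)
      {
              return (mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_ANON | MAP_PRIVATE | MAP_32BIT, -1, 0));
      }
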
* jhb, 2013-08-16 (1 file, -2/+6)
  Add new mmap(2) flags to permit applications to request specific virtual
  address alignment of mappings.

  - MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n).
    Requests for n >= the number of bits in a pointer, or less than the size
    of a page, fail with EINVAL. This matches the API provided by NetBSD.
  - MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used to
    optimize the chances of using large pages. By default it will align the
    mapping on a large page boundary (the system is free to choose whichever
    large page size seems best for the mapping request). However, if the
    object being mapped already uses large pages, it will align the virtual
    mapping to match the existing large pages in the object instead.
  - Internally, VMFS_ALIGNED_SPACE is renamed to VMFS_SUPER_SPACE, and
    VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment.
    MAP_ALIGNED(n) maps to VMFS_ALIGNED_SPACE(n), while MAP_ALIGNED_SUPER
    maps to VMFS_SUPER_SPACE.
  - mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than
    explicitly using VMFS_SUPER_SPACE. All device objects are forced to use
    a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively
    equivalent.

  Reviewed by: alc
  MFC after: 1 month

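  A short sketch of the new flags (illustrative; the 21 assumes a 2MB
  superpage, as on amd64):

      #include <sys/mman.h>

      int
      main(void)
      {
              size_t len = 4 * 1024 * 1024;

              /* Explicit 2MB (1 << 21) alignment. */
              void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_ANON | MAP_PRIVATE | MAP_ALIGNED(21), -1, 0);

              /* Let the kernel pick a large-page-friendly alignment. */
              void *q = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_ANON | MAP_PRIVATE | MAP_ALIGNED_SUPER, -1, 0);

              return (p == MAP_FAILED || q == MAP_FAILED);
      }
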
* jeff, 2013-08-07 (1 file, -4/+0)
  Replace kernel virtual address space allocation with vmem. This provides
  transparent layering and better fragmentation.

  - Normalize functions that allocate memory to use kmem_*.
  - Those that allocate address space are named kva_*.
  - Those that operate on maps are named kmap_*.
  - Implement recursive allocation handling for kmem_arena in vmem.

  Reviewed by: alc
  Tested by: pho
  Sponsored by: EMC / Isilon Storage Division

* attilio, 2013-08-05 (1 file, -1/+0)
  Revert r253939: We cannot busy a page before doing pagefaults. In fact, it
  can deadlock against the vnode lock, as it tries to vget(). Other
  functions right now have the opposite lock ordering, like vm_object_sync(),
  which acquires the vnode lock first and then sleeps on the busy mechanism.
  Before this patch is reinserted we need to break this ordering.

  Sponsored by: EMC / Isilon storage division
  Reported by: kib

* attilio, 2013-08-04 (1 file, -0/+1)
  The page hold mechanism is fast but it has a couple of fallouts:
  - It does not let pages respect the LRU policy.
  - It bloats the active/inactive queues of few pages.

  Try to avoid it as much as possible, with the long-term target of removing
  it completely. Use the soft-busy mechanism to protect page content
  accesses during short-term operations (like uiomove_fromphys()).

  After this change only vm_fault_quick_hold_pages() still uses the hold
  mechanism for page content access. There is an additional complexity
  there, as the quick path cannot immediately access the page object to busy
  the page, and the slow path cannot busy more than one page at a time (to
  avoid deadlocks). Fixing that primitive would allow complete removal of
  the page hold mechanism.

  Sponsored by: EMC / Isilon storage division
  Discussed with: alc
  Reviewed by: jeff
  Tested by: pho

* jhb, 2013-07-19 (1 file, -2/+3)
  Be more aggressive in using superpages in all mappings of objects:
  - Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for
    vm_map_find() that will try to alter the alignment of a mapping to match
    any existing superpage mappings of the object being mapped. If no
    suitable address range is found with the necessary alignment,
    vm_map_find() will fall back to the simple first-fit strategy
    (VMFS_ANY_SPACE).
  - Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to
    use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.

  Reviewed by: alc (earlier version)
  MFC after: 2 weeks

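  A hedged userland demonstration of the visible effect (assumes amd64 with
  2MB superpages; placement is a heuristic, so alignment is likely but not
  guaranteed):

      #include <sys/mman.h>
      #include <stdint.h>
      #include <stdio.h>

      int
      main(void)
      {
              size_t len = 8 * 1024 * 1024;
              void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_ANON | MAP_PRIVATE, -1, 0);

              /* With VMFS_OPTIMAL_SPACE the kernel tends to pick a
                 superpage-aligned address for mappings like this. */
              printf("addr %p, 2MB-aligned: %s\n", p,
                  ((uintptr_t)p % (2 * 1024 * 1024)) == 0 ? "yes" : "no");
              return (0);
      }
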
* kib, 2013-07-11 (1 file, -0/+1)
  mlockall() (or VM_MAP_WIRE_HOLESOK) does not interact properly with
  parallel creation of map entries, e.g. by mmap() or stack growth, and
  also breaks when another entry is wired in parallel. vm_map_wire()
  iterates over the map entries in the region and assumes both that the
  entries it finds were marked as in-transition earlier and that any entry
  marked as in-transition was marked by the current invocation of
  vm_map_wire(). This is not true for new entries in the holes.

  Add the thread owner of the MAP_ENTRY_IN_TRANSITION flag to struct
  vm_map_entry. In vm_map_wire() and vm_map_unwire(), only process the
  entries whose transition owner is the current thread.

  Reported and tested by: pho
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks

* zont, 2013-01-14 (1 file, -1/+0)
  - Get rid of unused function vmspace_wired_count().

  Reviewed by: alc
  Approved by: kib (mentor)
  MFC after: 1 week

* mdf, 2012-07-15 (1 file, -2/+2)
  Fix a bug with memguard(9) on 32-bit architectures without a
  VM_KMEM_MAX_SIZE. The code was not taking into account the size of the
  kernel_map, which the kmem_map is allocated from, so it could produce a
  sub-map size too large to fit. The simplest solution is to ignore
  VM_KMEM_MAX entirely and base the memguard map's size off the
  kernel_map's size, since this is always relevant and always smaller.

  Found by: Justin Hibbits

* alc, 2012-05-10 (1 file, -1/+10)
  Give vm_fault()'s sequential access optimization a makeover. There are two
  aspects to the sequential access optimization: (1) read ahead of pages
  that are expected to be accessed in the near future and (2) unmap and
  cache behind of pages that are not expected to be accessed again. This
  revision changes both aspects.

  The read ahead optimization is now more effective. It starts with the same
  initial read window as before, but arithmetically grows the window on
  sequential page faults. This can yield increased read bandwidth. For
  example, on one of my machines, a program using mmap() to read a file that
  is several times larger than the machine's physical memory takes about 17%
  less time to complete.

  The unmap and cache behind optimization is now more selectively applied.
  The read ahead window must grow to its maximum size before unmap and cache
  behind is performed. This significantly reduces the number of times that
  pages are unmapped and cached only to be reactivated a short time later.

  The unmap and cache behind optimization now clears each page's referenced
  flag. Previously, in the case of dirty pages, if the containing file was
  still mapped at the time that the page daemon examined the dirty pages,
  they would be reactivated.

  From a stylistic standpoint, this revision also cleanly separates the
  implementation of the read ahead and unmap/cache behind optimizations.

  Glanced at: kib
  MFC after: 2 weeks

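  Illustrative pseudologic for the window policy described above; the names
  and constants are hypothetical, not the kernel's.

      /* Hypothetical constants; the kernel's actual values differ. */
      #define READ_AHEAD_INIT 8       /* initial window, in pages */
      #define READ_AHEAD_MAX  64      /* cap on the window */

      static int window;              /* current read-ahead window */

      static void
      on_sequential_fault(void)
      {
              if (window == 0)
                      window = READ_AHEAD_INIT;
              else if (window < READ_AHEAD_MAX)
                      /* Arithmetic (additive) growth, not doubling. */
                      window += READ_AHEAD_INIT;
              /* Unmap/cache-behind runs only once the window has
                 reached READ_AHEAD_MAX. */
      }
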
* kib, 2012-02-23 (1 file, -0/+2)
  Account the writeable shared mappings backed by a file in the vnode
  v_writecount. Keep the amount of virtual address space used by the
  mappings in the new vm_object un_pager.vnp.writemappings counter. The
  vnode v_writecount is incremented when writemappings becomes non-zero,
  and decremented when writemappings returns to zero.

  Writeable shared vnode-backed mappings are accounted for in vm_mmap(),
  and vm_map_insert() is instructed to set the MAP_ENTRY_VN_WRITECNT flag
  on the created map entry. During deferred map entry deallocation,
  vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECNT and decrements
  writemappings for the vm object.

  Now the writeable mount cannot be demoted to read-only while writeable
  shared mappings of vnodes from the mount point exist. Also, execve(2)
  fails for such files with ETXTBSY, as it should.

  Noted by: tegge
  Reviewed by: tegge (long time ago, early version), alc
  Tested by: pho
  MFC after: 3 weeks

* kib, 2012-02-11 (1 file, -1/+1)
  Close a race due to dropping of the map lock between creating a map entry
  for a shared mapping and marking the entry for inheritance. Another
  thread might execute vmspace_fork() in between (e.g. by fork(2)),
  resulting in the mapping becoming private.

  Noted and reviewed by: alc
  MFC after: 1 week

* jeff, 2011-03-21 (1 file, -0/+2)
  - Merge changes to the base system to support OFED. These include a wider
    arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other
    miscellaneous small features.

* brucec, 2011-03-01 (1 file, -1/+1)
  Change the return type of vmspace_swap_count to a long to match the other
  vmspace_*_count functions.

  MFC after: 3 days

* brucec, 2011-02-23 (1 file, -1/+1)
  Calculate and return the count in vmspace_swap_count as a vm_offset_t
  instead of an int to avoid overflow. While here, clean up some style(9)
  issues.

  PR: kern/152200
  Reviewed by: kib
  MFC after: 2 weeks

* alc, 2010-12-20 (1 file, -0/+1)
  Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race
  condition in proc_rwmem() and to (2) simplify the implementation of the
  cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem()
  the requested read or write could fail because the targeted page could be
  reclaimed between the calls to vm_fault() and vm_page_hold().

  In collaboration with: kib@
  MFC after: 6 weeks

* mlaier, 2010-12-09 (1 file, -0/+5)
  Fix a long-standing (from the original 4.4BSD-Lite sources) race between
  vmspace_fork and vm_map_wire that would lead to "vm_fault_copy_wired: page
  missing" panics. While faulting in pages for a map entry that is being
  wired down, mark the containing map as busy. In vmspace_fork, wait until
  the map is unbusy before we try to copy the entries.

  Reviewed by: kib
  MFC after: 5 days
  Sponsored by: Isilon Systems, Inc.

* trasz, 2010-12-02 (1 file, -1/+1)
  Replace pointer to "struct uidinfo" with pointer to "struct ucred" in
  "struct vm_object". This is required to make it possible to account for
  per-jail swap usage.

  Reviewed by: kib@
  Tested by: pho@
  Sponsored by: FreeBSD Foundation

* jhb, 2010-10-21 (1 file, -1/+1)
  - Make 'vm_refcnt' volatile so that compilers won't be tempted to treat
    its value as a loop invariant. Currently this is a no-op because
    atomic_cmpset_int() clobbers all memory on current architectures.
  - Use atomic_fetchadd_int() instead of an atomic_cmpset_int() loop to
    drop a reference in vmspace_free().

  Reviewed by: alc
  MFC after: 1 month

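  A sketch of the pattern adopted here: atomic_fetchadd_int(9) returns the
  previous value, so a single atomic operation detects the 1 -> 0 transition
  without a cmpset retry loop. The struct and teardown function are
  illustrative stand-ins, not the kernel's.

      #include <sys/types.h>
      #include <machine/atomic.h>

      struct obj {
              volatile u_int refcnt;
      };

      static void
      obj_destroy(struct obj *o)
      {
              /* Stand-in for the real teardown path. */
              (void)o;
      }

      static void
      obj_release(struct obj *o)
      {
              /* Old value 1 means we just dropped the last reference. */
              if (atomic_fetchadd_int(&o->refcnt, -1) == 1)
                      obj_destroy(o);
      }
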
* alc, 2010-09-19 (1 file, -4/+3)
  Make refinements to r212824. In particular, don't make
  vm_map_unlock_nodefer() part of the synchronization interface for maps.

  Add comments to vm_map_unlock_and_wait() and vm_map_wakeup() describing
  how they should be used. In particular, describe the deferred
  deallocations issue with vm_map_unlock_and_wait().

  Redo the implementation of vm_map_unlock_and_wait() so that it passes
  along the caller's file and line information, just like the other map
  locking primitives.

  Reviewed by: kib
  X-MFC after: r212824

* kib, 2010-09-18 (1 file, -1/+3)
  Extend the deferring of object deallocation for deleted map entries, done
  on map unlock, to the lock downgrade and the later read-unlock operation.

  System map entries cannot be backed by OBJT_VNODE objects, so there is no
  need to defer deallocation for them. Map entries from user maps do not
  require the owner map for deallocation, and can be accumulated in the
  thread-local list for freeing when a user map is unlocked.

  Move the collection of entries for deferred reclamation into
  vm_map_delete(). Create a helper, vm_map_process_deferred(), that is
  called from locations where processing is feasible. Do not process
  deferred entries in vm_map_unlock_and_wait() since map_sleep_mtx is held.

  Reviewed by: alc, rstone (previous versions)
  Tested by: pho
  MFC after: 2 weeks

* jmallett, 2010-04-18 (1 file, -0/+3)
  o) Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the
     address space for an address as aligned by the new pmap_align_tlb()
     function, which is for constraints imposed by the TLB. [1]
  o) Add a kmem_alloc_nofault_space() function, which acts like
     kmem_alloc_nofault() but allows the caller to specify which find-space
     option to use. [1]
  o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate
     the kernel stack address on MIPS. [1]
  o) Make pmap_align_tlb() on MIPS align addresses so that they do not
     start on an odd boundary within the TLB, so that they are suitable for
     insertion as wired entries and do not have to share a TLB entry with
     another mapping, assuming they are appropriately-sized.
  o) Eliminate md_realstack now that the kstack will be
     appropriately-aligned on MIPS.
  o) Increase the number of guard pages to 2 so that we retain the proper
     alignment of the kstack address.

  Reviewed by: [1] alc
  X-MFC-after: Making sure alc has not come up with a better interface.

* alc, 2010-04-03 (1 file, -1/+1)
  Make _vm_map_init() the one place where the vm map's pmap field is
  initialized.

  Reviewed by: kib

* alc, 2009-11-27 (1 file, -1/+0)
  Simplify the invocation of vm_fault(). Specifically, eliminate the flag
  VM_FAULT_DIRTY. The information provided by this flag can be trivially
  inferred by vm_fault().

  Discussed with: kib

* alc, 2009-11-18 (1 file, -2/+0)
  Simplify both the invocation and the implementation of vm_fault() for
  wiring pages. (Note: claims made in the comments about the handling of
  breakpoints in wired pages have been false for roughly a decade. This and
  another bug involving breakpoints will be fixed in coming changes.)

  Reviewed by: kib

* kib, 2009-06-23 (1 file, -0/+3)
  Implement global and per-uid accounting of anonymous memory. Add the
  rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
  for the uid.

  The accounting information (charge) is associated with either the map
  entry or the vm object backing the entry, assuming the object is the
  first one in the shadow chain and the entry does not require COW. The
  charge is moved from the entry to the object on allocation of the object,
  e.g. during mmap, assuming the object is allocated, or on the first page
  fault on the entry. It moves back to the entry on forks due to COW setup.

  The per-entry granularity of accounting makes the charge process fair for
  processes that change uid during their lifetime, and decrements the
  charge for the proper uid when a region is unmapped.

  The interface of vm_pager_allocate(9) is extended by adding a struct
  ucred *, which is used to charge the appropriate uid when the allocation
  is performed by the kernel, e.g. md(4).

  Several syscalls, among them fork(2), may now return ENOMEM when global
  or per-uid limits are enforced.

  In collaboration with: pho
  Reviewed by: alc
  Approved by: re (kensmith)

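  A hedged userland sketch: capping swap reservation with the new resource
  limit. The values are arbitrary.

      #include <sys/types.h>
      #include <sys/time.h>
      #include <sys/resource.h>
      #include <stdio.h>

      int
      main(void)
      {
              struct rlimit rl = {
                      .rlim_cur = 512 * 1024 * 1024,
                      .rlim_max = 512 * 1024 * 1024
              };

              /* Once in effect, mmap(2) or fork(2) may fail with ENOMEM
                 when the uid's swap reservation would be exceeded. */
              if (setrlimit(RLIMIT_SWAP, &rl) != 0)
                      perror("setrlimit(RLIMIT_SWAP)");
              return (0);
      }
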
* kib, 2009-04-10 (1 file, -0/+2)
  When vm_map_wire(9) is allowed to skip holes in the wired region, skip
  the mappings without any read or execute rights, in particular the
  PROT_NONE entries. This makes mlockall(2) work for a process address
  space that has such mappings.

  Since the protection mode of an entry may change between setting
  MAP_ENTRY_IN_TRANSITION and the final pass over the region that records
  the wire status of the entries, allocate a new map entry flag,
  MAP_ENTRY_WIRE_SKIPPED, to mark the skipped PROT_NONE entries.

  Reported and tested by: Hans Ottevanger <fbsdhackers beasties demon nl>
  Reviewed by: alc
  MFC after: 3 weeks

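  A small sketch of the user-visible effect: mlockall(2) now succeeds even
  when the address space contains PROT_NONE regions such as guard pages
  (assuming sufficient privilege and memorylocked limits).

      #include <sys/mman.h>
      #include <stdio.h>

      int
      main(void)
      {
              /* A PROT_NONE entry, e.g. a red-zone/guard region. */
              (void)mmap(NULL, 4096, PROT_NONE,
                  MAP_ANON | MAP_PRIVATE, -1, 0);

              /* Previously this could fail because of that entry. */
              if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
                      perror("mlockall");
              return (0);
      }
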
* kib, 2009-02-24 (1 file, -3/+2)
  Revert the addition of the freelist argument for the vm_map_delete()
  function, done in r188334. Instead, collect the entries that shall be
  freed in the deferred_freelist member of the map. Automatically purge the
  deferred freelist when the map is unlocked.

  Tested by: pho
  Reviewed by: alc

* kib, 2009-02-08 (1 file, -1/+3)
  Do not call vm_object_deallocate() from vm_map_delete(), because we hold
  the map lock there and might need the vnode lock for OBJT_VNODE objects.
  Postpone object deallocation until the caller of vm_map_delete() drops
  the map lock. Link the map entries to be freed into a freelist that is
  released by the new helper function vm_map_entry_free_freelist().

  Reviewed by: tegge, alc
  Tested by: pho

* alc, 2009-01-01 (1 file, -0/+1)
  Resurrect shared map locks, allowing greater concurrency during some map
  operations, such as page faults. An earlier version of this change was ...

  Reviewed by: kib
  Tested by: pho
  MFC after: 6 weeks

* alc, 2008-12-31 (1 file, -9/+1)
  Update or eliminate some stale comments.

* alc, 2008-05-10 (1 file, -1/+9)
  Generalize vm_map_find(9)'s parameter "find_space". Specifically, add
  support for VMFS_ALIGNED_SPACE, which requests the allocation of an
  address range best suited to superpages. The old options TRUE and FALSE
  are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there is no
  immediate need to update all of vm_map_find(9)'s callers.

  While I'm here, correct a misstatement about vm_map_find(9)'s return
  values in the man page.

* alc, 2008-04-28 (1 file, -1/+2)
  vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it can
  be passed by value.

* marcel, 2008-03-01 (1 file, -1/+6)
  Make the vm_pmap field of struct vmspace the last field in the structure.
  This allows per-CPU variations of struct pmap on a single architecture
  without affecting the machine-independent fields. As such, the PMAP
  variations don't affect the ABI. They become part of it.

* pjd, 2007-11-07 (1 file, -1/+1)
  Change unused 'user_wait' argument to 'timo' argument, which will be used
  to specify timeout for msleep(9).

  Discussed with: alc
  Reviewed by: alc

* kib, 2007-08-20 (1 file, -0/+1)
  Do not drop the vm_map lock between doing vm_map_remove() and
  vm_map_insert(). For this, introduce vm_map_fixed(), which does that for
  the MAP_FIXED case. Dropping the lock allowed a parallel thread to occupy
  the freed space.

  Reported by: Tijl Coosemans <tijl ulyssis org>
  Reviewed by: alc
  Approved by: re (kensmith)
  MFC after: 2 weeks

* tegge, 2006-05-29 (1 file, -1/+0)
  Close a race between vmspace_exitfree() and exit1(), and races between
  vmspace_exitfree() and vmspace_free(), which could result in the same
  vmspace being freed twice.

  Factor out part of exit1() into a new function, vmspace_exit(). Attach to
  vmspace0 to allow the old vmspace to be freed earlier.

  Add a new function, vmspace_acquire_ref(), for obtaining a vmspace
  reference for a vmspace belonging to another process. Avoid changing the
  vmspace refcount from 0 to 1 since that could also lead to the same
  vmspace being freed twice. Change vmtotal() and swapout_procs() to use
  vmspace_acquire_ref().

  Reviewed by: alc

* alc, 2005-12-03 (1 file, -1/+0)
  Eliminate unneeded preallocation at initialization.

  Reviewed by: tegge

* imp, 2005-01-07 (1 file, -1/+1)
  /* -> /*- for license, minor formatting changes.

* alc, 2004-08-13 (1 file, -1/+2)
  Replace the linear search in vm_map_findspace() with an O(log n)
  algorithm built into the map entry splay tree. This replaces the
  first_free hint in struct vm_map with two fields in vm_map_entry:
  adj_free, the amount of free space following a map entry, and max_free,
  the maximum amount of free space in the entry's subtree. These fields
  make it possible to find a first-fit free region of a given size in one
  pass down the tree, so O(log n) amortized using splay trees.

  This significantly reduces the overhead in vm_map_findspace() for
  applications that mmap() many hundreds or thousands of regions, and has
  a negligible slowdown (0.1%) on buildworld. See, for example, the
  discussion of a micro-benchmark titled "Some mmap observations compared
  to Linux 2.6/OpenBSD" on -hackers in late October 2003.

  OpenBSD adopted this approach in March 2002, and NetBSD added it in
  November 2003, both with red-black trees.

  Submitted by: Mark W. Krentel

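  An illustrative sketch (simplified stand-in types, not the kernel's) of
  the descent that max_free enables: subtrees that cannot contain a large
  enough gap are pruned, yielding an amortized O(log n) first fit.

      #include <stddef.h>

      struct entry {
              struct entry *left, *right;
              size_t adj_free;        /* free space just after this entry */
              size_t max_free;        /* largest gap in this subtree */
      };

      static struct entry *
      first_fit(struct entry *root, size_t length)
      {
              if (root == NULL || root->max_free < length)
                      return (NULL);  /* prune: no fitting gap here */
              struct entry *e = first_fit(root->left, length);
              if (e != NULL)
                      return (e);     /* lowest-address gap wins */
              if (root->adj_free >= length)
                      return (root);  /* gap right after this entry */
              return (first_fit(root->right, length));
      }
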
* tegge, 2004-08-12 (1 file, -1/+2)
  The vm map lock is needed in vm_fault() after the page has been found, to
  avoid later changes before pmap_enter() and vm_fault_prefault() have
  completed. Simplify deadlock avoidance by not blocking on the vm map
  relookup.

  In collaboration with: alc

* green, 2004-08-09 (1 file, -0/+14)
  Revamp VM map wiring.
  * Allow no-fault wiring/unwiring to succeed for consistency; however, the
    wired count remains at zero, so it's a special case.
  * Fix issues inside vm_map_wire() and vm_map_unwire() where the exact
    state of user wiring (one or zero) and system wiring (zero or more)
    could be confused; for example, system unwiring could succeed in
    removing a user wire, instead of being an error.
  * Require all mappings to be unwired before they are deleted. When VM
    space is still wired upon deletion, it will be waited upon for the
    following unwire. This makes vslock(9) work, rather than allowing
    kernel-locked memory to be deleted out from underneath its consumer as
    it would before.

* mux, 2004-07-30 (1 file, -2/+2)
  Get rid of another lockmgr(9) consumer by using sx locks for the user
  maps. We always acquire the sx lock exclusively here, but we can't use a
  mutex because we want to be able to sleep while holding the lock. This is
  completely equivalent to what we were doing with the lockmgr(9) locks
  before.

  Approved by: alc

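  A sketch of the sx(9) pattern this adopts: exclusive acquisition like the
  old lockmgr lock, but the holder may sleep, which a mutex(9) forbids. The
  names here are illustrative.

      #include <sys/param.h>
      #include <sys/lock.h>
      #include <sys/sx.h>

      static struct sx map_sx;

      static void
      map_lock_example(void)
      {
              sx_init(&map_sx, "user map");
              sx_xlock(&map_sx);
              /* ... work that may sleep (page faults, vnode I/O) ... */
              sx_xunlock(&map_sx);
      }
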
* alc, 2004-07-24 (1 file, -3/+0)
  Simplify vmspace initialization. The bcopy() of fields from the old
  vmspace to the new vmspace in vmspace_exec() is mostly wasted effort.
  With one exception, vm_swrss, the copied fields are immediately
  overwritten. Instead, initialize these fields to zero in vmspace_alloc(),
  eliminating a bcopy() from vmspace_exec() and a bzero() from
  vmspace_fork().

* alc, 2004-07-06 (1 file, -1/+1)
  Micro-optimize vmspace for 64-bit architectures: colocate vm_refcnt and
  vm_exitingcnt so that alignment does not result in wasted space.

* alc, 2004-06-26 (1 file, -2/+1)
  Remove an unused field from the vmspace structure.

* alc, 2004-04-24 (1 file, -1/+1)
  In cases where a file was resident in memory, mmap(..., PROT_NONE, ...)
  would actually map the file with read access enabled. According to
  http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html this is
  an error. Similarly, madvise(..., MADV_WILLNEED) would enable read access
  on a virtual address range that was PROT_NONE.

  The solution implemented herein is (1) to pass a vm_prot_t to
  vm_map_pmap_enter() describing the allowed access and (2) to make
  vm_map_pmap_enter() responsible for understanding the limitations of
  pmap_enter_quick().

  Submitted by: "Mark W. Krentel" <krentel@dreamscape.com>
  PR: kern/64573

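  A sketch of the POSIX behavior this change restores (fd and len are the
  caller's; illustrative only):

      #include <sys/mman.h>

      void
      demo(int fd, size_t len)
      {
              char *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE, fd, 0);

              if (p == MAP_FAILED)
                      return;
              /* A prefetch hint must not grant read access... */
              (void)madvise(p, len, MADV_WILLNEED);
              /* ...so touching p here would still raise SIGSEGV. */
      }
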
* imp, 2004-04-06 (1 file, -4/+0)
  Remove the advertising clause from the University of California Regents'
  license, per letter dated July 22, 1999.

  Approved by: core