path: root/sys/vm/vm_map.c
Commit history, newest first. Each entry shows the commit subject followed by [author, date, files changed, lines -deleted/+added], then the commit message body.
* MFC r267213 (by alc): [kib, 2014-07-24, 1 file, -13/+26]
  Add a page size field to struct vm_page.
  Approved by: alc
* MFC r267664: [kib, 2014-06-27, 1 file, -0/+9]
  Assert that the new entry is inserted into the right location in the map entries list, and that it does not overlap with the previous and next entries.
* MFC r267630: [kib, 2014-06-26, 1 file, -1/+2]
  Add MAP_EXCL flag for mmap(2).
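For context, a minimal userland sketch of how MAP_EXCL is meant to be used (the sizes and the reuse of an existing address below are illustrative only): combined with MAP_FIXED, it makes mmap(2) fail rather than silently replace an existing mapping.

#include <sys/mman.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
	/* Create some mapping whose address we will try to reuse. */
	void *base = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (base == MAP_FAILED)
		err(1, "mmap");

	/*
	 * MAP_FIXED alone would silently replace the mapping above;
	 * MAP_FIXED | MAP_EXCL makes the request fail instead because
	 * the range is already in use.
	 */
	if (mmap(base, 4096, PROT_READ,
	    MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_EXCL, -1, 0) == MAP_FAILED)
		warn("mmap(MAP_FIXED | MAP_EXCL) refused the occupied range");
	return (0);
}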
* MFC r267766: [kib, 2014-06-26, 1 file, -1/+1]
  Use correct names for the flags.
* MFC r267254: [kib, 2014-06-23, 1 file, -34/+59]
  Make mmap(MAP_STACK) search for the available address space.
  MFC r267497 (by alc): Use local variable instead of sgrowsiz.
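A short sketch of the behavior this change enables: a MAP_STACK request with a NULL hint now has the kernel search for a free range, as other anonymous mappings do (the 1 MiB size is an arbitrary choice for illustration).

#include <sys/mman.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
	size_t len = 1024 * 1024;	/* arbitrary grow-down stack region */

	/* MAP_STACK implies an anonymous mapping; fd must be -1 and offset 0. */
	char *stk = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_STACK, -1, 0);
	if (stk == MAP_FAILED)
		err(1, "mmap(MAP_STACK)");

	stk[len - 1] = 1;	/* the top of the region is usable; it grows down */
	printf("stack region at %p\n", (void *)stk);
	return (0);
}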
* MFC r266780: [kib, 2014-06-04, 1 file, -4/+1]
  Remove the assert which can be triggered from userspace.
* MFC r265886, r265948 [alc, 2014-05-23, 1 file, -7/+14]
  With the new-and-improved vm_fault_copy_entry() (r265843), we can always avoid soft page faults when adding write access to user wired entries in vm_map_protect(). Previously, we only avoided the soft page fault when the underlying pages were copy-on-write. In other words, we avoided the page faults that might sleep on page allocation, but not the trivial page faults to update the physical map.
  On a fork, allow read-only wired pages to be copy-on-write shared between the parent and child processes. Previously, we copied these pages even though they are read only. However, the reason for copying them is historical and no longer exists. In recent times, vm_map_protect() has developed the ability to copy pages when write access is added to wired copy-on-write pages. So, in this case, copy-on-write sharing of wired pages is not to be feared. It is not going to lead to copy-on-write faults on wired memory.
* MFC r266464: [kib, 2014-05-23, 1 file, -1/+3]
  In execve(2), postpone freeing the old vmspace until the threads are resumed and exited.
* MFC r265850 [alc, 2014-05-17, 1 file, -1/+2]
  About 9% of the pmap_protect() calls being performed by vm_map_copy_entry() are unnecessary. Eliminate the unnecessary calls.
* MFC r265825: [kib, 2014-05-17, 1 file, -1/+1]
  When printing the map with the ddb 'show procvm' command, do not dump page queues for the backing objects.
* MFC r265824: [kib, 2014-05-17, 1 file, -1/+2]
  Print the entry address in addition to the object.
* MFC r263471: [kib, 2014-03-24, 1 file, -0/+1]
  Initialize vm_map_entry member wiring_thread on the map entry creation.
* MFC r259951: [kib, 2013-12-30, 1 file, -1/+1]
  Do not coalesce stack entries. Pass the MAP_STACK_GROWS_DOWN and MAP_STACK_GROWS_UP flags to vm_map_insert() from vm_map_stack().
* MFC r258367: [kib, 2013-12-13, 1 file, -0/+15]
  Check for zero-length requests and act as if they are always successful, without performing any action on the address space.
* MFC r258366: [kib, 2013-12-13, 1 file, -1/+9]
  Add assertions to cover all places in the wiring and unwiring code where MAP_ENTRY_IN_TRANSITION is set or cleared.
* Both the vm_map and vmspace zones are defined as "no free". So, there is no point in defining a fini function for these zones. [alc, 2013-09-22, 1 file, -23/+2]
  Reviewed by: kib
  Approved by: re (glebius)
  Sponsored by: EMC / Isilon Storage Division
* Merge the following changes from projects/bhyve_npt_pmap: [neel, 2013-09-20, 1 file, -5/+12]
  - add fields to 'struct pmap' that are required to manage nested page tables.
  - add a parameter to 'vmspace_alloc()' that can be used to override the default pmap initialization routine 'pmap_pinit()'.
  These changes are pushed ahead of the remaining changes in 'bhyve_npt_pmap' in anticipation of the upcoming KBI freeze for 10.0.
  Reviewed by: kib@, alc@
  Approved by: re (glebius)
* Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. [jhb, 2013-09-09, 1 file, -3/+4]
  This flag should have the same semantics as the same flag on Linux.
  To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE.
  Reviewed by: alc
  Approved by: re (kib)
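A minimal userland sketch of the new flag; it only checks that the returned address landed below the 2GB boundary (illustrative, not taken from the commit).

#include <sys/mman.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE | MAP_32BIT, -1, 0);
	if (p == MAP_FAILED)
		err(1, "mmap(MAP_32BIT)");

	/* The kernel is expected to place the mapping in the low 2GB. */
	printf("mapped at %p (%s the 2GB boundary)\n", p,
	    (uintptr_t)p < (1UL << 31) ? "below" : "above");
	return (0);
}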
* Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). [alc, 2013-08-29, 1 file, -2/+19]
  Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2).
  Since our malloc(3) makes heavy use of madvise(2), this change can have a measurable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.
  Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise().
  Discussed with: jeff, jhb, kib
  Tested by: pho (i386), marcel (ia64)
  Sponsored by: EMC / Isilon Storage Division
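The userland pattern that benefits from pmap_advise() is ordinary madvise(2) on a multi-page anonymous region, roughly as sketched below (sizes and fill values are arbitrary).

#include <sys/mman.h>
#include <err.h>
#include <string.h>

int
main(void)
{
	size_t len = 32 * 4096;		/* a multi-page anonymous region */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");
	memset(p, 0xa5, len);		/* dirty every page */

	/*
	 * Tell the VM system the contents are no longer needed.  The
	 * mapping stays valid; with pmap_advise() the kernel can walk
	 * the whole range in one pmap traversal instead of per page.
	 */
	if (madvise(p, len, MADV_FREE) != 0)
		err(1, "madvise(MADV_FREE)");
	return (0);
}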
* Revert r254501. Instead, reuse the type stability of the struct pmap which is part of struct vmspace, allocated from the UMA_ZONE_NOFREE zone. [kib, 2013-08-22, 1 file, -0/+1]
  Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release().
  Suggested and reviewed by: alc (previous version)
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
* Add new mmap(2) flags to permit applications to request specific virtual address alignment of mappings. [jhb, 2013-08-16, 1 file, -5/+16]
  - MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n). Requests for n >= number of bits in a pointer or less than the size of a page fail with EINVAL. This matches the API provided by NetBSD.
  - MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used to optimize the chances of using large pages. By default it will align the mapping on a large page boundary (the system is free to choose any large page size to align to that seems best for the mapping request). However, if the object being mapped is already using large pages, then it will align the virtual mapping to match the existing large pages in the object instead.
  - Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment. MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE.
  - mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than explicitly using VMFS_SUPER_SPACE. All device objects are forced to use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively equivalent.
  Reviewed by: alc
  MFC after: 1 month
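A sketch of both alignment requests; the exponent 21 (a 2MB boundary) is just an example value, not anything mandated by the API.

#include <sys/mman.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	size_t len = 4 * 1024 * 1024;

	/* Explicit alignment: a boundary of (1 << 21), i.e. 2MB. */
	void *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE | MAP_ALIGNED(21), -1, 0);
	if (a == MAP_FAILED)
		err(1, "mmap(MAP_ALIGNED(21))");
	printf("2MB aligned: %s\n",
	    ((uintptr_t)a & ((1UL << 21) - 1)) == 0 ? "yes" : "no");

	/* Let the kernel pick whichever large page boundary it prefers. */
	void *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE | MAP_ALIGNED_SUPER, -1, 0);
	if (b == MAP_FAILED)
		err(1, "mmap(MAP_ALIGNED_SUPER)");
	printf("super-aligned mapping at %p\n", b);
	return (0);
}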
* Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation. [jeff, 2013-08-07, 1 file, -24/+7]
  - Normalize functions that allocate memory to use kmem_*
  - Those that allocate address space are named kva_*
  - Those that operate on maps are named kmap_*
  - Implement recursive allocation handling for kmem_arena in vmem.
  Reviewed by: alc
  Tested by: pho
  Sponsored by: EMC / Isilon Storage Division
* Clear entire map structure including locks so that the locks don't accidentally appear to have been already initialized. [kientzle, 2013-07-25, 1 file, -2/+1]
  In particular, this fixes a consistent kernel crash on armv6 with:
    panic: lock "vm map (user)" 0xc09cc050 already initialized
  that appeared with r251709.
  PR: arm/180820
* Be more aggressive in using superpages in all mappings of objects: [jhb, 2013-07-19, 1 file, -5/+15]
  - Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for vm_map_find() that will try to alter the alignment of a mapping to match any existing superpage mappings of the object being mapped. If no suitable address range is found with the necessary alignment, vm_map_find() will fall back to using the simple first-fit strategy (VMFS_ANY_SPACE).
  - Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.
  Reviewed by: alc (earlier version)
  MFC after: 2 weeks
* The mlockall() or VM_MAP_WIRE_HOLESOK does not interact properly with parallel creation of the map entries, e.g. by mmap() or stack growing. [kib, 2013-07-11, 1 file, -11/+51]
  It also breaks when another entry is wired in parallel. vm_map_wire() iterates over the map entries in the region and assumes that the map entries it finds were marked as in transition before, and that any entry marked as in transition was marked by the current invocation of vm_map_wire(). This is not true for new entries in the holes.
  Add the thread owner of the MAP_ENTRY_IN_TRANSITION flag to struct vm_map_entry. In vm_map_wire() and vm_map_unwire(), only process the entries whose transition owner is the current thread.
  Reported and tested by: pho
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
* Fix a bug that allowed a tracing process (e.g. gdb) to write to a memory-mapped file in the traced process's address space even if neither the traced process nor the tracing process had write access to that file. [des, 2013-06-18, 1 file, -0/+6]
  Security: CVE-2013-2171
  Security: FreeBSD-SA-13:06.mmap
  Approved by: so
* o Relax locking assertions for vm_page_find_least() [attilio, 2013-05-21, 1 file, -7/+15]
  o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any
  o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock
  o Use all the mechanisms above to make vm_map_pmap_enter() work most of the time only with readlocks.
  Sponsored by: EMC / Isilon storage division
  Reviewed by: alc
* Fix the assertions for the state of the object under the map entry with the MAP_ENTRY_VN_WRITECNT flag: [kib, 2013-04-09, 1 file, -6/+16]
  - Move the assertion that verifies the state of the v_writecount and vnp.writecount, under the block where the object is locked.
  - Check that the object type is OBJT_VNODE before asserting.
  Reported by: avg
  Reviewed by: alc
  MFC after: 1 week
* MFC [attilio, 2013-02-26, 1 file, -2/+1]
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to their "write" versions. [attilio, 2013-02-20, 1 file, -22/+22]
  Sponsored by: EMC / Isilon storage division
* Switch vm_object lock to be a rwlock. [attilio, 2013-02-20, 1 file, -0/+1]
  * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
  * VM_OBJECT_SLEEP() is introduced as a general purpose primitive to get a sleep operation using a VM_OBJECT_LOCK() as protection
  * The approach must bear with vm_pager.h namespace pollution, so many files require including rwlock.h directly
* - Get rid of unused function vmspace_wired_count(). [zont, 2013-01-14, 1 file, -6/+0]
  Reviewed by: alc
  Approved by: kib (mentor)
  MFC after: 1 week
* - Reduce kernel size by removing unnecessary pointer indirections. [zont, 2013-01-10, 1 file, -6/+4]
  GENERIC kernel size reduced by 16 bytes and RACCT kernel by 336 bytes.
  Suggested by: alc
  Reviewed by: alc
  Approved by: kib (mentor)
  MFC after: 1 week
* - Fix locked memory accounting for maps with the MAP_WIREFUTURE flag. [zont, 2012-12-18, 1 file, -6/+39]
  - Add sysctl vm.old_mlock which may turn such accounting off.
  Reviewed by: avg, trasz
  Approved by: kib (mentor)
  MFC after: 1 week
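MAP_WIREFUTURE is what mlockall(MCL_FUTURE) sets on the process map, so the accounting above is most easily exercised as sketched below; reading the new sysctl through sysctlbyname(3) is shown alongside (purely illustrative, subject to RLIMIT_MEMLOCK).

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
	int old_mlock;
	size_t len = sizeof(old_mlock);

	/* 0 means the per-map locked-memory accounting added here is active. */
	if (sysctlbyname("vm.old_mlock", &old_mlock, &len, NULL, 0) == 0)
		printf("vm.old_mlock = %d\n", old_mlock);

	/*
	 * MCL_FUTURE sets MAP_WIREFUTURE on the process map, so mappings
	 * created later are wired (and accounted) as they appear.
	 */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		err(1, "mlockall");
	return (0);
}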
* In the past four years, we've added two new vm object types. Each time, similar changes had to be made in various places throughout the machine-independent virtual memory layer to support the new vm object type. [alc, 2012-12-09, 1 file, -8/+7]
  However, in most of these places, it's actually not the type of the vm object that matters to us but instead certain attributes of its pages. For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain fictitious pages. In other words, in most of these places, we were testing the vm object's type to determine if it contained fictitious (or unmanaged) pages.
  To both simplify the code in these places and make the addition of future vm object types easier, this change introduces two new vm object flags that describe attributes of the vm object's pages, specifically, whether they are fictitious or unmanaged.
  Reviewed and tested by: kib
* Make a few small changes to vm_map_pmap_enter(): [alc, 2012-11-25, 1 file, -9/+10]
  Add detail to the comment describing this function. In particular, describe what MAP_PREFAULT_PARTIAL does.
  Eliminate the abrupt change in behavior when the specified address range grows from MAX_INIT_PT pages to MAX_INIT_PT plus one pages. Instead of doing nothing, i.e., preloading no mappings whatsoever, map any resident pages that fall within the start of the specified address range, i.e., [addr, addr + ulmin(size, ptoa(MAX_INIT_PT))).
  Long ago, the vm object's list of resident pages was not ordered, so this function had to choose between probing the global hash table of all resident pages and iterating over the vm object's unordered list of resident pages. Now, the list is ordered, so there is no reason for MAP_PREFAULT_PARTIAL to be concerned with the vm object's count of resident pages.
  MFC after: 14 days
* Fix DDB command "show map XXX": [attilio, 2012-11-12, 1 file, -24/+14]
  - Check that an argument is always available; otherwise the map printed before recursing is garbage.
  - Spit out a message if an argument is not provided.
  - Remove the unread nlines variable.
  - Use an explicit recursive function, disassociated from the DB_SHOW_COMMAND() body, in order to make the prototype and the recursion of the above mentioned function clear. The code is now much less obscure.
  Submitted by: gianni
* - After r240026, sgrowsiz should be used in a safer manner. [zont, 2012-09-03, 1 file, -4/+7]
  Approved by: kib (mentor)
  MFC after: 1 week
* Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme. [alc, 2012-06-27, 1 file, -2/+2]
* Move the per-thread deferred user map entries list into a private list in vm_map_process_deferred() which is then iterated to release map entries. [jhb, 2012-06-20, 1 file, -3/+6]
  This avoids having a nested vm map unlock operation called from the loop body attempt to recurse into vm_map_process_deferred(). This can happen if the vm_map_remove() triggers the OOM killer.
  Reviewed by: alc, kib
  MFC after: 1 week
* Use the previous stack entry protection and max protection to correctly propagate the stack execution permissions when the stack is grown down. [kib, 2012-06-10, 1 file, -1/+1]
  First, curproc->p_sysent->sv_stackprot specifies the maximum allowed stack protection for the current ABI, so the new stack entry was typically marked executable always. Second, for a non-main stack MAP_STACK mapping, the PROT_ flags which were specified at mmap(2) call time should be used, and not sv_stackprot.
  MFC after: 1 week
* Give vm_fault()'s sequential access optimization a makeover. [alc, 2012-05-10, 1 file, -0/+2]
  There are two aspects to the sequential access optimization: (1) read ahead of pages that are expected to be accessed in the near future and (2) unmap and cache behind of pages that are not expected to be accessed again. This revision changes both aspects.
  The read ahead optimization is now more effective. It starts with the same initial read window as before, but arithmetically grows the window on sequential page faults. This can yield increased read bandwidth. For example, on one of my machines, a program using mmap() to read a file that is several times larger than the machine's physical memory takes about 17% less time to complete.
  The unmap and cache behind optimization is now more selectively applied. The read ahead window must grow to its maximum size before unmap and cache behind is performed. This significantly reduces the number of times that pages are unmapped and cached only to be reactivated a short time later.
  The unmap and cache behind optimization now clears each page's referenced flag. Previously, in the case of dirty pages, if the containing file was still mapped at the time that the page daemon examined the dirty pages, they would be reactivated.
  From a stylistic standpoint, this revision also cleanly separates the implementation of the read ahead and unmap/cache behind optimizations.
  Glanced at: kib
  MFC after: 2 weeks
* Fix madvise(MADV_WILLNEED) to properly handle individual mappings larger than 4GB. [jhb, 2012-03-19, 1 file, -12/+10]
  Specifically, the inlined version of 'ptoa' of the 'int' count of pages overflowed on 64-bit platforms. While here, change vm_object_madvise() to accept two vm_pindex_t parameters (start and end) rather than a (start, count) tuple, to match other VM APIs as suggested by alc@.
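A small illustration of the arithmetic involved (not the kernel macro itself): shifting a 32-bit page count left by the page shift wraps once the mapping exceeds 4GB, while widening the count before shifting yields the intended byte length.

#include <stdint.h>
#include <stdio.h>

#define EX_PAGE_SHIFT	12	/* 4KB pages, as on amd64; illustrative only */

int
main(void)
{
	int npages = 2 * 1024 * 1024;	/* an 8GB mapping in 4KB pages */

	/* Truncated to 32 bits, the byte count wraps (here to 0). */
	uint32_t wrong = (uint32_t)npages << EX_PAGE_SHIFT;

	/* Widening before the shift, as the fix effectively does, is correct. */
	uint64_t right = (uint64_t)npages << EX_PAGE_SHIFT;

	printf("32-bit result: %u bytes\n", wrong);
	printf("64-bit result: %llu bytes\n", (unsigned long long)right);
	return (0);
}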
* In vm_object_page_clean(), do not clear the OBJ_MIGHTBEDIRTY object flag if the filesystem performed a short write and we are skipping the page due to this. [kib, 2012-03-17, 1 file, -2/+5]
  Propagate the write error from the pager back to the callers of vm_pageout_flush(). Report the failure to write a page from the requested range as a FALSE return value from vm_object_page_clean(), and propagate it back to msync(2) to return EIO to usermode.
  While there, convert the clearobjflags variable in vm_object_page_clean() and the arguments of the helper functions to boolean.
  PR: kern/165927
  Reviewed by: alc
  MFC after: 2 weeks
* Simplify vmspace_fork()'s control flow by copying immutable data before the vm map locks are acquired. Also, eliminate redundant initialization of the new vm map's timestamp. [alc, 2012-02-25, 1 file, -14/+10]
  Reviewed by: kib
  MFC after: 3 weeks
* Account the writeable shared mappings backed by a file in the vnode v_writecount. [kib, 2012-02-23, 1 file, -4/+73]
  Keep the amount of the virtual address space used by the mappings in the new vm_object un_pager.vnp.writemappings counter. The vnode v_writecount is incremented when writemappings gets a non-zero value, and decremented when writemappings returns to zero.
  Writeable shared vnode-backed mappings are accounted for in vm_mmap(), and vm_map_insert() is instructed to set the MAP_ENTRY_VN_WRITECNT flag on the created map entry. During deferred map entry deallocation, vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECNT and decrements writemappings for the vm object.
  Now, a writeable mount cannot be demoted to read-only while writeable shared mappings of the vnodes from the mount point exist. Also, execve(2) fails for such files with ETXTBUSY, as it should.
  Noted by: tegge
  Reviewed by: tegge (long time ago, early version), alc
  Tested by: pho
  MFC after: 3 weeks
* Close a race due to dropping of the map lock between creating a map entry for a shared mapping and marking the entry for inheritance. [kib, 2012-02-11, 1 file, -2/+7]
  Another thread might execute vmspace_fork() in between (e.g. by fork(2)), resulting in the mapping becoming private.
  Noted and reviewed by: alc
  MFC after: 1 week
* Introduce the same mutex-wise fix in r227758 for sx locks. [attilio, 2011-11-21, 1 file, -29/+13]
  The functions that offer file and line specifications are:
  - sx_assert_
  - sx_downgrade_
  - sx_slock_
  - sx_slock_sig_
  - sx_sunlock_
  - sx_try_slock_
  - sx_try_xlock_
  - sx_try_upgrade_
  - sx_unlock_
  - sx_xlock_
  - sx_xlock_sig_
  - sx_xunlock_
  Now vm_map locking is fully converted and can avoid knowing specifics about locking procedures.
  Reviewed by: kib
  MFC after: 1 month
* Introduce macro stubs in the mutex implementation that will always be defined and will allow consumers willing to provide options, file and line to locking requests to not worry about options redefining the interfaces. [attilio, 2011-11-20, 1 file, -15/+12]
  This is typically useful when there is the need to build another locking interface on top of the mutex one.
  The introduced functions that consumers can use are:
  - mtx_lock_flags_
  - mtx_unlock_flags_
  - mtx_lock_spin_flags_
  - mtx_unlock_spin_flags_
  - mtx_assert_
  - thread_lock_flags_
  Spare notes:
  - Likely we can get rid of all the 'INVARIANTS' specification in the ppbus code by using the same macro as done in this patch (but this is left to the ppbus maintainer)
  - All the other locking interfaces may require a similar cleanup, where the most notable case is sx, which will allow a further cleanup of vm_map locking facilities
  - The patch should be fully compatible with older branches, thus an MFC is planned (in fact it uses all the underlying mechanisms already present).
  Comments review by: eadler, Ben Kaduk
  Discussed with: kib, jhb
  MFC after: 1 month
* All the racct_*() calls need to happen with the proc locked. Fixing this won't happen before 9.0. [trasz, 2011-07-06, 1 file, -0/+10]
  This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
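A sketch of the pattern this commit applies throughout the file; the helper function and the specific racct call below are hypothetical stand-ins for the real call sites.

#include <sys/param.h>
#include <sys/proc.h>
#include <sys/racct.h>

/* Hypothetical call site showing the shape of the change. */
static void
example_account(struct proc *p, uint64_t amount)
{
#ifdef RACCT
	/*
	 * Compiled in only with "options RACCT"; kernels built without it
	 * never take the proc lock here.
	 */
	PROC_LOCK(p);
	(void)racct_set(p, RACCT_MEMLOCK, amount);
	PROC_UNLOCK(p);
#endif
}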