path: root/sys/vm/vm_map.c
Commit history, most recent first.  Each entry shows the commit message followed by (author, date, files changed, lines -removed/+added).
* MFC r287591: (kib, 2015-09-16, 1 file, -10/+4)
    There is no reason in the current kernel to disallow write access to
    the COW wired entry if the entry permissions allow it.  Remove the
    check.
* MFC r286086: (kib, 2015-08-06, 1 file, -1/+1)
    Do not pretend that vm_fault(9) supports unwiring the address.
* MFC r285046: (kib, 2015-08-05, 1 file, -1/+2)
    Account for the main process stack being one page below the highest
    user address when the ABI uses a shared page.
* MFC r282213: (trasz, 2015-06-21, 1 file, -23/+32)
    Add kern.racct.enable tunable and RACCT_DISABLED config option.
    The point of this is to be able to add RACCT (with RACCT_DISABLED)
    to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

    MFC r282901:
    Build GENERIC with RACCT/RCTL support by default.  Note that it
    still needs to be enabled by adding "kern.racct.enable=1" to
    /boot/loader.conf.

    Note those two are MFC-ed together, because the latter one changes
    the name of the RACCT_DISABLED option to RACCT_DEFAULT_TO_DISABLED.
    Should have committed the renaming separately...

    Relnotes:       yes
    Sponsored by:   The FreeBSD Foundation
* MFC r277649: (rstone, 2015-03-01, 1 file, -0/+3)
    vmspace_release() may sleep if the last reference is being released,
    so add a WITNESS_WARN() to catch cases where it is called with a
    non-sleepable lock held.

    MFC after:      1 month
    Sponsored by:   Sandvine Inc.
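
A hedged sketch of the kind of check the entry above describes (kernel context; this is not the literal committed diff): WITNESS_WARN() complains when a potentially sleeping path is reached while a non-sleepable lock is held.  The function name is a hypothetical stand-in.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>

static void
example_release_path(void)
{
    /* Under WITNESS, warn if sleeping would be illegal at this point. */
    WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL,
        "dropping the last vmspace reference may sleep");
}
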
* MFC r272036: (kib, 2014-09-27, 1 file, -1/+8)
    Avoid calling vm_map_pmap_enter() for MADV_WILLNEED on a wired
    entry; the pages must already be mapped.

    Approved by:    re (gjb)
* Fix a leak of wired pages when unwiring a PROT_NONE-mapped wired region (kib, 2014-09-01, 1 file, -44/+84)
    Rework the handling of unwire to do it in batch, both at the pmap and
    object level.  All commits below are by alc.

    MFC r268327:
    Introduce pmap_unwire().

    MFC r268591:
    Implement pmap_unwire() for powerpc.

    MFC r268776:
    Implement pmap_unwire() for arm.

    MFC r268806:
    pmap_unwire(9) man page.

    MFC r269134:
    When unwiring a region of an address space, do not assume that the
    underlying physical pages are mapped by the pmap.  This fixes a leak
    of the wired pages on the unwiring of the region mapped with no
    access allowed.

    MFC r269339:
    In the implementation of the new function pmap_unwire(), the call to
    MOEA64_PVO_TO_PTE() must be performed before any changes are made to
    the PVO.  Otherwise, MOEA64_PVO_TO_PTE() will panic.

    MFC r269365:
    Correct a long-standing problem in moea{,64}_pvo_enter() that was
    revealed by the combination of r268591 and r269134: when we attempt
    to add the wired attribute to an existing mapping,
    moea{,64}_pvo_enter() do nothing.  (They only set the wired
    attribute on newly created mappings.)

    MFC r269433:
    Handle wiring failures in vm_map_wire() with the new functions
    pmap_unwire() and vm_object_unwire().  Retire vm_fault_{un,}wire(),
    since they are no longer used.

    MFC r269438:
    Rewrite a loop in vm_map_wire() so that gcc doesn't think that the
    variable "rv" is uninitialized.

    MFC r269485:
    Retire pmap_change_wiring().

    Reviewed by:    alc
* MFC r267213 (by alc): (kib, 2014-07-24, 1 file, -13/+26)
    Add a page size field to struct vm_page.

    Approved by:    alc
* MFC r267664: (kib, 2014-06-27, 1 file, -0/+9)
    Assert that the new entry is inserted into the right location in the
    map entries list, and that it does not overlap with the previous and
    next entries.
* MFC r267630: (kib, 2014-06-26, 1 file, -1/+2)
    Add MAP_EXCL flag for mmap(2).
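
A hedged usage sketch for the MAP_EXCL flag introduced above: combined with MAP_FIXED it asks mmap(2) to fail instead of silently replacing whatever is already mapped at the requested address.  The mapping size here is illustrative only.

#include <sys/mman.h>

#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    size_t len = (size_t)getpagesize();

    /* Let the kernel pick an address for an anonymous mapping. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        err(1, "mmap");

    /* Ask for the same range again, refusing to clobber it. */
    void *q = mmap(p, len, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_PRIVATE | MAP_FIXED | MAP_EXCL, -1, 0);
    if (q == MAP_FAILED)
        warn("MAP_FIXED | MAP_EXCL refused to replace the mapping");
    else
        printf("unexpected: mapping replaced at %p\n", q);
    return (0);
}

Without MAP_EXCL, the second call would quietly discard the first mapping, which is exactly what MAP_FIXED callers sometimes need to avoid.
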
* MFC r267766: (kib, 2014-06-26, 1 file, -1/+1)
    Use correct names for the flags.
* MFC r267254: (kib, 2014-06-23, 1 file, -34/+59)
    Make mmap(MAP_STACK) search for the available address space.

    MFC r267497 (by alc):
    Use local variable instead of sgrowsiz.
* MFC r266780: (kib, 2014-06-04, 1 file, -4/+1)
    Remove the assert which can be triggered by the userspace.
* MFC r265886, r265948 (alc, 2014-05-23, 1 file, -7/+14)
    With the new-and-improved vm_fault_copy_entry() (r265843), we can
    always avoid soft page faults when adding write access to user wired
    entries in vm_map_protect().  Previously, we only avoided the soft
    page fault when the underlying pages were copy-on-write.  In other
    words, we avoided the page faults that might sleep on page
    allocation, but not the trivial page faults to update the physical
    map.

    On a fork allow read-only wired pages to be copy-on-write shared
    between the parent and child processes.  Previously, we copied these
    pages even though they are read only.  However, the reason for
    copying them is historical and no longer exists.  In recent times,
    vm_map_protect() has developed the ability to copy pages when write
    access is added to wired copy-on-write pages.  So, in this case,
    copy-on-write sharing of wired pages is not to be feared.  It is not
    going to lead to copy-on-write faults on wired memory.
* MFC r266464: (kib, 2014-05-23, 1 file, -1/+3)
    In execve(2), postpone the free of the old vmspace until the threads
    are resumed and exited.
* MFC r265850 (alc, 2014-05-17, 1 file, -1/+2)
    About 9% of the pmap_protect() calls being performed by
    vm_map_copy_entry() are unnecessary.  Eliminate the unnecessary
    calls.
* MFC r265825: (kib, 2014-05-17, 1 file, -1/+1)
    When printing the map with the ddb 'show procvm' command, do not
    dump page queues for the backing objects.
* MFC r265824: (kib, 2014-05-17, 1 file, -1/+2)
    Print the entry address in addition to the object.
* MFC r263471: (kib, 2014-03-24, 1 file, -0/+1)
    Initialize vm_map_entry member wiring_thread on the map entry
    creation.
* MFC r259951: (kib, 2013-12-30, 1 file, -1/+1)
    Do not coalesce stack entry.  Pass MAP_STACK_GROWS_DOWN and
    MAP_STACK_GROWS_UP flags to vm_map_insert() from vm_map_stack().
* MFC r258367: (kib, 2013-12-13, 1 file, -0/+15)
    Check for zero-length requests and act as if they are always
    successful, without performing any action on the address space.
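
A minimal sketch of the early-exit behaviour described above (kernel context, not the committed diff), assuming the wiring/unwiring entry points take a [start, end) range:

    /* A zero-length request is trivially successful: touch nothing. */
    if (start == end)
        return (KERN_SUCCESS);
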
* MFC r258366: (kib, 2013-12-13, 1 file, -1/+9)
    Add assertions to cover all places in the wiring and unwiring code
    where MAP_ENTRY_IN_TRANSITION is set or cleared.
* Both the vm_map and vmspace zones are defined as "no free", so there is no point in defining a fini function for these zones (alc, 2013-09-22, 1 file, -23/+2)
    Reviewed by:    kib
    Approved by:    re (glebius)
    Sponsored by:   EMC / Isilon Storage Division
* Merge the following changes from projects/bhyve_npt_pmap (neel, 2013-09-20, 1 file, -5/+12)
    - add fields to 'struct pmap' that are required to manage nested
      page tables.
    - add a parameter to 'vmspace_alloc()' that can be used to override
      the default pmap initialization routine 'pmap_pinit()'.

    These changes are pushed ahead of the remaining changes in
    'bhyve_npt_pmap' in anticipation of the upcoming KBI freeze for
    10.0.

    Reviewed by:    kib@, alc@
    Approved by:    re (glebius)
* Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space (jhb, 2013-09-09, 1 file, -3/+4)
    This flag should have the same semantics as the same flag on Linux.

    To facilitate this, add a new parameter to vm_map_find() that
    specifies an optional maximum virtual address.

    While here, fix several callers of vm_map_find() to use a VMFS_*
    constant for the findspace argument instead of TRUE and FALSE.

    Reviewed by:    alc
    Approved by:    re (kib)
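
A hedged usage sketch for the MAP_32BIT flag described above, assuming a 64-bit platform where the flag is defined; it requests an address in the low 2GB, which some JITs and emulators need for 32-bit pointer arithmetic.

#include <sys/mman.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
    /* Request a 1MB anonymous mapping placed below the 2GB mark. */
    void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_PRIVATE | MAP_32BIT, -1, 0);
    if (p == MAP_FAILED)
        err(1, "mmap(MAP_32BIT)");
    printf("low mapping at %p\n", p);
    return (0);
}
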
* Significantly reduce the cost, i.e., run time, of calls to madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE) (alc, 2013-08-29, 1 file, -2/+19)
    Specifically, introduce a new pmap function, pmap_advise(), that
    operates on a range of virtual addresses within the specified pmap,
    allowing for a more efficient implementation of MADV_DONTNEED and
    MADV_FREE.  Previously, the implementation of MADV_DONTNEED and
    MADV_FREE relied on per-page pmap operations, such as
    pmap_clear_reference().  Intuitively, the problem with this
    implementation is that the pmap-level locks are acquired and
    released and the page table traversed repeatedly, once for each
    resident page in the range that was specified to madvise(2).  A more
    subtle flaw with the previous implementation is that
    pmap_clear_reference() would clear the reference bit on all mappings
    to the specified page, not just the mapping in the range specified
    to madvise(2).

    Since our malloc(3) makes heavy use of madvise(2), this change can
    have a measurable impact.  For example, the system time for
    completing a parallel "buildworld" on a 6-core amd64 machine was
    reduced by about 1.5% to 2.0%.

    Note: This change only contains pmap_advise() implementations for a
    subset of our supported architectures.  I will commit
    implementations for the remaining architectures after further
    testing.  For now, a stub function is sufficient because of the
    advisory nature of pmap_advise().

    Discussed with: jeff, jhb, kib
    Tested by:      pho (i386), marcel (ia64)
    Sponsored by:   EMC / Isilon Storage Division
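
For context, a hedged userland sketch of the madvise(2) calls whose kernel-side cost the entry above reduces; malloc(3) issues MADV_FREE on released memory in much the same way.  The range size is an arbitrary illustrative choice.

#include <sys/mman.h>

#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    size_t len = 64 * (size_t)getpagesize();
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        err(1, "mmap");

    memset(p, 0xa5, len);   /* fault the pages in so they are resident */

    /*
     * Tell the VM the contents are disposable; the range stays mapped,
     * and the whole range is handed to the kernel in one call.
     */
    if (madvise(p, len, MADV_FREE) == -1)
        err(1, "madvise(MADV_FREE)");
    return (0);
}
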
* Revert r254501.  Instead, reuse the type stability of the struct pmap which is part of struct vmspace, allocated from the UMA_ZONE_NOFREE zone (kib, 2013-08-22, 1 file, -0/+1)
    Initialize the pmap lock in the vmspace zone init function, and
    remove pmap lock initialization and destruction from pmap_pinit()
    and pmap_release().

    Suggested and reviewed by:  alc (previous version)
    Tested by:      pho
    Sponsored by:   The FreeBSD Foundation
* Add new mmap(2) flags to permit applications to request specific virtual address alignment of mappings (jhb, 2013-08-16, 1 file, -5/+16)
    - MAP_ALIGNED(n) requests a mapping aligned on a boundary of
      (1 << n).  Requests for n >= number of bits in a pointer or less
      than the size of a page fail with EINVAL.  This matches the API
      provided by NetBSD.
    - MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED.  It can be
      used to optimize the chances of using large pages.  By default it
      will align the mapping on a large page boundary (the system is
      free to choose any large page size to align to that seems best
      for the mapping request).  However, if the object being mapped is
      already using large pages, then it will align the virtual mapping
      to match the existing large pages in the object instead.
    - Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE,
      and VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific
      alignment.  MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n),
      while MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE.
    - mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than
      explicitly using VMFS_SUPER_SPACE.  All device objects are forced
      to use a specific color on creation, so VMFS_OPTIMAL_SPACE is
      effectively equivalent.

    Reviewed by:    alc
    MFC after:      1 month
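
A hedged usage sketch for the alignment flags above; the 21-bit request (a 2MB boundary, matching the common amd64 superpage size) is an illustrative choice, not a requirement.

#include <sys/mman.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
    size_t len = 4UL << 20;     /* 4MB */

    /* Explicit alignment: place the mapping on a 1 << 21 boundary. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_PRIVATE | MAP_ALIGNED(21), -1, 0);
    if (p == MAP_FAILED)
        err(1, "mmap(MAP_ALIGNED(21))");

    /* Or let the kernel pick whatever large-page alignment fits best. */
    void *q = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_PRIVATE | MAP_ALIGNED_SUPER, -1, 0);
    if (q == MAP_FAILED)
        err(1, "mmap(MAP_ALIGNED_SUPER)");

    printf("aligned: %p, super-aligned: %p\n", p, q);
    return (0);
}
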
* Replace kernel virtual address space allocation with vmem.  This provides transparent layering and better fragmentation (jeff, 2013-08-07, 1 file, -24/+7)
    - Normalize functions that allocate memory to use kmem_*
    - Those that allocate address space are named kva_*
    - Those that operate on maps are named kmap_*
    - Implement recursive allocation handling for kmem_arena in vmem.

    Reviewed by:    alc
    Tested by:      pho
    Sponsored by:   EMC / Isilon Storage Division
* Clear entire map structure including locks so that the locks don't accidentally appear to have been already initialized (kientzle, 2013-07-25, 1 file, -2/+1)
    In particular, this fixes a consistent kernel crash on armv6 with:
        panic: lock "vm map (user)" 0xc09cc050 already initialized
    that appeared with r251709.

    PR:     arm/180820
* Be more aggressive in using superpages in all mappings of objects (jhb, 2013-07-19, 1 file, -5/+15)
    - Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for
      vm_map_find() that will try to alter the alignment of a mapping to
      match any existing superpage mappings of the object being mapped.
      If no suitable address range is found with the necessary
      alignment, vm_map_find() will fall back to using the simple
      first-fit strategy (VMFS_ANY_SPACE).
    - Change mmap() without MAP_FIXED, shmat(), and the GEM mapping
      ioctl to use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.

    Reviewed by:    alc (earlier version)
    MFC after:      2 weeks
* mlockall() or VM_MAP_WIRE_HOLESOK does not interact properly with parallel creation of map entries, e.g. by mmap() or stack growing (kib, 2013-07-11, 1 file, -11/+51)
    It also breaks when another entry is wired in parallel.  The
    vm_map_wire() iterates over the map entries in the region and
    assumes that the map entries it finds were marked as in transition
    before, and also that any entry marked as in transition was marked
    by the current invocation of vm_map_wire().  This is not true for
    new entries in the holes.

    Add the thread owner of the MAP_ENTRY_IN_TRANSITION flag to struct
    vm_map_entry.  In vm_map_wire() and vm_map_unwire(), only process
    the entries whose transition owner is the current thread.

    Reported and tested by: pho
    Reviewed by:    alc
    Sponsored by:   The FreeBSD Foundation
    MFC after:      2 weeks
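
A hedged sketch (kernel context, not the committed code) of the ownership test the entry above describes: while vm_map_wire() or vm_map_unwire() walks the region, an in-transition entry is only treated as its own if the current thread set the flag.

    /*
     * Illustrative only: skip entries whose in-transition state is
     * owned by some other, concurrently running wiring operation.
     */
    if ((entry->eflags & MAP_ENTRY_IN_TRANSITION) != 0 &&
        entry->wiring_thread != curthread) {
        /* Not ours; leave it for the owning thread to finish. */
        continue;
    }
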
* Fix a bug that allowed a tracing process (e.g. gdb) to write to a memory-mapped file in the traced process's address space even if neither the traced process nor the tracing process had write access to that file (des, 2013-06-18, 1 file, -0/+6)
    Security:       CVE-2013-2171
    Security:       FreeBSD-SA-13:06.mmap
    Approved by:    so
* o Relax locking assertions for vm_page_find_least() (attilio, 2013-05-21, 1 file, -7/+15)
    o Relax locking assertions for pmap_enter_object() and add them also
      to architectures that currently don't have any
    o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a
      downgrade operation on the per-object rwlock
    o Use all the mechanisms above to make vm_map_pmap_enter() work most
      of the time with only read locks.

    Sponsored by:   EMC / Isilon storage division
    Reviewed by:    alc
* Fix the assertions for the state of the object under the map entry with the MAP_ENTRY_VN_WRITECNT flag (kib, 2013-04-09, 1 file, -6/+16)
    - Move the assertion that verifies the state of the v_writecount and
      vnp.writecount under the block where the object is locked.
    - Check that the object type is OBJT_VNODE before asserting.

    Reported by:    avg
    Reviewed by:    alc
    MFC after:      1 week
* MFC (attilio, 2013-02-26, 1 file, -2/+1)
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to their "write" versions (attilio, 2013-02-20, 1 file, -22/+22)
    Sponsored by:   EMC / Isilon storage division
* Switch vm_object lock to be a rwlock (attilio, 2013-02-20, 1 file, -0/+1)
    * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
    * VM_OBJECT_SLEEP() is introduced as a general purpose primitive to
      get a sleep operation using a VM_OBJECT_LOCK() as protection
    * The approach must bear with vm_pager.h namespace pollution, so
      many files require including rwlock.h directly
* - Get rid of unused function vmspace_wired_count() (zont, 2013-01-14, 1 file, -6/+0)
    Reviewed by:    alc
    Approved by:    kib (mentor)
    MFC after:      1 week
* - Reduce kernel size by removing unnecessary pointer indirections (zont, 2013-01-10, 1 file, -6/+4)
    GENERIC kernel size reduced by 16 bytes and RACCT kernel by 336
    bytes.

    Suggested by:   alc
    Reviewed by:    alc
    Approved by:    kib (mentor)
    MFC after:      1 week
* - Fix locked memory accounting for maps with MAP_WIREFUTURE flag (zont, 2012-12-18, 1 file, -6/+39)
    - Add sysctl vm.old_mlock which may turn such accounting off.

    Reviewed by:    avg, trasz
    Approved by:    kib (mentor)
    MFC after:      1 week
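
A hedged userland sketch of the path the entry above touches: mlockall(2) with MCL_FUTURE is what sets MAP_WIREFUTURE on a process map, and the vm.old_mlock sysctl (read here with sysctlbyname(3)) reports whether the pre-change accounting behaviour was selected by the administrator.

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
    int old_mlock;
    size_t olen = sizeof(old_mlock);

    if (sysctlbyname("vm.old_mlock", &old_mlock, &olen, NULL, 0) == -1)
        warn("sysctlbyname(vm.old_mlock)");
    else
        printf("vm.old_mlock = %d\n", old_mlock);

    /* Wire current memory and ask that future mappings be wired too. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        err(1, "mlockall");
    return (0);
}
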
* In the past four years, we've added two new vm object types.  Each time, similar changes had to be made in various places throughout the machine-independent virtual memory layer to support the new vm object type (alc, 2012-12-09, 1 file, -8/+7)
    However, in most of these places, it's actually not the type of the
    vm object that matters to us but instead certain attributes of its
    pages.  For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG
    objects contain fictitious pages.  In other words, in most of these
    places, we were testing the vm object's type to determine if it
    contained fictitious (or unmanaged) pages.

    To both simplify the code in these places and make the addition of
    future vm object types easier, this change introduces two new vm
    object flags that describe attributes of the vm object's pages,
    specifically, whether they are fictitious or unmanaged.

    Reviewed and tested by: kib
* Make a few small changes to vm_map_pmap_enter() (alc, 2012-11-25, 1 file, -9/+10)
    Add detail to the comment describing this function.  In particular,
    describe what MAP_PREFAULT_PARTIAL does.

    Eliminate the abrupt change in behavior when the specified address
    range grows from MAX_INIT_PT pages to MAX_INIT_PT plus one pages.
    Instead of doing nothing, i.e., preloading no mappings whatsoever,
    map any resident pages that fall within the start of the specified
    address range, i.e., [addr, addr + ulmin(size, ptoa(MAX_INIT_PT))).

    Long ago, the vm object's list of resident pages was not ordered, so
    this function had to choose between probing the global hash table of
    all resident pages and iterating over the vm object's unordered list
    of resident pages.  Now, the list is ordered, so there is no reason
    for MAP_PREFAULT_PARTIAL to be concerned with the vm object's count
    of resident pages.

    MFC after:      14 days
* Fix DDB command "show map XXX" (attilio, 2012-11-12, 1 file, -24/+14)
    - Check that an argument is always available; otherwise the current
      map printed before recursing is garbage.
    - Spit out a message if an argument is not provided.
    - Remove the unread nlines variable.
    - Use an explicit recursive function, disassociated from the
      DB_SHOW_COMMAND() body, in order to make the prototype and
      recursion of the above mentioned function clear.  The code is now
      much less obscure.

    Submitted by:   gianni
* - After r240026, sgrowsiz should be used in a safer manner (zont, 2012-09-03, 1 file, -4/+7)
    Approved by:    kib (mentor)
    MFC after:      1 week
* Add new pmap layer locks to the predefined lock order.  Change the names of a few existing VM locks to follow a consistent naming scheme (alc, 2012-06-27, 1 file, -2/+2)
* Move the per-thread deferred user map entries list into a private list in vm_map_process_deferred(), which is then iterated to release map entries (jhb, 2012-06-20, 1 file, -3/+6)
    This avoids having a nested vm map unlock operation called from the
    loop body attempt to recurse into vm_map_process_deferred().  This
    can happen if the vm_map_remove() triggers the OOM killer.

    Reviewed by:    alc, kib
    MFC after:      1 week
* Use the previous stack entry protection and max protection to correctly propagate the stack execution permissions when the stack is grown down (kib, 2012-06-10, 1 file, -1/+1)
    First, curproc->p_sysent->sv_stackprot specifies the maximum allowed
    stack protection for the current ABI, so the new stack entry was
    typically always marked executable.  Second, for a non-main stack
    MAP_STACK mapping, the PROT_ flags specified at mmap(2) call time
    should be used, not sv_stackprot.

    MFC after:      1 week
* Give vm_fault()'s sequential access optimization a makeover (alc, 2012-05-10, 1 file, -0/+2)
    There are two aspects to the sequential access optimization:
    (1) read ahead of pages that are expected to be accessed in the near
    future and (2) unmap and cache behind of pages that are not expected
    to be accessed again.  This revision changes both aspects.

    The read ahead optimization is now more effective.  It starts with
    the same initial read window as before, but arithmetically grows the
    window on sequential page faults.  This can yield increased read
    bandwidth.  For example, on one of my machines, a program using
    mmap() to read a file that is several times larger than the
    machine's physical memory takes about 17% less time to complete.

    The unmap and cache behind optimization is now more selectively
    applied.  The read ahead window must grow to its maximum size before
    unmap and cache behind is performed.  This significantly reduces the
    number of times that pages are unmapped and cached only to be
    reactivated a short time later.

    The unmap and cache behind optimization now clears each page's
    referenced flag.  Previously, in the case of dirty pages, if the
    containing file was still mapped at the time that the page daemon
    examined the dirty pages, they would be reactivated.

    From a stylistic standpoint, this revision also cleanly separates
    the implementation of the read ahead and unmap/cache behind
    optimizations.

    Glanced at:     kib
    MFC after:      2 weeks
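
A hedged sketch of the arithmetic window growth described above; the constants and the helper are hypothetical stand-ins, not the names used in vm_fault().

#include <stdio.h>

#define RA_INITIAL  4   /* pages in the initial read-ahead window */
#define RA_STEP     8   /* added after each sequential fault */
#define RA_MAX      64  /* cap on the window size */

/* Grow the window arithmetically while faults stay sequential. */
static int
next_window(int window, int sequential)
{
    if (!sequential)
        return (RA_INITIAL);
    return (window + RA_STEP > RA_MAX ? RA_MAX : window + RA_STEP);
}

int
main(void)
{
    int w = RA_INITIAL;

    for (int fault = 0; fault < 10; fault++) {
        printf("sequential fault %d: read ahead %d pages\n", fault, w);
        w = next_window(w, 1);
    }
    return (0);
}
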
* Fix madvise(MADV_WILLNEED) to properly handle individual mappings larger than 4GB (jhb, 2012-03-19, 1 file, -12/+10)
    Specifically, the inlined version of 'ptoa' of the 'int' count of
    pages overflowed on 64-bit platforms.  While here, change
    vm_object_madvise() to accept two vm_pindex_t parameters (start and
    end) rather than a (start, count) tuple to match other VM APIs, as
    suggested by alc@.
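
A hedged illustration of the overflow the entry above fixes: shifting a 32-bit page count by PAGE_SHIFT wraps once the mapping passes 4GB, while widening the count before the shift does not.  An unsigned count is used here so the wraparound is well defined; the constants are illustrative only.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT  12          /* 4KB pages */

int
main(void)
{
    unsigned int npages = 2U * 1024 * 1024;     /* 8GB worth of 4KB pages */

    /* 32-bit shift: 2^33 wraps around to 0. */
    uint64_t wrapped = npages << PAGE_SHIFT;

    /* Widen before shifting: the full 8GB survives. */
    uint64_t widened = (uint64_t)npages << PAGE_SHIFT;

    printf("wrapped: %ju bytes, widened: %ju bytes\n",
        (uintmax_t)wrapped, (uintmax_t)widened);
    return (0);
}
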