| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Add a page size field to struct vm_page.
Approved by: alc
|
|
|
|
|
|
| |
Assert that the new entry is inserted into the right location in the
map entries list, and that it does not overlap with the previous and
next entries.
|
|
|
|
| |
Add MAP_EXCL flag for mmap(2).
|
|
|
|
| |
Use correct names for the flags.
|
|
|
|
|
|
|
| |
Make mmap(MAP_STACK) search for the available address space.
MFC r267497 (by alc):
Use local variable instead of sgrowsiz.
|
|
|
|
| |
Remove the assert which can be triggered by the userspace.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the new-and-improved vm_fault_copy_entry() (r265843), we can always
avoid soft page faults when adding write access to user wired entries in
vm_map_protect(). Previously, we only avoided the soft page fault when
the underlying pages were copy-on-write. In other words, we avoided the
pages faults that might sleep on page allocation, but not the trivial
page faults to update the physical map.
On a fork allow read-only wired pages to be copy-on-write shared between
the parent and child processes. Previously, we copied these pages even
though they are read only. However, the reason for copying them is
historical and no longer exists. In recent times, vm_map_protect() has
developed the ability to copy pages when write access is added to wired
copy-on-write pages. So, in this case, copy-on-write sharing of wired
pages is not to be feared. It is not going to lead to copy-on-write
faults on wired memory.
|
|
|
|
|
| |
In execve(2), postpone the free of old vmspace until the threads are resumed
and exited.
|
|
|
|
|
|
| |
About 9% of the pmap_protect() calls being performed by
vm_map_copy_entry() are unnecessary.
Eliminate the unnecessary calls.
|
|
|
|
|
| |
When printing the map with the ddb 'show procvm' command, do not dump
page queues for the backing objects.
|
|
|
|
| |
Print the entry address in addition to the object.
|
|
|
|
| |
Initialize vm_map_entry member wiring_thread on the map entry creation.
|
|
|
|
|
| |
Do not coalesce stack entry. Pass MAP_STACK_GROWS_DOWN and
MAP_STACK_GROWS_UP flags to vm_map_insert() from vm_map_stack()
|
|
|
|
|
| |
Verify for zero-length requests and act as if it is always successfull
without performing any action on the address space.
|
|
|
|
|
| |
Add assertions to cover all places in the wiring and unwiring code
where MAP_ENTRY_IN_TRANSITION is set or cleared.
|
|
|
|
|
|
|
|
| |
point in defining a fini function for these zones.
Reviewed by: kib
Approved by: re (glebius)
Sponsored by: EMC / Isilon Storage Division
|
|
|
|
|
|
|
|
|
|
|
|
| |
- add fields to 'struct pmap' that are required to manage nested page tables.
- add a parameter to 'vmspace_alloc()' that can be used to override the
default pmap initialization routine 'pmap_pinit()'.
These changes are pushed ahead of the remaining changes in 'bhyve_npt_pmap'
in anticipation of the upcoming KBI freeze for 10.0.
Reviewed by: kib@, alc@
Approved by: re (glebius)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
an address in the first 2GB of the process's address space. This flag should
have the same semantics as the same flag on Linux.
To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.
Reviewed by: alc
Approved by: re (kib)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE. Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference(). Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2). A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).
Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact. For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.
Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures. I will commit implementations for the
remaining architectures after further testing. For now, a stub function is
sufficient because of the advisory nature of pmap_advise().
Discussed with: jeff, jhb, kib
Tested by: pho (i386), marcel (ia64)
Sponsored by: EMC / Isilon Storage Division
|
|
|
|
|
|
|
|
|
|
|
| |
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().
Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
address alignment of mappings.
- MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n).
Requests for n >= number of bits in a pointer or less than the size of
a page fail with EINVAL. This matches the API provided by NetBSD.
- MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used
to optimize the chances of using large pages. By default it will align
the mapping on a large page boundary (the system is free to choose any
large page size to align to that seems best for the mapping request).
However, if the object being mapped is already using large pages, then
it will align the virtual mapping to match the existing large pages in
the object instead.
- Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and
VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment.
MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while
MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE.
- mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than
explicitly using VMFS_SUPER_SPACE. All device objects are forced to
use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively
equivalent.
Reviewed by: alc
MFC after: 1 month
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
|
|
|
|
|
|
|
|
|
|
|
|
| |
locks don't accidentally appear to have been already
initialized.
In particular, this fixes a consistent kernel crash on
armv6 with:
panic: lock "vm map (user)" 0xc09cc050 already initialized
that appeared with r251709.
PR: arm/180820
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add a new address space allocation method (VMFS_OPTIMAL_SPACE) for
vm_map_find() that will try to alter the alignment of a mapping to match
any existing superpage mappings of the object being mapped. If no
suitable address range is found with the necessary alignment,
vm_map_find() will fall back to using the simple first-fit strategy
(VMFS_ANY_SPACE).
- Change mmap() without MAP_FIXED, shmat(), and the GEM mapping ioctl to
use VMFS_OPTIMAL_SPACE instead of VMFS_ANY_SPACE.
Reviewed by: alc (earlier version)
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
parallel creation of the map entries, e.g. by mmap() or stack growing.
It also breaks when other entry is wired in parallel.
The vm_map_wire() iterates over the map entries in the region, and
assumes that map entries it finds are marked as in transition before,
also that any entry marked as in transition, are marked by the current
invocation of vm_map_wire(). This is not true for new entries in the
holes.
Add the thread owner of the MAP_ENTRY_IN_TRANSITION flag to struct
vm_map_entry. In vm_map_wire() and vm_map_unwire(), only process the
entries which transition owner is the current thread.
Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
| |
to a memory-mapped file in the traced process's address space
even if neither the traced process nor the tracing process had
write access to that file.
Security: CVE-2013-2171
Security: FreeBSD-SA-13:06.mmap
Approved by: so
|
|
|
|
|
|
|
|
|
|
|
|
| |
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
mostl of the times only with readlocks.
Sponsored by: EMC / Isilon storage division
Reviewed by: alc
|
|
|
|
|
|
|
|
|
|
|
| |
with the MAP_ENTRY_VN_WRITECNT flag:
- Move the assertion that verifies the state of the v_writecount and
vnp.writecount, under the block where the object is locked.
- Check that the object type is OBJT_VNODE before asserting.
Reported by: avg
Reviewed by: alc
MFC after: 1 week
|
| |
|
|
|
|
|
|
| |
their "write" versions.
Sponsored by: EMC / Isilon storage division
|
|
|
|
|
|
|
|
| |
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitve to
get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution so many
files require including directly rwlock.h
|
|
|
|
|
|
| |
Reviewed by: alc
Approved by: kib (mentor)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
| |
GENERIC kernel size reduced in 16 bytes and RACCT kernel in 336 bytes.
Suggested by: alc
Reviewed by: alc
Approved by: kib (mentor)
MFC after: 1 week
|
|
|
|
|
|
|
|
| |
- Add sysctl vm.old_mlock which may turn such accounting off.
Reviewed by: avg, trasz
Approved by: kib (mentor)
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
similar changes had to be made in various places throughout the machine-
independent virtual memory layer to support the new vm object type.
However, in most of these places, it's actually not the type of the vm
object that matters to us but instead certain attributes of its pages.
For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain
fictitious pages. In other words, in most of these places, we were
testing the vm object's type to determine if it contained fictitious (or
unmanaged) pages.
To both simplify the code in these places and make the addition of future
vm object types easier, this change introduces two new vm object flags
that describe attributes of the vm object's pages, specifically, whether
they are fictitious or unmanaged.
Reviewed and tested by: kib
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add detail to the comment describing this function. In particular,
describe what MAP_PREFAULT_PARTIAL does.
Eliminate the abrupt change in behavior when the specified address range
grows from MAX_INIT_PT pages to MAX_INIT_PT plus one pages. Instead of
doing nothing, i.e., preloading no mappings whatsoever, map any resident
pages that fall within the start of the specified address range, i.e.,
[addr, addr + ulmin(size, ptoa(MAX_INIT_PT))).
Long ago, the vm object's list of resident pages was not ordered, so
this function had to choose between probing the global hash table of
all resident pages and iterating over the vm object's unordered list of
resident pages. Now, the list is ordered, so there is no reason for
MAP_PREFAULT_PARTIAL to be concerned with the vm object's count of
resident changes.
MFC after: 14 days
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Check that an argument is always available, otherwise current map
printing before to recurse is garbage.
- Spit out a message if an argument is not provided.
- Remove unread nlines variable.
- Use an explicit recursive function, disassociated from the
DB_SHOW_COMMAND() body, in order to make clear prototype and recursion
of the above mentioned function. The code results now much less
obscure.
Submitted by: gianni
|
|
|
|
|
| |
Approved by: kib (mentor)
MCF after: 1 week
|
|
|
|
| |
of a few existing VM locks to follow a consistent naming scheme.
|
|
|
|
|
|
|
|
|
|
| |
in vm_map_process_deferred() which is then iterated to release map entries.
This avoids having a nested vm map unlock operation called from the loop
body attempt to recuse into vm_map_process_deferred(). This can happen if
the vm_map_remove() triggers the OOM killer.
Reviewed by: alc, kib
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
| |
propagate the stack execution permissions when stack is grown down.
First, curproc->p_sysent->sv_stackprot specifies maximum allowed stack
protection for current ABI, so the new stack entry was typically marked
executable always. Second, for non-main stack MAP_STACK mapping,
the PROT_ flags should be used which were specified at the mmap(2) call
time, and not sv_stackprot.
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two aspects to the sequential access optimization: (1) read ahead
of pages that are expected to be accessed in the near future and (2) unmap
and cache behind of pages that are not expected to be accessed again. This
revision changes both aspects.
The read ahead optimization is now more effective. It starts with the same
initial read window as before, but arithmetically grows the window on
sequential page faults. This can yield increased read bandwidth. For
example, on one of my machines, a program using mmap() to read a file that
is several times larger than the machine's physical memory takes about 17%
less time to complete.
The unmap and cache behind optimization is now more selectively applied.
The read ahead window must grow to its maximum size before unmap and cache
behind is performed. This significantly reduces the number of times that
pages are unmapped and cached only to be reactivated a short time later.
The unmap and cache behind optimization now clears each page's referenced
flag. Previously, in the case of dirty pages, if the containing file was
still mapped at the time that the page daemon examined the dirty pages,
they would be reactivated.
From a stylistic standpoint, this revision also cleanly separates the
implementation of the read ahead and unmap/cache behind optimizations.
Glanced at: kib
MFC after: 2 weeks
|
|
|
|
|
|
|
|
| |
than 4GB. Specifically, the inlined version of 'ptoa' of the the 'int'
count of pages overflowed on 64-bit platforms. While here, change
vm_object_madvise() to accept two vm_pindex_t parameters (start and end)
rather than a (start, count) tuple to match other VM APIs as suggested
by alc@.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
if the filesystem performed short write and we are skipping the page
due to this.
Propogate write error from the pager back to the callers of
vm_pageout_flush(). Report the failure to write a page from the
requested range as the FALSE return value from vm_object_page_clean(),
and propagate it back to msync(2) to return EIO to usermode.
While there, convert the clearobjflags variable in the
vm_object_page_clean() and arguments of the helper functions to
boolean.
PR: kern/165927
Reviewed by: alc
MFC after: 2 weeks
|
|
|
|
|
|
|
|
| |
the vm map locks are acquired. Also, eliminate redundant initialization
of the new vm map's timestamp.
Reviewed by: kib
MFC after: 3 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v_writecount. Keep the amount of the virtual address space used by
the mappings in the new vm_object un_pager.vnp.writemappings
counter. The vnode v_writecount is incremented when writemappings gets
non-zero value, and decremented when writemappings is returned to
zero.
Writeable shared vnode-backed mappings are accounted for in vm_mmap(),
and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on
the created map entry. During deferred map entry deallocation,
vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and
decrements writemappings for the vm object.
Now, the writeable mount cannot be demoted to read-only while
writeable shared mappings of the vnodes from the mount point
exist. Also, execve(2) fails for such files with ETXTBUSY, as it
should be.
Noted by: tegge
Reviewed by: tegge (long time ago, early version), alc
Tested by: pho
MFC after: 3 weeks
|
|
|
|
|
|
|
|
|
| |
for a shared mapping and marking the entry for inheritance.
Other thread might execute vmspace_fork() in between (e.g. by fork(2)),
resulting in the mapping becoming private.
Noted and reviewed by: alc
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The functions that offer file and line specifications are:
- sx_assert_
- sx_downgrade_
- sx_slock_
- sx_slock_sig_
- sx_sunlock_
- sx_try_slock_
- sx_try_xlock_
- sx_try_upgrade_
- sx_unlock_
- sx_xlock_
- sx_xlock_sig_
- sx_xunlock_
Now vm_map locking is fully converted and can avoid to know specifics
about locking procedures.
Reviewed by: kib
MFC after: 1 month
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
defined and will allow consumers, willing to provide options, file and
line to locking requests, to not worry about options redefining the
interfaces.
This is typically useful when there is the need to build another
locking interface on top of the mutex one.
The introduced functions that consumers can use are:
- mtx_lock_flags_
- mtx_unlock_flags_
- mtx_lock_spin_flags_
- mtx_unlock_spin_flags_
- mtx_assert_
- thread_lock_flags_
Spare notes:
- Likely we can get rid of all the 'INVARIANTS' specification in the
ppbus code by using the same macro as done in this patch (but this is
left to the ppbus maintainer)
- all the other locking interfaces may require a similar cleanup, where
the most notable case is sx which will allow a further cleanup of
vm_map locking facilities
- The patch should be fully compatible with older branches, thus a MFC
is previewed (infact it uses all the underlying mechanisms already
present).
Comments review by: eadler, Ben Kaduk
Discussed with: kib, jhb
MFC after: 1 month
|
|
|
|
|
|
| |
won't happen before 9.0. This commit adds "#ifdef RACCT" around all the
"PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order
to avoid useless locking/unlocking in kernels built without "options RACCT".
|