| Commit message | Author | Age | Files | Lines |
|
the page. This PMAP requires an additional lock besides the PMAP lock
in pmap_extract_and_hold(), which vm_page_pa_tryrelock() did not release.
Suggested by: kib
MFC after: 4 days
|
cover the initial stack size. For MCL_WIREFUTURE maps, the subsequent
call to vm_map_wire() to wire the whole stack region fails due to the
VM_MAP_WIRE_NOHOLES flag.
Use VM_MAP_WIRE_HOLESOK to wire only the mapped part of the stack.
Reported and tested by: Sushanth Rai <sushanth_rai yahoo com>
Reviewed by: alc
MFC after: 1 week
|
require the page queues lock.
MFC after: 1 week
|
accesses of the cache member of vm_object objects.
- Use the new vm_page_is_cached() for checks outside of the vm subsystem.
Reviewed by: alc
MFC after: 2 weeks
X-MFC: r234039
|
MFC after: 2 weeks
|
that it will be freed to the cache pool rather than the default pool.
Otherwise, the cached pages within the reservation may be recycled sooner
than necessary.
Reported by: Andrey Zonov
|
Reviewed by: alc
|
caches, by invalidating kernel icaches only when needed and not flushing
user caches for shared pages.
Suggested by: kib
MFC after: 2 weeks
|
a pair of records similar to syscall entry and return that a user can
use to determine how long page faults take. The new ktrace records are
enabled via the 'p' trace type, and are enabled in the default set of
trace points.
Reviewed by: kib
MFC after: 2 weeks
|
to enable the collection of counts of synchronous and asynchronous
reads and writes for its associated filesystem. The counts are
displayed using `mount -v'.
Ensure that buffers used for paging indicate the vnode from
which they are operating so that counts of paging I/O operations
from the filesystem are collected.
This checkin only adds the setting of the mount point for the
UFS/FFS filesystem, but it would be trivial to add the setting
and clearing of the mount point at filesystem mount/unmount
time for other filesystems too.
Reviewed by: kib
|
kernel.
When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB. In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB. This is exactly as
recommended in AMD's documentation. In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry. In short, this saves us from
having to perform potentially costly TLB flushes. In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry. Usually, such spurious page faults are handled
by vm_fault() without incident. However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault(). This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.
In collaboration with: kib
Tested by: gibbs (an earlier version)
I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.
MFC after: 1 week
|
this earlier.)
Requested by: alc
|
than 4GB. Specifically, the inlined version of 'ptoa' of the 'int'
count of pages overflowed on 64-bit platforms. While here, change
vm_object_madvise() to accept two vm_pindex_t parameters (start and end)
rather than a (start, count) tuple to match other VM APIs as suggested
by alc@.
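The overflow described above can be reproduced in userland. The following is a minimal sketch, not the kernel's code: SKETCH_PAGE_SHIFT, sketch_ptoa_narrow() and sketch_ptoa_wide() are hypothetical names, a 4 KB page size is assumed, and an unsigned 32-bit count stands in for the 'int' so that the wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages. The real value is platform dependent. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Overflow-prone shape: the shift is done in 32-bit arithmetic and
 * only the (already wrapped) result is widened to 64 bits.
 */
static uint64_t
sketch_ptoa_narrow(uint32_t npages)
{
	uint32_t bytes = npages << SKETCH_PAGE_SHIFT;	/* wraps modulo 2^32 */

	return (bytes);
}

/* Safe shape: widen the page count first, then shift. */
static uint64_t
sketch_ptoa_wide(uint32_t npages)
{
	return ((uint64_t)npages << SKETCH_PAGE_SHIFT);
}
```

With these assumptions, a count of 2^20 pages (4GB) wraps in the narrow variant, while the wide variant keeps the full byte count. Small counts give the same result in both.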
|
vm_pindex_t is not a count of pages per se, it is more like vm_ooffset_t,
but a page index instead of a byte offset.
|
if the filesystem performed a short write and we are skipping the page
due to this.
Propagate the write error from the pager back to the callers of
vm_pageout_flush(). Report the failure to write a page from the
requested range as the FALSE return value from vm_object_page_clean(),
and propagate it back to msync(2) to return EIO to usermode.
While there, convert the clearobjflags variable in
vm_object_page_clean() and the arguments of the helper functions to
boolean.
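The error propagation described above can be modelled outside the kernel. This is a hedged sketch only: every sketch_* name is hypothetical, the pager is reduced to a callback that returns 0 or an errno value per page, and continuing to clean after a failed page is an assumption of this sketch rather than a statement about the kernel's behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pager callback: 0 on success, an errno value on failure. */
typedef int (*sketch_put_page_fn)(size_t pindex);

/* Hypothetical pagers used to exercise the sketch. */
static int
sketch_put_ok(size_t pindex)
{
	(void)pindex;
	return (0);
}

static int
sketch_put_fail_on_3(size_t pindex)
{
	return (pindex == 3 ? EIO : 0);
}

/*
 * Models the cleaning loop over [start, end): returns false if any
 * page in the requested range could not be written.
 */
static bool
sketch_page_clean(size_t start, size_t end, sketch_put_page_fn put)
{
	bool ok = true;

	for (size_t i = start; i < end; i++) {
		if (put(i) != 0)
			ok = false;	/* remember the failure, keep going */
	}
	return (ok);
}

/* Models the syscall layer: a failed clean is reported as EIO. */
static int
sketch_msync(size_t start, size_t end, sketch_put_page_fn put)
{
	return (sketch_page_clean(start, end, put) ? 0 : EIO);
}
```

In this model a failure is only reported when the failing page lies inside the requested range, which mirrors the "from the requested range" wording above.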
PR: kern/165927
Reviewed by: alc
MFC after: 2 weeks
|
Submitted by: bde
|
external pagers in Mach. FreeBSD doesn't implement external pagers.
Moreover, it doesn't page out the kernel object. So, the reasons for
having this code don't hold.
Reviewed by: kib
MFC after: 6 weeks
|
Add a comment describing what vm_mmap_to_errno() does.
Reviewed by: kib
MFC after: 3 weeks
X-MFC after: r232071
|
the vm map locks are acquired. Also, eliminate redundant initialization
of the new vm map's timestamp.
Reviewed by: kib
MFC after: 3 weeks
|
accounting for shared writeable mappings for all filesystems, not only
for the bypass layers.
Submitted by: alc
Pointy hat to: kib
MFC after: 20 days
|
v_writecount. Keep the amount of the virtual address space used by
the mappings in the new vm_object un_pager.vnp.writemappings
counter. The vnode v_writecount is incremented when writemappings
becomes non-zero, and decremented when writemappings returns to
zero.
Writeable shared vnode-backed mappings are accounted for in vm_mmap(),
and vm_map_insert() is instructed to set the MAP_ENTRY_VN_WRITECNT flag on
the created map entry. During deferred map entry deallocation,
vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECNT and
decrements writemappings for the vm object.
Now, a writeable mount cannot be demoted to read-only while
writeable shared mappings of the vnodes from the mount point
exist. Also, execve(2) fails for such files with ETXTBSY, as it
should.
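The counting rule above (bump v_writecount on the 0 to non-zero transition of writemappings, drop it on the transition back to zero) can be sketched as follows. This is an illustrative userland model under stated assumptions: struct sketch_vnode, struct sketch_object and both helper functions are hypothetical, and locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the vnode and the vnode-backed vm object. */
struct sketch_vnode {
	int v_writecount;
};

struct sketch_object {
	struct sketch_vnode *vp;
	size_t writemappings;	/* bytes of writeable shared mappings */
};

/* Account for a new writeable shared mapping of len bytes. */
static void
sketch_map_writable(struct sketch_object *obj, size_t len)
{
	if (len == 0)
		return;
	if (obj->writemappings == 0)
		obj->vp->v_writecount++;	/* 0 -> non-zero transition */
	obj->writemappings += len;
}

/* Release len bytes of writeable shared mapping. */
static void
sketch_unmap_writable(struct sketch_object *obj, size_t len)
{
	if (len == 0)
		return;
	assert(obj->writemappings >= len);
	obj->writemappings -= len;
	if (obj->writemappings == 0)
		obj->vp->v_writecount--;	/* non-zero -> 0 transition */
}
```

Under this model any number of overlapping writeable mappings contribute exactly one reference to v_writecount, which is what lets a single non-zero check block the read-only demotion or execve(2).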
Noted by: tegge
Reviewed by: tegge (long time ago, early version), alc
Tested by: pho
MFC after: 3 weeks
|
Discussed with: alc
MFC after: 3 days
|
makes no sense to check the size of the kernel vm_map against the
user-level resource limits for the calling process.
Reviewed by: kib
|
for a shared mapping and marking the entry for inheritance.
Another thread might execute vmspace_fork() in between (e.g. by fork(2)),
resulting in the mapping becoming private.
Noted and reviewed by: alc
MFC after: 1 week
|
Code should just use the devtoname() function to obtain the name of a
character device. Also add const keywords to pieces of code that need it
to build properly.
MFC after: 2 weeks
|
disconnected swap device.
This is a quick and imperfect solution, as the swap device will still be open
and GEOM will not be able to destroy it. The proper solution would be to
automatically turn off and close the disconnected swap device, but with the
existing code that would cause a panic if there is at least one page on the
device, even if it is an unimportant page of a user-level process. It needs
some work.
Reviewed by: kib@
MFC after: 1 week
|
excluding other allocations including UMA now entails the addition of
a single flag to kmem_alloc or uma zone create.
Reviewed by: alc, avg
MFC after: 2 weeks
|
pmap_remove() (changed in r228412).
MFC after: 2 weeks
|
u_int. With the auto-sized buffer cache on modern machines, UFS
metadata can generate more than 65535 pages belonging to the buffers
undergoing i/o, overflowing the counter.
Reported and tested by: jimharris
Reviewed by: alc
MFC after: 1 week
|
generation change if the requested mode is async. The object generation is
only changed when the object is marked as OBJ_MIGHTBEDIRTY. For async
mode it is enough to write each dirty page, without guaranteeing that
all pages are cleaned by the time vm_object_page_clean() returns.
Diagnosed by: truckman
Tested by: flo
Reviewed by: alc, truckman
MFC after: 2 weeks
|
MS_SYNC flag. The system must guarantee that all writes are finished
before the syscall returns. Schedule the writes in async mode, which is
much faster and allows the clustering to occur. Wait for writes using
VOP_FSYNC(), since we are syncing the whole file mapping.
Potentially, the restriction on when the optimization applies can be
relaxed by not requiring that the mapping cover the whole file, as is
done by other OSes.
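For orientation, the userland pattern this message is concerned with, a shared mapping that covers a whole file and is flushed with MS_SYNC, looks roughly like the sketch below. It uses only standard POSIX calls; the helper name sketch_msync_whole_file() and its parameters are hypothetical, and the sketch shows the caller's side only, not the kernel change itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) the file at path with len bytes, map all of it
 * shared and writeable, dirty every byte, and flush synchronously.
 * Returns 0 on success and -1 on any failure.
 */
static int
sketch_msync_whole_file(const char *path, size_t len, char fill)
{
	char *p;
	int fd, rc;

	if (len == 0)
		return (-1);
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return (-1);
	}
	/* The mapping starts at offset 0 and covers the whole file. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return (-1);
	}
	memset(p, fill, len);
	/* MS_SYNC: do not return until the writes have completed. */
	rc = msync(p, len, MS_SYNC);
	munmap(p, len);
	close(fd);
	return (rc);
}
```

A caller that maps only part of a file would, per the last paragraph above, not currently benefit from the optimization.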
Reported and tested by: az
Reviewed by: alc
MFC after: 2 weeks
|
stack cache list header accessible outside vm_glue.c.
MFC after: 1 week
|
to vm.stats.sys. Move them back.
Noticed by: pho
Reviewed by: bde (earlier version)
Approved by: bz
MFC after: 1 week
Pointy hat to: me
|
fix some style(9) issues and reduce redundancy.
PR: kern/155491
PR: kern/155490
PR: kern/155489
Submitted by: Galimov Albert <wtfcrap@mail.ru>
Approved by: bde
Reviewed by: jhb
MFC after: 1 week
|
Submitted by: az
MFC after: 1 week
|
use superpage reservations. So, for the first time, kernel virtual memory
that is allocated by contigmalloc(), kmem_alloc_attr(), and
kmem_alloc_contig() can be promoted to superpages. In fact, even a series
of small contigmalloc() allocations may collectively result in a promoted
superpage.
Eliminate some duplication of code in vm_reserv_alloc_page().
Change the type of vm_reserv_reclaim_contig()'s first parameter in order
that it be consistent with other vm_*_contig() functions.
Tested by: marius (sparc64)
|
vm_page_set_valid() is the most reasonable name for the m->valid
accessor.
Reviewed by: attilio, alc
|
Since the address of the vm_page lock mutex depends on the kernel options,
it is easy for a module to get out of sync with the kernel.
No vm_page_lockptr() accessor is provided for modules. It can be added
later if needed, unless a proper KPI is developed to serve the needs.
Reviewed by: attilio, alc
MFC after: 3 weeks
|
The functions that offer file and line specifications are:
- sx_assert_
- sx_downgrade_
- sx_slock_
- sx_slock_sig_
- sx_sunlock_
- sx_try_slock_
- sx_try_xlock_
- sx_try_upgrade_
- sx_unlock_
- sx_xlock_
- sx_xlock_sig_
- sx_xunlock_
Now vm_map locking is fully converted and no longer needs to know the
specifics of the locking procedures.
Reviewed by: kib
MFC after: 1 month
|
defined and will allow consumers that want to pass options, file and
line to locking requests to not worry about options redefining the
interfaces.
This is typically useful when there is the need to build another
locking interface on top of the mutex one.
The introduced functions that consumers can use are:
- mtx_lock_flags_
- mtx_unlock_flags_
- mtx_lock_spin_flags_
- mtx_unlock_spin_flags_
- mtx_assert_
- thread_lock_flags_
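The layering idea behind these underscore-suffixed functions can be sketched in userland as follows. This is a toy model with hypothetical names throughout (struct sketch_lock, sketch_lock_flags_(), struct sketch_map and so on); it does not reproduce the real mutex(9) implementation, only the pattern of an always-defined function that takes explicit file and line arguments, plus macros that supply __FILE__ and __LINE__.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy lock that remembers where it was last acquired or released. */
struct sketch_lock {
	int held;
	const char *file;
	int line;
};

/* Always-defined entry points taking explicit options, file and line. */
static void
sketch_lock_flags_(struct sketch_lock *l, int opts, const char *file, int line)
{
	(void)opts;
	assert(!l->held);
	l->held = 1;
	l->file = file;
	l->line = line;
}

static void
sketch_unlock_flags_(struct sketch_lock *l, int opts, const char *file,
    int line)
{
	(void)opts;
	assert(l->held);
	l->held = 0;
	l->file = file;
	l->line = line;
}

/* Convenience macros for direct consumers. */
#define	sketch_lock(l)		sketch_lock_flags_((l), 0, __FILE__, __LINE__)
#define	sketch_unlock(l)	sketch_unlock_flags_((l), 0, __FILE__, __LINE__)

/*
 * A second interface built on top of the first one. It forwards the
 * file and line of ITS caller, so diagnostics point at the real call
 * site instead of at this wrapper.
 */
struct sketch_map {
	struct sketch_lock lock;
};

static void
sketch_map_lock_(struct sketch_map *m, const char *file, int line)
{
	sketch_lock_flags_(&m->lock, 0, file, line);
}

#define	sketch_map_lock(m)	sketch_map_lock_((m), __FILE__, __LINE__)
```

Because the wrapper calls the function form rather than the macro, the recorded location is whatever the outer caller passed in, independent of how the convenience macros are defined.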
Spare notes:
- Likely we can get rid of all the 'INVARIANTS' specification in the
ppbus code by using the same macro as done in this patch (but this is
left to the ppbus maintainer)
- all the other locking interfaces may require a similar cleanup, where
the most notable case is sx which will allow a further cleanup of
vm_map locking facilities
- The patch should be fully compatible with older branches, thus an MFC
is planned (in fact it uses all the underlying mechanisms already
present).
Comments review by: eadler, Ben Kaduk
Discussed with: kib, jhb
MFC after: 1 month
|
yielding a new public interface, vm_page_alloc_contig(). This new function
addresses some of the limitations of the current interfaces, contigmalloc()
and kmem_alloc_contig(). For example, the physically contiguous memory that
is allocated with those interfaces can only be allocated to the kernel vm
object and must be mapped into the kernel virtual address space. It also
provides functionality that vm_phys_alloc_contig() doesn't, such as wiring
the returned pages. Moreover, unlike that function, it respects the low
water marks on the paging queues and wakes up the page daemon when
necessary. That said, at present, this new function can't be applied to all
types of vm objects. However, that restriction will be eliminated in the
coming weeks.
From a design standpoint, this change also addresses an inconsistency
between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions.
Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other
functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew
about vnodes and reservations. Now, vm_page_alloc_contig() is responsible
for these things.
Reviewed by: kib
Discussed with: jhb
|
layer for the old KPI and KBI. The new interface should be used together
with the d_mmap_single cdevsw method.
A device pager can be allocated with the cdev_pager_allocate(9)
function, which takes a struct cdev_pager_ops containing
constructor/destructor and page fault handler methods supplied by
the driver.
The constructor and destructor, called at pager allocation and
deallocation time, allow the driver to handle per-object private data.
The pager handler is called to handle a page fault on a vm map entry
backed by the driver pager. The driver shall return either the vm_page_t
which should be mapped, or an error code (which no longer causes a kernel
panic). The page handler interface has a placeholder to
specify the access mode causing the fault, but currently PROT_READ is
always passed there.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 1 month
|
Submitted by: alc
MFC after: 1 week
|
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
|
allocate the requested page because too few pages are cached or free.
Document the VM_ALLOC_COUNT() option to vm_page_alloc() and
vm_page_alloc_freelist().
Make style changes to vm_page_alloc() and vm_page_alloc_freelist(),
such as using a variable name that more closely corresponds to the
comments.
|
MFC after: 2 weeks
|