Move the fast fault path into a separate function.
Do not sleep in vm_wait() if the pagedaemon has not yet started; panic instead.
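A minimal sketch of the idea (the started-flag name is hypothetical, not the actual vm_wait() code):

    void
    vm_wait(void)
    {
            /* Hypothetical flag: nothing will ever wake a sleeper up
             * if the pagedaemon does not exist yet, so fail loudly. */
            if (!vm_pagedaemon_started)
                    panic("vm_wait: pagedaemon not yet started");
            /* ... otherwise wake the daemon and sleep for free pages ... */
    }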
Move and revise a comment about the relation between the object's
paging-in-progress count and the vnode. Prior to r188331, we always
acquired the vnode lock before incrementing the object's
paging-in-progress count. Now, we increment it before attempting to
acquire the vnode lock with LK_NOWAIT, but we never sleep acquiring the
vnode lock while we have the count incremented.
In vm_fault()'s loop over the shadow chain, move a comment describing
our invariants to a better place. Also, add two comments concerning the
relationship between the map and vnode locks.
Change remaining internal uses of boolean_t to bool in vm/vm_fault.c.
Remove vm_pager_has_page() declaration.
Remove vnode_locked label and goto.
Split long line instead of unindenting it. Add KASSERT() verifying
that a device object with the same handle has the same ops vector.
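A sketch of what such an assertion plausibly looks like in the device pager's lookup path (variable names illustrative):

    object = vm_pager_object_lookup(&dev_pager_object_list, handle);
    if (object != NULL) {
            /* A handle must always map to the same pager ops vector. */
            KASSERT(object->un_pager.devp.ops == ops,
                ("device object %p has a mismatched ops vector", object));
    }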
With one exception, "hardfault" is used like a "bool". Change that
exception and make it a "bool".
The "lookup_still_valid" field is used as a "bool". Make it one.
Convert vm_fault_hold()'s Boolean variables that are only used
internally to "bool". Add a comment describing why the one
remaining "boolean_t" was not converted.
Merge and sort vm_fault_hold()'s "int" variable definitions.
Add unlock_vp() helper.
MFC r308095 (by markj):
Add one more use of unlock_vp().
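The helper is small; a sketch of its likely shape, assuming the usual struct faultstate vnode field:

    static void
    unlock_vp(struct faultstate *fs)
    {

            /* Drop the vnode lock and reference, if any is held. */
            if (fs->vp != NULL) {
                    vput(fs->vp);
                    fs->vp = NULL;
            }
    }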
Make the page daemon's notion of what kind of pass is being performed
by vm_pageout_scan() local to vm_pageout_worker(). There is no reason
to store the pass in the NUMA domain structure.
Change vm_pageout_scan() to return a value indicating whether the free page
target was met.
Previously, vm_pageout_worker() itself checked the length of the free page
queues to determine whether vm_pageout_scan(pass >= 1)'s inactive queue scan
freed enough pages to meet the free page target. Specifically,
vm_pageout_worker() used vm_paging_needed(). The trouble with
vm_paging_needed() is that it compares the length of the free page queues to
the wakeup threshold for the page daemon, which is much lower than the free
page target. Consequently, vm_pageout_worker() could conclude that the
inactive queue scan succeeded in meeting its free page target when in fact
it did not; and rather than immediately triggering an all-out laundering
pass over the inactive queue, vm_pageout_worker() would go back to sleep
waiting for the free page count to fall below the page daemon wakeup
threshold again, at which point it would perform another limited
(pass == 1) scan over the inactive queue.
Changing vm_pageout_worker() to use vm_page_count_target() instead of
vm_paging_needed() won't work because any page allocations that happen
concurrently with the inactive queue scan will result in the free page count
being below the target at the end of a successful scan. Instead, having
vm_pageout_scan() return a value indicating success or failure is the most
straightforward fix.
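A sketch of the resulting control flow in vm_pageout_worker() (structure illustrative, not the verbatim code):

    target_met = vm_pageout_scan(domain, pass);
    /*
     * If the scan failed to meet the free page target, escalate to a
     * more aggressive scan immediately instead of going back to sleep
     * and waiting for another wakeup.
     */
    if (target_met)
            pass = 0;
    else
            pass++;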
If vm_fault_hold(9) finds that fs.m is wired, do not free it after a
pager error; leave the page to the wire owner.
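A sketch of the changed error path (assuming the usual faultstate fields; illustrative):

    /* After a pager error: never free a page someone still has wired. */
    if (fs.m->wire_count == 0)
            vm_page_free(fs.m);
    else
            vm_page_xunbusy_maybelocked(fs.m);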
Export vm_page_xunbusy_maybelocked().
Plug a vnode lock leak in vm_fault_hold().
Fix a race in vm_page_busy_sleep(9).
When downgrading an exclusively busied page to the shared-busy state,
wake up waiters.
Restore swap pager readahead.
Make swapoff reliable.
Eliminate unneeded vm_page_xbusy() and vm_page_xunbusy() operations when
neither vm_pager_has_page() nor vm_pager_get_pages() is called.
Initialize busy lock state and strengthen busy lock assertions.
Eliminate vprint().
Correct errors and clean up the comments on the active queue scan.
Eliminate some unnecessary blank lines.
Clean up the comments and code style in and around vm_pageout_cluster().
In particular, fix factual, grammatical, and spelling errors in various
comments, and remove comments that are out of place in this function.
Update a comment in vm_page_advise() to match behaviour after r290529.
Release the second critical section in uma_zfree_arg() slightly earlier.
Use vm_page_undirty() instead of manually setting a page field.
De-pluralize "queues" in the pagedaemon code.
Correct a spelling error.
Approved by: re (kib)
Do not delegate work to the geom event thread when it can be done inline.
MFC r303703:
Explain why swapgeom_close_ev() is delegated.
Approved by: re (gjb)
Remove a probe declaration that has been unused since r292469, when
vm_pageout_grow_cache() was replaced.
Approved by: re (gjb)
Fix style and typo.
Approved by: re (gjb)
Remove any mention of cache (PG_CACHE) pages from the comments in
vm_pageout_scan(). That function has not cached pages since r284376.
Approved by: re (kib)
In vgonel(), postpone setting BO_DEAD until VOP_RECLAIM() is called,
if the vnode is VMIO. For VMIO vnodes, set BO_DEAD in vm_object_terminate().
MFC r302580:
Fix grammar.
Approved by: re (gjb)
Add a comment describing the 'fast path' that was introduced in r270011.
Approved by: re (gjb)
Break up vm_fault()'s implementation of the read-ahead and delete-behind
optimizations into two distinct pieces. The first piece consists of the
code that should only be performed once per page fault and requires the
map to be locked. The second piece consists of the code that should be
performed each time a pager is called on an object in the shadow chain.
(This second piece expects the map to be unlocked.)
Previously, the entire implementation could be executed multiple times.
Moreover, the second and subsequent executions would occur with the map
unlocked. Usually, the ensuing unsynchronized accesses to the map were
harmless because the map was not changing. Nonetheless, it was possible
for a use-after-free error to occur, where vm_fault() wrote to a freed
map entry. This change corrects that problem.
Approved by: re (gjb)
vm_offset_t. (This field is used to detect sequential access to the virtual
address range represented by the map entry.) There are three reasons to
make this change. First, a vm_offset_t is smaller on 32-bit architectures.
Consequently, a struct vm_map_entry is now smaller on 32-bit architectures.
Second, a vm_offset_t can be written atomically, whereas it may not be
possible to write a vm_pindex_t atomically on a 32-bit architecture. Third,
using a vm_pindex_t makes the next_read field dependent on which object in
the shadow chain is being read from.
Replace an "XXX" comment.
Reviewed by: kib
Approved by: re (gjb)
Sponsored by: EMC / Isilon Storage Division
of CPUs present. On amd64 this unbreaks the boot for systems with 92 or
more CPUs; the limit will vary on other systems depending on the size of
their uma_zone and uma_cache structures.
The major consumer of pages during UMA startup is the 19 zone structures
which are set up before UMA has bootstrapped itself sufficiently to use
the rest of the available memory: UMA Slabs, UMA Hash, 4 / 6 / 8 / 12 /
16 / 32 / 64 / 128 / 256 Bucket, vmem btag, VM OBJECT, RADIX NODE, MAP,
KMAP ENTRY, MAP ENTRY, VMSPACE, and fakepg. If the zone structures occupy
more than one page, they will not share pages and the number of pages
currently needed for startup is 19 * pages_per_zone + N, where N is the
number of pages used for allocating other structures; on amd64 N = 3 at
present (2 pages are allocated for UMA Kegs, and one page for UMA Hash).
This patch adds a new definition UMA_BOOT_PAGES_ZONES, currently set to 32,
and if a zone structure does not fit into a single page sets boot_pages to
UMA_BOOT_PAGES_ZONES * pages_per_zone instead of UMA_BOOT_PAGES (which
remains at 64). Consequently this patch has no effect on systems where the
zone structure fits into 2 or fewer pages (on amd64, 59 or fewer CPUs), but
increases boot_pages sufficiently on systems where the large number of CPUs
makes this structure larger. It seems safe to assume that systems with 60+
CPUs can afford to set aside an additional 128kB of memory per 32 CPUs.
The vm.boot_pages tunable continues to override this computation, but is
unlikely to be necessary in the future.
Tested on: EC2 x1.32xlarge
Relnotes: FreeBSD can now boot on 92+ CPU systems without requiring
vm.boot_pages to be manually adjusted.
Reviewed by: jeff, alc, adrian
Approved by: re (kib)
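The sizing rule described above reduces to a small computation; a sketch (names follow the text, details illustrative):

    /* Zone structures grow with the CPU count via per-CPU caches. */
    zsize = sizeof(struct uma_zone) +
        (mp_maxid + 1) * sizeof(struct uma_cache);
    pages_per_zone = howmany(zsize, PAGE_SIZE);
    if (pages_per_zone > 1)
            boot_pages = UMA_BOOT_PAGES_ZONES * pages_per_zone; /* 32 * ppz */
    else
            boot_pages = UMA_BOOT_PAGES;                        /* still 64 */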
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.
There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.
PR: kern/210106
Reviewed by: jhb
Approved by: re (glebius)
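The pattern being fixed, in sketch form (do_per_cpu_work() is a placeholder):

    int i;

    /* Broken on sparse IDs: assumes CPU IDs are dense in [0, mp_ncpus). */
    for (i = 0; i < mp_ncpus; i++)
            do_per_cpu_work(i);

    /* Correct: iterate only over CPUs that actually exist. */
    CPU_FOREACH(i)
            do_per_cpu_work(i);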
objects.
Assert that there are no new waiters for already terminated objects.
Old waiters should have been notified by the termination calling
vnode_pager_dealloc() (old/new are with regard to the lock acquisition
interval).
Only clear vp->v_object in the case of an already terminated object,
since the other branches call vnode_pager_dealloc(), which should clear
the pointer. Assert this.
Tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
Requested by: alc
MFC after: 1 week
Approved by: re (gjb)
sleepable scan, iteration over the shadow chain looking for a page
could find an OBJ_DEAD object. Such a state of the mapping is only
transient; the dead object will be terminated and removed from the
chain shortly. We must not return KERN_PROTECTION_FAILURE unless the
object type is changed to OBJT_DEAD in the chain, indicating that
paging on this address is really impossible. Returning
KERN_PROTECTION_FAILURE prematurely causes spurious SIGSEGVs to be
delivered to processes, or kernel accesses to UVA to spuriously fail
with EFAULT.
If an object with the OBJ_DEAD flag is found, only return
KERN_PROTECTION_FAILURE when the object type is already OBJT_DEAD.
Otherwise, sleep a tick and retry the fault handling.
Ideally, we would wait until the OBJ_DEAD flag is resolved, e.g. by
waiting until the paging on this object is finished. But to do so, we
would need to reference the dead object, while vm_object_collapse()
insists on owning the final reference on the collapsed object. This
could be fixed by e.g. changing the assert to a shared reference
release between vm_fault() and vm_object_collapse(), but it seems to
be too much complication for a rare boundary condition.
PR: 204426
Tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
X-Differential revision: https://reviews.freebsd.org/D6085
MFC after: 2 weeks
Approved by: re (gjb)
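A sketch of the retry policy in vm_fault()'s object lookup (close to, but not necessarily verbatim, the committed code):

    if ((fs.object->flags & OBJ_DEAD) != 0) {
            dead = fs.object->type == OBJT_DEAD;
            unlock_and_deallocate(&fs);
            if (dead)
                    return (KERN_PROTECTION_FAILURE);
            pause("vmf_de", 1);     /* sleep a tick, then retry */
            goto RetryFault;
    }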
waiters exist, same as for vm_page_xunbusy(). If the previous value of
busy_lock was VPB_SINGLE_EXCLUSIVER, no waiters existed and a wakeup is
not needed.
Move the common code from vm_page_xunbusy_maybelocked() and
vm_page_xunbusy_hard() into vm_page_xunbusy_locked().
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
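A sketch of the shared helper (illustrative):

    /* Common slow path shared by vm_page_xunbusy_maybelocked() and
     * vm_page_xunbusy_hard(); reached only when the lockless
     * VPB_SINGLE_EXCLUSIVER -> VPB_UNBUSIED cmpset failed, i.e. when
     * waiters may exist and must be woken. */
    static void
    vm_page_xunbusy_locked(vm_page_t m)
    {

            vm_page_assert_xbusied(m);
            vm_page_assert_locked(m);

            atomic_store_rel_int(&m->busy_lock, VPB_UNBUSIED);
            wakeup(m);
    }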
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)
There is a lock order between the covered vnode lock and allproc_lock,
established by calling mountcheckdirs() while owning the covered
vnode lock. mountcheckdirs() iterates over the processes, protected by
allproc_lock. This order is needed and seems to be unavoidable.
On the other hand, various VM daemons also need to iterate over all
processes, and they lock and unlock user maps. Since unlocking the
user map may trigger processing of deferred map entries, it causes
vnode locking to occur. Or, when a vmspace is freed, dropping
references on vnode-backed objects also locks vnodes. We get the
reverse order compared with the mount/unmount order.
For VM daemons, there is no need to own allproc_lock while we operate
on vmspaces. If the process is held, it serves as a marker for the
allproc list, which allows the iteration to continue.
Add a _PHOLD_LITE() macro, similar to _PHOLD(), but not causing swap-in
of kernel stacks. It is used instead of _PHOLD() in vm code, since
e.g. calling faultin() in OOM conditions only exacerbates the problem.
Modernize the comment describing PHOLD.
Reported by: lists@yamagi.org
Tested by: pho (previous version)
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D6679
statistics. Marking is done by setting the OBJ_ACTIVE flag. The
flags change is locked, but the problem is that many parts of the
system assume that vm object initialization ensures that no other code
can change the object, and thus perform lockless accesses. The end
result is corrupted flags in vm objects; most visible is a spurious
OBJ_DEAD flag, causing random hangs.
Avoid the active object marking; instead, provide an equally inexact
but immutable is_object_alive() definition for the object mapped state.
Avoid iterating over the processes' mappings altogether by using an
arguably improved definition of a paging thread as one which sleeps
on v_free_count.
PR: 204764
Diagnosed by: pho
Tested by: pho (previous version)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)
Right now, all modifications of the list are locked by sw_alloc_mtx.
But the initial lookup of the object by the handle in swap_pager_alloc()
is not protected by sw_alloc_mtx, which means that
vm_pager_object_lookup() could follow a freed pointer.
Create a new named swap object with the OBJT_SWAP type, instead
of OBJT_DEFAULT. With this change, swp_pager_meta_build() never needs
to upgrade a named OBJT_DEFAULT object to OBJT_SWAP (in the other place,
we do not forbid client code from creating named OBJT_DEFAULT objects
at all).
That change allows removing sw_alloc_mtx and locking the list with the
sw_alloc_sx lock. Update swap_pager_copy() to the new locking mode.
Create a helper, swap_pager_alloc_init(), to consolidate named and
anonymous swap object creation; the caller ensures that the necessary
locks are held around the helper.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (hrs)
but works due to the bss being zeroed out on startup.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (hrs)
Freeing a shared-busy page is not permitted.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D6670
Per the KASSERT at the beginning of the function, we expect that the page
does not belong to any object, so its object and pindex fields are
meaningless. Reset them in the rare case that vm_radix_insert() fails.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D6669
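A sketch of the fix (illustrative): the page gets its new identity only for the duration of the insertion attempt.

    m->object = object;
    m->pindex = pindex;
    if (vm_radix_insert(&object->rtree, m) != 0) {
            /* Failed insertion: the fields are meaningless again. */
            m->object = NULL;
            m->pindex = 0;
            return (1);
    }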
With r284861, UMA zones use the trash ctor and dtor by default. This is
incompatible with memguard, which frees the backing page when the item
is freed. Modify the UMA debug functions to be no-ops if the item was
allocated from memguard. This also fixes constructors such as
mb_ctor_pack(), which invokes the trash ctor in addition to performing
some initialization.
Reviewed by: glebius
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D6562
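A sketch of the guard added to the UMA trash routines (assuming the usual trash_ctor() signature; illustrative):

    int
    trash_ctor(void *mem, int size, void *arg, int flags)
    {
    #ifdef DEBUG_MEMGUARD
            /* Memguard items are unmapped on free; do not touch them. */
            if (is_memguard_addr(mem))
                    return (0);
    #endif
            /* ... existing trash-pattern verification ... */
            return (0);
    }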
acquire the page lock, which recurses. Avoid the recursion by reusing
the code from vm_page_remove() in a new helper
vm_page_xunbusy_maybelocked().
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
|