summaryrefslogtreecommitdiffstats
path: root/sys/vm/vm_page.c
Commit message (Collapse)AuthorAgeFilesLines
* MFC r287235:markj2015-11-131-54/+25
| | | | Remove weighted page handling from vm_page_advise().
* MFC r283795alc2015-09-271-0/+1
| | | | Document vm_page_alloc_contig()'s support for the VM_ALLOC_NODUMP option.
* MFC r287944alc2015-09-271-1/+2
| | | | Eliminate (many) unnecessary calls to pmap_remove_all().
* MFC r283162:kib2015-05-271-2/+3
| | | | | | | | | Set VPO_UNMANAGED on the freed page when insertion of the page into the object queue failed, to satisfy the assertion. MFC r283163: Do grammar fix in the comment to record the right commit message for r283162.
* MFC r273701, r274556alc2015-01-021-4/+25
| | | | | | | | | | | By the time that pmap_init() runs, vm_phys_segs[] has been initialized. Obtaining the end of memory address from vm_phys_segs[] is a little easier than obtaining it from phys_avail[]. Enable the use of VM_PHYSSEG_SPARSE on amd64 and i386, making it the default on i386 PAE. (The use of VM_PHYSSEG_SPARSE on i386 PAE saves us some precious kernel virtual address space that would have been wasted on unused vm_page structures.)
* MFC r269746:kib2014-08-241-0/+18
| | | | | Adapt vm_page_aflag_set(PGA_WRITEABLE) to the locking of pmap_enter(PMAP_ENTER_NOSLEEP).
* MFC r267213 (by alc):kib2014-07-241-0/+25
| | | | | | Add a page size field to struct vm_page. Approved by: alc
* MFC r259107alc2014-05-231-1/+1
| | | | | | Eliminate a redundant parameter to vm_radix_replace(). Improve the wording of the comment describing vm_radix_replace().
* PG_SLAB no longer serves a useful purpose, since m->object is nokib2013-09-171-4/+2
| | | | | | | | longer abused to store pointer to slab. Remove it. Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (hrs)
* Remove zero-copy sockets code. It only worked for anonymous memory,kib2013-09-161-103/+1
| | | | | | | | | | | | | and the equivalent functionality is now provided by sendfile(2) over posix shared memory filedescriptor. Remove the cow member of struct vm_page, and rearrange the remaining members. While there, make hold_count unsigned. Requested and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (delphij)
* If the last page of the file is partially full and whole validkib2013-09-141-3/+10
| | | | | | | | | | | | | portion is invalidated, invalidate the whole page. Otherwise, partially valid page appears on a page queue, which is wrong. This could only happen for the last page, because only then buffer which triggered invalidation could not cover the whole page. Reported and tested by: pho (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (delphij) MFC after: 2 weeks
* The vm_page_trysbusy() should not fail when shared busy counter orkib2013-09-051-3/+7
| | | | | | | | | | | | | | | VPB_BIT_WAITERS flag were changed between reading of busy_lock and the cas. The vm_page_sbusy(), which is the only user of vm_page_trysbusy() in the tree, panics on the failure, which in these cases is transient and do not mean that the current page state prevents sbusying. Retry the operation inside vm_page_trysbusy() if cas failed, only return a failure when VPB_BIT_SHARED is cleared. Reported and tested by: pho Reviewed by: attilio Sponsored by: The FreeBSD Foundation
* Significantly reduce the cost, i.e., run time, of calls to madvise(...,alc2013-08-291-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2). Since our malloc(3) makes heavy use of madvise(2), this change can have a measureable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%. Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise(). Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division
* Remove comment that is no longer relevant since r254182.glebius2013-08-261-4/+0
|
* Addendum to r254141: The call to vm_radix_insert() in vm_page_cache() canalc2013-08-231-0/+9
| | | | | | | | | | reclaim the last preexisting cached page in the object, resulting in a call to vdrop(). Detect this scenario so that the vnode's hold count is correctly maintained. Otherwise, we panic. Reported by: scottl Tested by: pho Discussed with: attilio, jeff, kib
* Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9).kib2013-08-221-7/+1
| | | | | | | | The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation
* Addendum to r254141: Allow recursion on the free pages queues lock inalc2013-08-211-1/+1
| | | | | | | vm_page_alloc_freelist(). Reported and tested by: sbruno Sponsored by: EMC / Isilon Storage Division
* On the recovery path for vm_page_alloc(), if a page had been requestedattilio2013-08-151-0/+9
| | | | | | | | wired, unwind back the wiring bits otherwise we can end up freeing a page that is considered wired. Sponsored by: EMC / Isilon storage division Reported by: alc
* Improve pageout flow control to wakeup more frequently and do less work whilejeff2013-08-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | maintaining better LRU of active pages. - Change v_free_target to include the quantity previously represented by v_cache_min so we don't need to add them together everywhere we use them. - Add a pageout_wakeup_thresh that sets the free page count trigger for waking the page daemon. Set this 10% above v_free_min so we wakeup before any phase transitions in vm users. - Adjust down v_free_target now that we're willing to accept more pagedaemon wakeups. This means we process fewer pages in one iteration as well, leading to shorter lock hold times and less overall disruption. - Eliminate vm_pageout_page_stats(). This was a minor variation on the PQ_ACTIVE segment of the normal pageout daemon. Instead we now process 1 / vm_pageout_update_period pages every second. This causes us to visit the whole active list every 60 seconds. Previously we would only maintain the active LRU when we were short on pages which would mean it could be woefully out of date. Reviewed by: alc (slight variant of this) Discussed with: alc, kib, jhb Sponsored by: EMC / Isilon Storage Division
* Correct the recovery logic in vm_page_alloc_contig:attilio2013-08-111-4/+2
| | | | | | | | | | | what is really needed on this code snipped is that all the pages that are already fully inserted gets fully freed, while for the others the object removal itself might be skipped, hence the object might be set to NULL. Sponsored by: EMC / Isilon storage division Reported by: alc, kib Reviewed by: alc
* Different consumers of the struct vm_page abuse pageq member to keepkib2013-08-101-23/+28
| | | | | | | | | | | | | | | | | additional information, when the page is guaranteed to not belong to a paging queue. Usually, this results in a lot of type casts which make reasoning about the code correctness harder. Sometimes m->object is used instead of pageq, which could cause real and confusing bugs if non-NULL m->object is leaked. See r141955 and r253140 for examples. Change the pageq member into a union containing explicitly-typed members. Use them instead of type-punning or abusing m->object in x86 pmaps, uma and vm_page_alloc_contig(). Requested and reviewed by: alc Sponsored by: The FreeBSD Foundation
* Revert the addition of VPO_BUSY and instead update vm_page_replace() tojhb2013-08-091-6/+6
| | | | | | properly unbusy the page. Submitted by: alc
* On all the architectures, avoid to preallocate the physical memoryattilio2013-08-091-32/+211
| | | | | | | | | | | | | | | | | | | | | for nodes used in vm_radix. On architectures supporting direct mapping, also avoid to pre-allocate the KVA for such nodes. In order to do so make the operations derived from vm_radix_insert() to fail and handle all the deriving failure of those. vm_radix-wise introduce a new function called vm_radix_replace(), which can replace a leaf node, already present, with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if operations in vm_radix_insert() recursed vm_radix_insert() will start from scratch again. Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl
* The soft and hard busy mechanism rely on the vm object lock to work.attilio2013-08-091-92/+240
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
* Split the pagequeues per NUMA domains, and split pageademon processkib2013-08-071-44/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into threads each processing queue in a single domain. The structure of the pagedaemons and queues is kept intact, most of the changes come from the need for code to find an owning page queue for given page, calculated from the segment containing the page. The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary, the multithreaded daemon could be allowed for the single-domain machines, or one domain might be split into several page domains, to further increase concurrency. Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations. The pagedaemons reach the quorum before starting the OOM, since one thread inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages, OOM is started by single selected thread. Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed. Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation
* In the vm_page_set_invalid() function, do not assert that the page iskib2013-07-111-2/+0
| | | | | | | | | | | | not busy, since its only caller brelse() can legitimately call it on busy page. This happens for VOP_PUTPAGES() on filesystems that use buffers and which VOP_WRITE() method marked the buffer containing page as non-cacheable. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* vm_phys_fictitious_reg_range() was losing the 'memattr' because it would beneel2013-07-031-0/+1
| | | | | | | | | | | | | reset by pmap_page_init() right after being initialized in vm_page_initfake(). The statement above is with reference to the amd64 implementation of pmap_page_init(). Fix this by calling 'pmap_page_init()' in 'vm_page_initfake()' before changing the 'memattr'. Reviewed by: kib MFC after: 2 weeks
* Typo in comment.glebius2013-06-241-1/+1
|
* Revise the interface between vm_object_madvise() and vm_page_dontneed() soalc2013-06-101-8/+27
| | | | | | | that pointless calls to pmap_is_modified() can be easily avoided when performing madvise(..., MADV_FREE). Sponsored by: EMC / Isilon Storage Division
* Remove irrelevant comments.kib2013-06-031-7/+0
| | | | | Discussed with: alc MFC after: 3 days
* Require that the page lock is held, instead of the object lock, whenalc2013-06-031-0/+7
| | | | | | | | | | | | | | | | | | | clearing the page's PGA_REFERENCED flag. Since we are typically manipulating the page's act_count field when we are clearing its PGA_REFERENCED flag, the page lock is already held everywhere that we clear the PGA_REFERENCED flag. So, in fact, this revision only changes some comments and an assertion. Nonetheless, it will enable later changes to object locking in the pageout code. Introduce vm_page_assert_locked(), which completely hides the implementation details of the page lock from the caller, and use it in vm_page_aflag_clear(). (The existing vm_page_lock_assert() could not be used in vm_page_aflag_clear().) Over the coming weeks, I expect that we'll either eliminate or replace the various uses of vm_page_lock_assert() with vm_page_assert_locked(). Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
* Now that access to the page's "act_count" field is synchronized by the pagealc2013-06-011-1/+0
| | | | | | | | lock instead of the object lock, there is no reason for vm_page_activate() to assert that the object is locked for either read or write access. (The "VPO_UNMANAGED" flag never changes after page allocation.) Sponsored by: EMC / Isilon Storage Division
* o Relax locking assertions for vm_page_find_least()attilio2013-05-211-1/+1
| | | | | | | | | | | | o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock o Use all the mechanisms above to make vm_map_pmap_enter() to work mostl of the times only with readlocks. Sponsored by: EMC / Isilon storage division Reviewed by: alc
* Add ddb command 'show pginfo' which provides useful information aboutkib2013-05-211-0/+23
| | | | | | | | | | a vm page, denoted either by an address of the struct vm_page, or, if the '/p' modifier is specified, by a physical address of the corresponding frame. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Relax the object locking assertion in vm_page_lookup(). Now that a radixalc2013-05-171-1/+1
| | | | | | | | tree is used to maintain the object's collection of resident pages, vm_page_lookup() no longer needs an exclusive lock. Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
* Bandaid for compiling with gcc, which happens to be the default compilerpeter2013-05-131-0/+1
| | | | for a number of platforms still.
* Refactor vm_page_alloc()'s interactions with vm_reserv_alloc_page() andalc2013-05-121-19/+51
| | | | | | | | | | vm_page_insert() so that (1) vm_radix_lookup_le() is never called while the free page queues lock is held and (2) vm_radix_lookup_le() is called at most once. This change reduces the average time that the free page queues lock is held by vm_page_alloc() as well as vm_page_alloc()'s average overall running time. Sponsored by: EMC / Isilon Storage Division
* Most allocation of pages to objects proceeds from lower to higheralc2013-03-171-5/+5
| | | | | | | | | | | | | | | | indices. Consequentially, vm_page_insert() should use vm_radix_lookup_le() instead of vm_radix_lookup_ge(). Here's why. In the expected case, vm_radix_lookup_le() will quickly find a page less than the specified key at the same radix node. In contrast, vm_radix_lookup_ge() is expected to return NULL, but to do that it must examine every slot in the radix tree that is greater than the key. Prior to this change, the average cost of a vm_page_insert() call on my test machine was 992 cycles. After this change, the average cost is only 532 cycles, a reduction of 46%. Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
* Simplify the interface to vm_radix_insert() by eliminating the parameteralc2013-03-171-3/+3
| | | | | | | | | | | | "index". The content of a radix tree leaf, or at least its "key", is not opaque to the other radix tree operations. Specifically, they know how to extract the "key" from a leaf. So, eliminating the parameter "index" isn't breaking the abstraction. Moreover, eliminating the parameter "index" effectively prevents the caller from passing an inconsistent "index" and leaf to vm_radix_insert(). Reviewed by: attilio Sponsored by: EMC / Isilon Storage Division
* MFCattilio2013-03-121-4/+1
|
* When transferring the page from one object to another, don't insert thealc2013-03-121-1/+1
| | | | | | | | page into its new object until the page's pindex has been updated. Otherwise, one code path within vm_radix_insert() may use the wrong pindex value. Sponsored by: EMC / Isilon Storage Division
* MFCattilio2013-03-111-1/+1
|\
| * Update a comment: The object lock is no longer a mutex.alc2013-03-091-1/+1
| |
* | Introduce vm_radix_is_empty(), and use it in place ofalc2013-03-101-4/+4
| | | | | | | | | | | | | | vm_object_cache_is_empty() where the caller is aware of the page cache's implementation as a radix trie. Sponsored by: EMC / Isilon Storage Division
* | Merge from vmcontention.attilio2013-03-091-37/+38
|\ \
| * \ MFCattilio2013-03-091-59/+160
| |\ \ | | |/
| | * Switch the vm_object mutex to be a rwlock. This will enable in theattilio2013-03-091-37/+38
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
| | | * As VM_OBJECT_SLEEP() is a vm_object_t specific function, makeattilio2013-02-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | the passed object as the first argument of the function for consistency. Sponsored by: EMC / Isilon storage revision
| | | * Hide the details for the assertion for VM_OBJECT_LOCK operations.attilio2013-02-211-31/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into VM_OBJECT_ASSERT_WLOCKED(foo) Sponsored by: EMC / Isilon storage division Requested by: alc
| | | * Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() toattilio2013-02-201-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | their "write" versions. Sponsored by: EMC / Isilon storage division
OpenPOWER on IntegriCloud