path: root/sys/i386/xen/pmap.c
Commit log (each entry: commit message, author, date, files changed, lines -deleted/+added)
* MFC r272761: (kib, 2014-10-15, 1 file, -8/+12)
    Add an argument to the x86 pmap_invalidate_cache_range() to request
    forced invalidation of the cache range regardless of the presence of
    the self-snoop feature.
    MFC r272943: MFi386 r272761.
* Fix a leak of the wired pages when unwiring of the PROT_NONE-mapped
  wired region. (kib, 2014-09-01, 1 file, -22/+40)
    Rework the handling of unwire to do it in batch, both at pmap and
    object level. All commits below are by alc.
    MFC r268327: Introduce pmap_unwire().
    MFC r268591: Implement pmap_unwire() for powerpc.
    MFC r268776: Implement pmap_unwire() for arm.
    MFC r268806: pmap_unwire(9) man page.
    MFC r269134: When unwiring a region of an address space, do not
    assume that the underlying physical pages are mapped by the pmap.
    This fixes a leak of the wired pages on the unwiring of the region
    mapped with no access allowed.
    MFC r269339: In the implementation of the new function pmap_unwire(),
    the call to MOEA64_PVO_TO_PTE() must be performed before any changes
    are made to the PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic.
    MFC r269365: Correct a long-standing problem in moea{,64}_pvo_enter()
    that was revealed by the combination of r268591 and r269134: when we
    attempt to add the wired attribute to an existing mapping,
    moea{,64}_pvo_enter() do nothing. (They only set the wired attribute
    on newly created mappings.)
    MFC r269433: Handle wiring failures in vm_map_wire() with the new
    functions pmap_unwire() and vm_object_unwire(). Retire
    vm_fault_{un,}wire(), since they are no longer used.
    MFC r269438: Rewrite a loop in vm_map_wire() so that gcc doesn't
    think that the variable "rv" is uninitialized.
    MFC r269485: Retire pmap_change_wiring().
    Reviewed by: alc
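    The range-based KPI sketched below replaces per-page
    pmap_change_wiring() calls; the prototype matches the pmap_unwire()
    interface described above, while the caller is a hypothetical
    illustration:

        #include <vm/vm.h>
        #include <vm/pmap.h>

        /*
         * Clear the wired attribute from any valid mappings in
         * [sva, eva); pages in the range that are not mapped are
         * simply skipped, which is what fixes the leak above.
         */
        void pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t eva);

        /* Hypothetical caller: unwire one region in a single pass. */
        static void
        unwire_region(pmap_t pmap, vm_offset_t start, vm_offset_t end)
        {
                pmap_unwire(pmap, start, end);
        }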
* MFC r270038: (kib, 2014-08-24, 1 file, -1/+0)
    Complete r254667, do not destroy pmap lock if KVA allocation failed.
* Merge the changes to pmap_enter(9) for sleep-less operation (requested
  by flag). (kib, 2014-08-24, 1 file, -24/+27)
    The ia64 pmap.c changes are a direct commit, since ia64 is removed
    on head.
    MFC r269368 (by alc): Retire PVO_EXECUTABLE.
    MFC r269728: Change the pmap_enter(9) interface to take a flags
    parameter and superpage mapping size (currently unused).
    MFC r269759 (by alc): Update the text of a KASSERT() to reflect the
    changes in r269728.
    MFC r269822 (by alc): Change {_,}pmap_allocpte() so that they look
    for the flag PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when
    deciding whether to sleep on page table page allocation.
    MFC r270151 (by alc): Replace KASSERT that no PV list locks are held
    with a conditional unlock.
    Reviewed by: alc
    Approved by: re (gjb)
    Sponsored by: The FreeBSD Foundation
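    A sketch of the post-r269728 interface: pmap_enter() gains a flags
    word (e.g. PMAP_ENTER_NOSLEEP, PMAP_ENTER_WIRED) and a superpage
    size index, and returns a KERN_* status. The call site below is
    hypothetical:

        #include <vm/vm.h>
        #include <vm/pmap.h>

        int pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m,
            vm_prot_t prot, u_int flags, int8_t psind);

        /* Request a sleep-less mapping; handle the failure instead. */
        if (pmap_enter(pmap, va, m, prot,
            VM_PROT_READ | PMAP_ENTER_NOSLEEP, 0) != KERN_SUCCESS) {
                /* back off, e.g. retry after reclaiming memory */
        }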
* MFC 261781: (jhb, 2014-06-27, 1 file, -2/+2)
    Don't waste a page of KVA for the boot-time memory test on x86. For
    amd64, reuse the first page of the crashdumpmap as CMAP1/CADDR1. For
    i386, remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the
    memory test.
* The pmap function pmap_clear_reference() is no longer used. Remove it.
  (alc, 2013-09-20, 1 file, -35/+0)
    pmap_clear_reference() has had exactly one caller in the kernel for
    several years, more precisely, since FreeBSD 8. Now, that call no
    longer exists.
    Approved by: re (kib)
    Sponsored by: EMC / Isilon Storage Division
* Significantly reduce the cost, i.e., run time, of calls to
  madvise(..., MADV_DONTNEED) and madvise(..., MADV_FREE). (alc,
  2013-08-29, 1 file, -0/+66)
    Specifically, introduce a new pmap function, pmap_advise(), that
    operates on a range of virtual addresses within the specified pmap,
    allowing for a more efficient implementation of MADV_DONTNEED and
    MADV_FREE. Previously, the implementation of MADV_DONTNEED and
    MADV_FREE relied on per-page pmap operations, such as
    pmap_clear_reference(). Intuitively, the problem with this
    implementation is that the pmap-level locks are acquired and
    released and the page table traversed repeatedly, once for each
    resident page in the range that was specified to madvise(2). A more
    subtle flaw with the previous implementation is that
    pmap_clear_reference() would clear the reference bit on all mappings
    to the specified page, not just the mapping in the range specified
    to madvise(2).
    Since our malloc(3) makes heavy use of madvise(2), this change can
    have a measurable impact. For example, the system time for
    completing a parallel "buildworld" on a 6-core amd64 machine was
    reduced by about 1.5% to 2.0%.
    Note: This change only contains pmap_advise() implementations for a
    subset of our supported architectures. I will commit implementations
    for the remaining architectures after further testing. For now, a
    stub function is sufficient because of the advisory nature of
    pmap_advise().
    Discussed with: jeff, jhb, kib
    Tested by: pho (i386), marcel (ia64)
    Sponsored by: EMC / Isilon Storage Division
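    A sketch of the interface described above; pmap_advise() takes the
    whole range, so the pmap lock is acquired once rather than per page
    (the wrapper below is hypothetical):

        #include <vm/vm.h>
        #include <vm/vm_map.h>
        #include <vm/pmap.h>

        void pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva,
            int advice);

        /* One traversal covers the whole madvise(2) range. */
        static void
        advise_free(vm_map_t map, vm_offset_t start, vm_offset_t end)
        {
                pmap_advise(map->pmap, start, end, MADV_FREE);
        }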
* Revert r254501. Instead, reuse the type stability of the struct pmap,
  which is part of struct vmspace, allocated from the UMA_ZONE_NOFREE
  zone. (kib, 2013-08-22, 1 file, -3/+0)
    Initialize the pmap lock in the vmspace zone init function, and
    remove pmap lock initialization and destruction from pmap_pinit()
    and pmap_release().
    Suggested and reviewed by: alc (previous version)
    Tested by: pho
    Sponsored by: The FreeBSD Foundation
* The soft and hard busy mechanisms rely on the vm object lock to work.
  (attilio, 2013-08-09, 1 file, -12/+10)
    Unify the two concepts into a real, minimal sxlock where the shared
    acquisition represents the soft busy and the exclusive acquisition
    represents the hard busy. The old VPO_WANTED mechanism becomes the
    hard-path for this new lock and it becomes per-page rather than
    per-object. The vm_object lock becomes an interlock for this
    functionality: it can be held in both read or write mode. However,
    if the vm_object lock is held in read mode while acquiring or
    releasing the busy state, the thread owner cannot make any
    assumption on the busy state unless it is also busying it.
    Also:
    - Add a new flag to directly share busy pages while vm_page_alloc
      and vm_page_grab are being executed. This will be very helpful
      once these functions happen under a read object lock.
    - Move the swapping sleep into its own per-object flag.
    The KPI is heavily changed, which is why the version is bumped. It
    is very likely that some VM ports users will need to change their
    own code.
    Sponsored by: EMC / Isilon storage division
    Discussed with: alc
    Reviewed by: jeff, kib
    Tested by: gavin, bapt (older version)
    Tested by: pho, scottl
* Replace kernel virtual address space allocation with vmem. This
  provides transparent layering and better fragmentation. (jeff,
  2013-08-07, 1 file, -7/+5)
    - Normalize functions that allocate memory to use kmem_*
    - Those that allocate address space are named kva_*
    - Those that operate on maps are named kmap_*
    - Implement recursive allocation handling for kmem_arena in vmem.
    Reviewed by: alc
    Tested by: pho
    Sponsored by: EMC / Isilon Storage Division
* - Add a BIT_FFS() macro and use it to replace cpusetffs_obj() (jeff,
    2013-06-13, 1 file, -1/+1)
    Discussed with: attilio
    Sponsored by: EMC / Isilon Storage Division
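    BIT_FFS() (from sys/bitset.h) yields the 1-based index of the lowest
    set bit, or 0 for an empty set; CPU_FFS() is its cpuset_t wrapper. A
    sketch of the iteration pattern that replaces cpusetffs_obj() (the
    helper below is hypothetical):

        #include <sys/param.h>
        #include <sys/cpuset.h>

        static void
        visit_cpus(cpuset_t map)
        {
                int cpu;

                /* CPU_FFS() returns a 1-based index, 0 if empty. */
                while ((cpu = CPU_FFS(&map)) != 0) {
                        cpu--;                  /* to a 0-based CPU id */
                        CPU_CLR(cpu, &map);
                        /* per-CPU work for "cpu" goes here */
                }
        }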
* o Relax locking assertions for vm_page_find_least() (attilio,
    2013-05-21, 1 file, -1/+2)
    o Relax locking assertions for pmap_enter_object() and add them also
      to architectures that currently don't have any
    o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a
      downgrade operation on the per-object rwlock
    o Use all the mechanisms above to make vm_map_pmap_enter() work most
      of the time with only read locks.
    Sponsored by: EMC / Isilon storage division
    Reviewed by: alc
* Implement the concept of the unmapped VMIO buffers, i.e. buffers which
  do not map the b_pages pages into buffer_map KVA. (kib, 2013-03-19,
  1 file, -0/+2)
    The use of the unmapped buffers eliminates the need to perform TLB
    shootdown for mapping on the buffer creation and reuse, greatly
    reducing the amount of IPIs for shootdown on big-SMP machines and
    eliminating up to 25-30% of the system time on i/o intensive
    workloads.
    The unmapped buffer should be explicitly requested by the
    GB_UNMAPPED flag by the consumer. For an unmapped buffer, no KVA
    reservation is performed at all. The consumer might request an
    unmapped buffer which does have a KVA reserve, to manually map it
    without recursing into the buffer cache and blocking, with the
    GB_KVAALLOC flag. When a mapped buffer is requested and an unmapped
    buffer already exists, the cache performs an upgrade, possibly
    reusing the KVA reservation.
    An unmapped buffer is translated into an unmapped bio in
    g_vfs_strategy(). An unmapped bio carries a pointer to the vm_page_t
    array, offset and length instead of the data pointer. The provider
    which processes the bio should explicitly specify a readiness to
    accept unmapped bio, otherwise the g_down geom thread performs the
    transient upgrade of the bio request by mapping the pages into the
    new bio_transient_map KVA submap.
    The bio_transient_map submap claims up to 10% of the buffer map, and
    the total buffer_map + bio_transient_map KVA usage stays the same.
    Still, it could be manually tuned by the kern.bio_transient_maxcnt
    tunable, in the units of the transient mappings. Eventually, the
    bio_transient_map could be removed after all geom classes and
    drivers can accept unmapped i/o requests.
    Unmapped support can be turned off by the vfs.unmapped_buf_allowed
    tunable, disabling which makes the buffer (or cluster) creation
    requests ignore the GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
    buffers are only enabled by default on the architectures where
    pmap_copy_page() was implemented and tested.
    In the rework, filesystem metadata is not subject to the maxbufspace
    limit anymore. Since the metadata buffers are always mapped, the
    buffers still have to fit into the buffer map, which provides a
    reasonable (but practically unreachable) upper bound on it. The
    non-metadata buffer allocations, both mapped and unmapped, are
    accounted against maxbufspace, as before. Effectively, this means
    that the maxbufspace is forced on mapped and unmapped buffers
    separately. The pre-patch bufspace limiting code did not work,
    because buffer_map fragmentation does not allow the limit to be
    reached.
    By Jeff Roberson's request, the getnewbuf() function was split into
    smaller single-purpose functions.
    Sponsored by: The FreeBSD Foundation
    Discussed with: jeff (previous version)
    Tested by: pho, scottl (previous version), jhb, bf
    MFC after: 2 weeks
* MFC (attilio, 2013-03-17, 1 file, -0/+40)
* Add pmap function pmap_copy_pages(), which copies the content of the
  pages around, taking arrays of vm_page_t for both source and
  destination. (kib, 2013-03-14, 1 file, -0/+40)
    Starting offsets and total transfer size are specified. The function
    implements the optimal algorithm for copying using the
    platform-specific optimizations. For instance, on the architectures
    where the direct map is available, no transient mappings are
    created; for i386, the per-cpu ephemeral page frame is used. The
    code was typically borrowed from the pmap_copy_page() for the same
    architecture.
    Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
    tested at the time of commit. High-level code, not committed yet to
    the tree, ensures that the use of the function is only allowed after
    explicit enablement.
    For sparc64, the existing code has known issues and a stub is added
    instead, to allow the kernel linking.
    Sponsored by: The FreeBSD Foundation
    Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
    MFC after: 2 weeks
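    A sketch of the pmap_copy_pages() interface as described above;
    offsets and the transfer size may cross page boundaries (the wrapper
    below is a hypothetical use):

        #include <vm/vm.h>
        #include <vm/vm_page.h>
        #include <vm/pmap.h>

        /*
         * Copy xfersize bytes from the pages ma[] starting at a_offset
         * into the pages mb[] starting at b_offset.
         */
        void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset,
            vm_page_t mb[], vm_offset_t b_offset, int xfersize);

        /* Copy one page worth of data between two page arrays. */
        static void
        copy_first_page(vm_page_t src[], vm_page_t dst[])
        {
                pmap_copy_pages(src, 0, dst, 0, PAGE_SIZE);
        }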
* Merge from vmcontention. (attilio, 2013-03-09, 1 file, -6/+6)
* MFC (attilio, 2013-03-08, 1 file, -18/+18)
* Hide the details for the assertion for VM_OBJECT_LOCK operations.
  (attilio, 2013-02-21, 1 file, -6/+6)
    Rename the current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into
    VM_OBJECT_ASSERT_WLOCKED(foo).
    Sponsored by: EMC / Isilon storage division
    Requested by: alc
* There is no need to use VM_OBJECT_LOCKED() as the assertion won't
  make the check available in any case if INVARIANTS is switched off.
  (attilio, 2013-02-20, 1 file, -3/+2)
    Remove VM_OBJECT_LOCKED().
* Switch the vm_object lock to be a rwlock. (attilio, 2013-02-20,
  1 file, -5/+5)
    * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
    * VM_OBJECT_SLEEP() is introduced as a general purpose primitive to
      get a sleep operation using a VM_OBJECT_LOCK() as protection
    * The approach must bear with vm_pager.h namespace pollution, so
      many files require including rwlock.h directly
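    An illustrative sketch of the mapping described above (not the
    verbatim tree contents; the lock field name is an assumption): the
    old macro names become rwlock write operations, and
    VM_OBJECT_SLEEP() uses the object lock as its interlock:

        #include <sys/rwlock.h>

        #define VM_OBJECT_LOCK(object)   rw_wlock(&(object)->lock)
        #define VM_OBJECT_UNLOCK(object) rw_wunlock(&(object)->lock)
        /* Sleep with the object rwlock as the interlock. */
        #define VM_OBJECT_SLEEP(object, wchan, pri, wmesg, timo) \
                rw_sleep((wchan), &(object)->lock, (pri), (wmesg), (timo))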
* Fixup XEN pmap to cope with removal of left/right iterators from
  pages. (attilio, 2013-03-03, 1 file, -3/+4)
    Sponsored by: EMC / Isilon storage division
* Fix-up r247622 by also renaming the pv_list iterator in the xen
  pmap verbatim copy. (attilio, 2013-03-03, 1 file, -18/+18)
    Sponsored by: EMC / Isilon storage division
    Reported by: tinderbox
* Merge from vmobj-rwlock: (attilio, 2013-02-27, 1 file, -3/+2)
    The VM_OBJECT_LOCKED() macro is only used to implement a custom
    version of lock assertions right now (which likely spread out thanks
    to copy and paste). Remove it and implement actual assertions.
    Sponsored by: EMC / Isilon storage division
    Reviewed by: alc
    Tested by: pho
* Consistently use round_page(x) rather than roundup(x, PAGE_SIZE).
  There is no functional change. (jkim, 2013-02-15, 1 file, -3/+3)
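    Both spellings round a size up to a page boundary; with PAGE_SIZE a
    power of two they are equivalent, and round_page() reduces to a mask
    operation. A sketch using the macro bodies as commonly defined:

        #include <sys/param.h>  /* roundup(), PAGE_SIZE, PAGE_MASK */

        /*
         * round_page(x)         -> ((x) + PAGE_MASK) & ~PAGE_MASK
         * roundup(x, PAGE_SIZE) -> (((x) + PAGE_SIZE - 1) / PAGE_SIZE)
         *                             * PAGE_SIZE
         * Identical results for a power-of-two PAGE_SIZE.
         */
        static vm_offset_t
        page_align(vm_offset_t size)
        {
                return (round_page(size));      /* preferred spelling */
        }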
* Replace all uses of the vm page queues lock by a new R/W lock. (alc,
  2012-10-12, 1 file, -58/+72)
    Unfortunately, this lock cannot be defined as static under Xen
    because it is (ab)used to serialize queued page table changes.
    Tested by: sbruno
* MFi386 r241356 (alc, 2012-10-10, 1 file, -0/+9)
    Add several asserts.
    MFC after: 3 days
* In a few places, like the implementation of ptrace(), a thread may
  call upon pmap_enter() to create a mapping within a different address
  space, i.e., not the thread's own address space. (alc, 2012-10-08,
  1 file, -6/+8)
    On i386, this entails the creation of a temporary mapping to the
    affected page table page (PTP). In general, pmap_enter() will read
    from this PTP, allocate a PV entry, and write to this PTP. The
    trouble comes when the system is short of memory. In order to
    allocate a new PV entry, an older PV entry has to be reclaimed.
    Reclaiming a PV entry involves destroying a mapping, which requires
    access to the affected PTP. Thus, the PTP mapped at the beginning of
    pmap_enter() is no longer mapped at the end of pmap_enter(), which
    leads to pmap_enter() modifying the wrong PTP. To address this
    problem, pmap_pv_reclaim() is changed to use an alternate method of
    mapping PTPs.
    Update a related comment.
    Reported by: pho
    Diagnosed by: kib
    MFC after: 5 days
* Eliminate a stale comment. It describes another use case for the pmap
  in Mach that doesn't exist in FreeBSD. (alc, 2012-09-28, 1 file,
  -7/+0)
* Simplify pmap_unmapdev(). Since kmem_free() eventually calls
  pmap_remove(), pmap_unmapdev()'s own direct efforts to destroy the
  page table entries are redundant, so eliminate them. (alc, 2012-09-10,
  1 file, -6/+1)
    Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on
    MIPS. Setting PTE_W on MIPS is inconsistent with the implementation
    of this function on other architectures. Moreover, PTE_W should not
    be set, unless the pmap's wired mapping count is incremented, which
    pmap_kenter{,_attr}() doesn't do.
    MFC after: 10 days
* Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update
  the comment describing them. (alc, 2012-09-05, 1 file, -17/+18)
    Both the function names and the comment had grown stale. Quite some
    time has passed since these pmap implementations last used the
    page's hold count to track the number of valid mappings within a
    page table page. Also, returning TRUE from pmap_unwire_ptp() rather
    than _pmap_unwire_ptp() eliminates a few instructions from callers
    like pmap_enter_quick_locked() where pmap_unwire_ptp()'s return
    value is used directly by a conditional statement.
* Eliminate an unnecessary acquisition and release of the page queues
  lock from pmap_pte(). (alc, 2012-08-10, 1 file, -2/+0)
    PT_SET_MA() is not a queued mapping update, but instead an immediate
    mapping update, so the page queues lock is not required here.
    Reviewed by: cperciva
* Various small changes to PV entry management: (alc, 2012-06-04,
  1 file, -13/+16)
    Constify pc_freemask[].
    pmap_pv_reclaim()
      Eliminate "freemask" because it was a pessimization. Add a comment
      about the resident count adjustment.
    free_pv_entry() [i386 only]
      Merge an optimization from amd64 (r233954).
    get_pv_entry()
      Eliminate the move to tail of the pv_chunk on the global pv_chunks
      list. (The right strategy needs more thought. Moreover, there were
      unintended differences between the amd64 and i386 implementation.)
    pmap_remove_pages()
      Eliminate unnecessary ()'s.
* Eliminate code duplication in free_pv_entry() and pmap_remove_pages()
  by introducing free_pv_chunk(). (alc, 2012-06-01, 1 file, -10/+10)
* Eliminate some purely stylistic differences among the amd64, i386
  native, and i386 xen PV entry allocators. (alc, 2012-05-30, 1 file,
  -4/+4)
* MFi386 pmap r233433 (alc, 2012-05-29, 1 file, -1/+0)
    Disable detailed PV entry accounting by default. (A config option
    for enabling it was already introduced in r233433.)
* Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that
  it no longer uses the active and inactive paging queues. (alc,
  2012-05-29, 1 file, -59/+121)
    Instead, the pmap now maintains an LRU-ordered list of pv entry
    pages, and pmap_pv_reclaim() uses this list to select pv entries
    for reclamation.
    Note: The old pmap_collect() tried to avoid reclaiming mappings for
    pages that have either a hold_count or a busy field that is
    non-zero. However, this isn't necessary for correctness, and the
    locking in pmap_collect() was insufficient to guarantee that such
    mappings weren't reclaimed. The new pmap_pv_reclaim() doesn't even
    try.
    Tested by: sbruno
    MFC after: 5 weeks
* Merge r216333 and r216555 from the native pmap. (alc, 2011-12-30,
  1 file, -10/+12)
    When r207410 eliminated the acquisition and release of the page
    queues lock from pmap_extract_and_hold(), it didn't take into
    account that pmap_pte_quick() sometimes requires the page queues
    lock to be held. This change reimplements pmap_extract_and_hold()
    such that it no longer uses pmap_pte_quick(), and thus never
    requires the page queues lock.
    Merge r177525 from the native pmap:
    Prevent the overflow in the calculation of the next page directory.
    The overflow causes the wraparound with consequent corruption of the
    (almost) whole address space mapping.
    Strictly speaking, r177525 is not required by the Xen pmap because
    the hypervisor steals the uppermost region of the normal kernel
    address space. I am nonetheless merging it in order to reduce the
    number of unnecessary differences between the native and Xen pmap
    implementations.
    Tested by: sbruno
* Fix a bug in the Xen pmap's implementation of
  pmap_extract_and_hold(): (alc, 2011-12-28, 1 file, -3/+6)
    If the page lock acquisition is retried, then the underlying thread
    is not unpinned.
    Wrap nearby lines that exceed 80 columns.
* Eliminate many of the unnecessary differences between the native and
  paravirtualized pmap implementations for i386. (alc, 2011-12-27,
  1 file, -126/+163)
    This includes some style fixes to the native pmap and several bug
    fixes that were not previously applied to the paravirtualized pmap.
    Tested by: sbruno
    MFC after: 3 weeks
* The Xen pmap doesn't support superpages. So, there is no point in it
  initializing structures, like the pv table, that are only used to
  implement superpages. (alc, 2011-12-20, 1 file, -51/+2)
    In fact, some of the unnecessary code in pmap_init() was actually
    doing harm. It was preventing the kernel from booting on virtual
    machines with more than 768 MB of memory.
    Tested by: sbruno
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
  (ed, 2011-11-07, 1 file, -2/+2)
    The SYSCTL_NODE macro defines a list that stores all child-elements
    of that node. If there's no SYSCTL_DECL macro anywhere else, there's
    no reason why it shouldn't be static.
* Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
  vm_page_alloc(). (alc, 2011-10-27, 1 file, -7/+3)
    While I'm here, for the sake of consistency, always specify the
    allocation class, such as VM_ALLOC_NORMAL, as the first of the
    flags.
* Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into an atomic
  flags field. (kib, 2011-09-06, 1 file, -20/+20)
    Updates to the atomic flags are performed using the atomic ops on
    the containing word, do not require any vm lock to be held, and are
    non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
    functions are provided to modify aflags.
    Document the changes to the flags field to only require the page
    lock.
    Introduce the vm_page_reference(9) function to provide a stable KPI
    and KBI for filesystems like tmpfs and zfs which need to mark a page
    as referenced.
    Reviewed by: alc, attilio
    Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
    Approved by: re (bz)
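    A sketch of the new KPI named above; the aflags update is an atomic
    op on the containing word, so no page queues lock is involved (the
    helper is hypothetical):

        #include <vm/vm.h>
        #include <vm/vm_page.h>

        static void
        mark_referenced(vm_page_t m)
        {
                vm_page_lock(m);
                /* Atomic on the containing word; no queue lock. */
                vm_page_aflag_set(m, PGA_REFERENCED);
                vm_page_unlock(m);
        }

    Filesystems such as tmpfs and zfs would instead call the stable
    wrapper vm_page_reference(m).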
* - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the
    flag to VPO_UNMANAGED (and also making the flag protected by the vm
    object lock, instead of the vm page queue lock). (kib, 2011-08-09,
    1 file, -17/+16)
    - Mark the fake pages with both PG_FICTITIOUS (as it is now) and
      VPO_UNMANAGED. As a consequence, pmap code now can use just
      VPO_UNMANAGED to decide whether the page is unmanaged.
    Reviewed by: alc
    Tested by: pho (x86, previous version), marius (sparc64), marcel
    (arm, ia64, powerpc), ray (mips)
    Sponsored by: The FreeBSD Foundation
    Approved by: re (bz)
* With the retirement of cpumask_t and usage of cpuset_t for
  representing a mask of CPUs, pc_other_cpus and pc_cpumask become
  highly inefficient. (attilio, 2011-07-04, 1 file, -21/+32)
    Remove them and replace their usage with custom pc_cpuid magic (as,
    atm, pc_cpumask can be easily represented by (1 << pc_cpuid) and
    pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).
    This change is not targeted for MFC because of struct pcpu members
    removal and dependency by cpumask_t retirement.
    MD review by: marcel, marius, alc
    Tested by: pluknet
    MD testing by: marcel, marius, gonzo, andreast
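    A sketch of the on-demand computation described above, spelled with
    the CPU_* set macros instead of integer bit arithmetic (the helper
    is hypothetical):

        #include <sys/param.h>
        #include <sys/cpuset.h>
        #include <sys/pcpu.h>
        #include <sys/smp.h>

        static void
        get_other_cpus(cpuset_t *set)
        {
                /* all_cpus & ~(1 << pc_cpuid), as a cpuset_t. */
                *set = all_cpus;
                CPU_CLR(PCPU_GET(cpuid), set);
        }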
* When iterating over a paging queue, explicitly check for PG_MARKER,
  instead of relying on zeroed memory being interpreted as an empty PV
  list. (alc, 2011-07-02, 1 file, -1/+1)
    Reviewed by: kib
* Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). (alc,
  2011-06-29, 1 file, -2/+2)
    Passing this option to vm_object_page_remove() asserts that the
    specified range of pages is not mapped, or more precisely that none
    of these pages have any managed mappings. Thus,
    vm_object_page_remove() need not call pmap_remove_all() on the
    pages.
    This change not only saves time by eliminating pointless calls to
    pmap_remove_all(), but it also eliminates an inconsistency in the
    use of pmap_remove_all() versus related functions, like
    pmap_remove_write(). It eliminates harmless but pointless calls to
    pmap_remove_all() that were being performed on PG_UNMANAGED pages.
    Update all of the existing assertions on pmap_remove_all() to
    reflect this change.
    Reviewed by: kib
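    A hypothetical call site for the new option; the caller guarantees
    the range has no managed mappings, so the pmap_remove_all() calls
    are skipped:

        #include <vm/vm.h>
        #include <vm/vm_object.h>

        static void
        remove_unmapped_pages(vm_object_t object, vm_pindex_t start,
            vm_pindex_t end)
        {
                VM_OBJECT_LOCK(object);
                /* Assert (don't discover) that nothing is mapped. */
                vm_object_page_remove(object, start, end,
                    OBJPR_NOTMAPPED);
                VM_OBJECT_UNLOCK(object);
        }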
* - Fix a misusage of cpuset_t objects (attilio, 2011-05-24, 1 file,
    -1/+1)
    - Fix a typo
    Reported by: pluknet
* Add a "safety belt" check for lsb setting.attilio2011-05-221-0/+1
| | | | | | | I don't think it is really necessary because the cpumask is known to be != 0, but it is just in case. Requested by: kib
* Reintroduce the lazypmap infrastructure and convert it to using
  cpuset_t. (attilio, 2011-05-20, 1 file, -0/+99)
    Requested by: alc