path: root/sys/amd64/amd64/pmap.c
Commit message  (Author, Age, Files, Lines)
* Revert "Proposed fix for CVE-2018-8897"Renato Botelho2018-05-081-3/+0
| | | | This reverts commit 70d1caf0ad967030b2ce835dc0f116ed1733c82c.
* Proposed fix for CVE-2018-8897  (Renato Botelho, 2018-05-08, 1 file, -0/+3)
* Merge remote-tracking branch 'origin/releng/11.1' into RELENG_2_4  (Luiz Souza, 2018-05-08, 1 file, -2/+4)
* Add mitigations for two classes of speculative execution vulnerabilities on amd64. [FreeBSD-SA-18:03.speculative_execution]  (gordon, 2018-03-14, 1 file, -29/+547)
    Approved by:  so
    Security:  FreeBSD-SA-18:03.speculative_execution
    Security:  CVE-2017-5715
    Security:  CVE-2017-5754
* | Revert "Revert "MFC ↵Luiz Souza2018-02-231-29/+545
| | | | | | | | | | | | r328083,328096,328116,328119,328120,328128,328135,328153,328157,"" This reverts commit d3d59b01294138e59995b31d2bcbbbdf45e26a3c.
* Merge remote-tracking branch 'origin/RELENG_2_4-meltdown' into RELENG_2_4  (Luiz Souza, 2018-02-23, 1 file, -3/+2)
* MFC r325530 (jeff), r325566 (kib), r325588 (kib):  (markj, 2018-02-21, 1 file, -3/+2)
    Replace many instances of VM_WAIT with blocking page allocation flags.
    (cherry picked from commit 2069f0080fbdcf49b623bc3c1eda76524a4d1a77)
* Revert "MFC r328083,328096,328116,328119,328120,328128,328135,328153,328157,"  (Luiz Souza, 2018-02-21, 1 file, -545/+29)
    This reverts commit 430a2bea3907149b30cc75fc722b6cf1f81da82a.
* MFC r328083,328096,328116,328119,328120,328128,328135,328153,328157,  (kib, 2018-02-19, 1 file, -29/+545)
    328166,328177,328199,328202,328205,328468,328470,328624,328625,328627,
    328628,329214,329297,329365:
    Meltdown mitigation by PTI, PCID optimization of PTI, and kernel use of
    IBRS for some mitigations of Spectre.
    Tested by:  emaste, Arshan Khanifar <arshankhanifar@gmail.com>
    Discussed with:  jkim
    Sponsored by:  The FreeBSD Foundation
    (cherry picked from commit 6dd025b40ee6870bea6ba670f30dcf684edc3f6c)
* MFC r314310  (alc, 2017-06-28, 1 file, -31/+39)
    Refine the fix from r312954.  Specifically, add a new PDE-only flag,
    PG_PROMOTED, that indicates whether lingering 4KB page mappings might
    need to be flushed on a PDE change that restricts or destroys a 2MB
    page mapping.  This flag allows the pmap to avoid range invalidations
    that are both unnecessary and costly.
    Approved by:  re (kib)
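    For illustration, a minimal sketch of the idea (assumed variable names;
    not the committed diff): a 2MB PDE teardown only needs a per-page range
    invalidation when the PDE was created by promotion, because only then
    can 4KB TLB entries for that region still be lingering.

        /* Sketch: PG_PROMOTED marks PDEs built by promoting 4KB mappings. */
        if ((oldpde & PG_PROMOTED) != 0) {
                /* Flush each 4KB page of the former 2MB mapping. */
                for (va = sva; va < sva + NBPDR; va += PAGE_SIZE)
                        pmap_invalidate_page(pmap, va);
        } else {
                pmap_invalidate_page(pmap, sva);
        }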
* MFC r308474, r308691, r309203, r309365, r309703, r309898, r310720,  (markj, 2017-05-23, 1 file, -30/+12)
    r308489, r308706:
    Add PQ_LAUNDRY and remove PG_CACHED pages.
* MFC r318354 (by cem)  (vangyzen, 2017-05-19, 1 file, -1/+1)
    Correct page frame mask constant used in pmap_change_attr_locked.
    This was introduced in r290156.  It's present in 11.0, but not any 10.x
    release unless someone decided to MFC it.  It affects ordinary pages
    right above the DMAP limit, which is effectively system memory rounded
    up to a 1 GB (3rd level superpage) boundary (or up to a minimum of
    4 GB, on small systems).
    Sponsored by:  Dell EMC
* MFC r313982, r314068:  (pfg, 2017-03-14, 1 file, -1/+1)
    sys: Replace zero with NULL for pointers.
    Found with:  devel/coccinelle
* MFC r313960  (alc, 2017-02-26, 1 file, -8/+6)
    In pmap_enter(), set the PG_MANAGED flag on the new PTE in one place
    rather than two, and do so before the pmap lock is acquired.
* MFC r313933, r313939, r313966:  (kib, 2017-02-26, 1 file, -3/+3)
    Microoptimize pmap_protect_pde() on amd64, i386 and pmap_protect_pte1()
    on armv6.
* MFC r312954:  (kib, 2017-02-05, 1 file, -11/+34)
    Do not leave stale 4K TLB entries on pde (superpage) removal or
    protection change.
* MFC r312555:  (kib, 2017-02-03, 1 file, -5/+9)
    Use SFENCE for ordering CLFLUSHOPT.
* MFC r311902:  (markj, 2017-02-03, 1 file, -8/+18)
    Coalesce TLB shootdowns of global PTEs in pmap_advise() on x86.
* MFC r306350:  (kib, 2016-10-03, 1 file, -1/+29)
    For machines which support PCID but do not have the INVPCID
    instruction, i.e. SandyBridge and IvyBridge, correct a race between
    pmap_activate() and invltlb_pcid_handler().
* MFC r305213,305319,305398  (alc, 2016-10-01, 1 file, -1/+23)
    As an optimization to the machine-independent layer, change the
    machine-dependent pmap_ts_referenced() so that it updates the page's
    dirty field if a modified bit is found while counting reference bits.
    This opportunistic update can be performed at low cost and can
    eliminate the need for some future calls to pmap_is_modified() by the
    machine-independent layer.
    Replace the number 4 in sparc64's pmap_ts_referenced() by
    PMAP_TS_REFERENCED_MAX, like we've done elsewhere, e.g., amd64.
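    A hedged sketch of the opportunistic update described above (simplified;
    the committed hunk differs): while pmap_ts_referenced() walks PTEs to
    count reference bits, a found modified bit can be folded into the page's
    dirty field immediately.

        /*
         * Sketch: if the mapping is writable and its modified bit is set,
         * record that in the vm_page now rather than waiting for a later
         * pmap_is_modified() call from the machine-independent layer.
         */
        if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW))
                vm_page_dirty(m);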
* MFC r306087:  (kib, 2016-09-28, 1 file, -15/+25)
    Export the pmap_cache_bits() and pmap_pinit_pml4() functions from the
    amd64 pmap.
* MFC r306020:  (kib, 2016-09-27, 1 file, -29/+0)
    Move pmap_p*e_index() inline functions from pmap.c to pmap.h.
* MFC r302783:  (badger, 2016-08-18, 1 file, -1/+1)
    Add explicit detection of KVM hypervisor.
    Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected
    and use vm_guest in conditionals testing for KVM.
    Also, fix a conditional checking if we're running in a VM which caught
    only the generic VM case, but not more specific VMs (KVM, VMWare,
    etc.).  (Spotted by: vangyzen).
    Sponsored by:  Dell Inc.
    Approved by:  vangyzen (mentor)
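    The conditional fix mentioned above can be illustrated with a small
    hedged example (use_vm_workaround() is a made-up placeholder):

        /* Before (sketch): matches only the generic "unknown VM" case,
         * missing KVM, VMWare and other specific guest types. */
        if (vm_guest == VM_GUEST_VM)
                use_vm_workaround();

        /* After (sketch): true for any detected hypervisor, including the
         * new VM_GUEST_KVM value. */
        if (vm_guest != VM_GUEST_NO)
                use_vm_workaround();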
* MFC r303958:  (kib, 2016-08-14, 1 file, -3/+3)
    The pmap_delayed_invl_wait() function blocks on turnstile, it does not
    spin, in the committed version.  Remove stray '*' in the text.
* Do not access pv_table array for fictitious pages, since the array  (kib, 2016-06-13, 1 file, -15/+10)
    does not cover the dynamically registered fictitious ranges, and
    fictitious page mappings are not promoted.  Offer a dummy struct
    md_page to fetch constant superpage pv list generation to satisfy
    logic.
    Also, by initializing the pv_dummy pv_list to empty, we can remove
    several explicit PG_FICTITIOUS tests.
    Reported and tested by:  Michael Butler <imb@protected-networks.net> (previous version)
    Reviewed by:  alc
    Sponsored by:  The FreeBSD Foundation
    MFC after:  1 week
    Differential revision:  https://reviews.freebsd.org/D6728
    Approved by:  re (hrs)
* Avoid spurious EINVAL in amd64 pmap_change_attr().  (kib, 2016-06-05, 1 file, -4/+7)
    Do not try to change attributes for DMAP when working on a mapping
    which is not covered by the DMAP.  This was reported on a real system
    where a BAR of a device (NTB) was mapped outside the PCI window.
    Reported and tested by:  mav
    Reviewed by:  jhb, mav
    Sponsored by:  The FreeBSD Foundation
    MFC after:  1 week
    Differential revision:  https://reviews.freebsd.org/D6668
* In pmap_advise(), avoid leaking DI start for EPT pmaps which need A/D  (kib, 2016-05-27, 1 file, -1/+1)
    emulation.  Assert that syscalls do not leak DI.
    Reported by:  gjb
    Sponsored by:  The FreeBSD Foundation
* Both Clang and GCC cannot generate efficient reserve_pv_entries().  (jkim, 2016-05-25, 1 file, -16/+15)
    http://docs.freebsd.org/cgi/mid.cgi?552BFEB2.8040407
    Re-implement it entirely in inline assembly not to let compilers do
    silly spilling to memory.  For non-POPCNT case, use newly added
    bit_count(3).
    Reported by:  alc
    Reviewed by:  alc, kib
    Differential Revision:  https://reviews.freebsd.org/D6541
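    As a rough illustration of the counting involved (not the committed
    inline assembly): reserve_pv_entries() needs the number of free PV
    entries in a chunk, which is the population count of the chunk's pc_map
    bitmap words.

        /*
         * Sketch: count free pv entries in one pv_chunk.  The commit
         * hand-codes this in inline assembly (POPCNT when available,
         * bit_count(3) otherwise) to avoid poor compiler output.
         */
        free = 0;
        for (field = 0; field < _NPCM; field++)
                free += __builtin_popcountl(pc->pc_map[field]);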
* Document POPCNT erratum for 6th Generation Intel Core processors.  (jkim, 2016-05-23, 1 file, -0/+1)
* Eliminate pvh_global_lock from the amd64 pmap.  (kib, 2016-05-14, 1 file, -124/+232)
    The only current purpose of the pvh lock was explained there:
    On Wed, Jan 09, 2013 at 11:46:13PM -0600, Alan Cox wrote:
    > Let me lay out one example for you in detail.  Suppose that we have
    > three processors and two of these processors are actively using the
    > same pmap.  Now, one of the two processors sharing the pmap performs
    > a pmap_remove().  Suppose that one of the removed mappings is to a
    > physical page P.  Moreover, suppose that the other processor sharing
    > that pmap has this mapping cached with write access in its TLB.
    > Here's where the trouble might begin.  As you might expect, the
    > processor performing the pmap_remove() will acquire the fine-grained
    > lock on the PV list for page P before destroying the mapping to page
    > P.  Moreover, this processor will ensure that the vm_page's dirty
    > field is updated before releasing that PV list lock.  However, the
    > TLB shootdown for this mapping may not be initiated until after the
    > PV list lock is released.  The processor performing the pmap_remove()
    > is not problematic, because the code being executed by that processor
    > won't presume that the mapping is destroyed until the TLB shootdown
    > has completed and pmap_remove() has returned.  However, the other
    > processor sharing the pmap could be problematic.  Specifically,
    > suppose that the third processor is executing the page daemon and
    > concurrently trying to reclaim page P.  This processor performs a
    > pmap_remove_all() on page P in preparation for reclaiming the page.
    > At this instant, the PV list for page P may already be empty but our
    > second processor still has a stale TLB entry mapping page P.  So,
    > changes might still occur to the page after the page daemon believes
    > that all mappings have been destroyed.  (If the PV entry had still
    > existed, then the pmap lock would have ensured that the TLB shootdown
    > completed before the pmap_remove_all() finished.)  Note, however, the
    > page daemon will know that the page is dirty.  It can't possibly
    > mistake a dirty page for a clean one.  However, without the current
    > pvh global locking, I don't think anything is stopping the page
    > daemon from starting the laundering process before the TLB shootdown
    > has completed.
    >
    > I believe that a similar example could be constructed with a clean
    > page P' and a stale read-only TLB entry.  In this case, the page P'
    > could be "cached" in the cache/free queues and recycled before the
    > stale TLB entry is flushed.
    TLBs for addresses with updated PTEs are always flushed before the pmap
    lock is unlocked.  On the other hand, the amd64 pmap code does not
    always flush TLBs before PV list locks are unlocked, if previously PTEs
    were cleared and PV entries removed.
    To handle the situations where a thread might notice an empty PV list
    but a third thread still has access to the page because a TLB
    invalidation has not finished yet, introduce delayed invalidation.
    Compared with the pvh_global_lock, DI does not block the entering
    thread when pmap_remove_all() or pmap_remove_write() (callers of
    pmap_delayed_invl_wait()) are executed in parallel.  But _invl_wait()
    callers are blocked until all previously noted DI blocks have been
    left, thus ensuring that the necessary TLB invalidations were performed
    before returning from pmap_remove_all() or pmap_remove_write().
    See comments for a detailed description of the mechanism, and also for
    the explanations why several pmap methods, most importantly
    pmap_enter(), do not need DI protection.
    Reviewed by:  alc, jhb (turnstile KPI usage)
    Tested by:  pho (previous version)
    Sponsored by:  The FreeBSD Foundation
    Differential revision:  https://reviews.freebsd.org/D5747
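    A heavily simplified sketch of how the delayed-invalidation (DI) blocks
    are used (function names as in the commit message where given, otherwise
    assumed; details differ from the real code):

        /* Writer side, e.g. pmap_remove() (sketch): */
        pmap_delayed_invl_started();            /* open a DI block */
        /* ... clear PTEs and unlink PV entries under the PV list lock ... */
        pmap_invalidate_range(pmap, sva, eva);  /* TLB shootdown */
        pmap_delayed_invl_finished();           /* close the DI block */

        /*
         * Reader side, e.g. pmap_remove_all() (sketch): after seeing an
         * empty PV list, wait until every DI block opened before this point
         * has finished, so no CPU can still hold a stale TLB entry for m.
         */
        pmap_delayed_invl_wait(m);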
* Eliminate an unused #include.  For a brief period of time, _unrhdr.h was  (alc, 2016-05-13, 1 file, -1/+0)
    used to implement PCID support on amd64.
    Reviewed by:  kib
* Explain why pmap_copy(), pmap_enter_pde(), and pmap_enter_quick_locked()  (alc, 2016-05-04, 1 file, -1/+21)
    call pmap_invalidate_page() even though they are not destroying a
    leaf-level page table entry.
    Eliminate some bogus white-space characters in a comment.
    Reviewed by:  kib
* AMD64 pmap: Use howmany() macro  (cem, 2016-04-24, 1 file, -1/+1)
    Use param.h howmany() instead of hand-rolled version.
    Sponsored by:  EMC / Isilon Storage Division
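    For reference, the param.h macro being adopted is just a rounding-up
    division:

        #define howmany(x, y)   (((x) + ((y) - 1)) / (y))       /* sys/param.h */

        /* Hand-rolled form being replaced (illustrative): */
        npages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
        /* With the macro: */
        npages = howmany(size, PAGE_SIZE);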
* Cleanup redundant parenthesis from existing howmany()/roundup() macro uses.  (pfg, 2016-04-22, 1 file, -1/+1)
* sys: use our roundup2/rounddown2() macros when param.h is available.  (pfg, 2016-04-21, 1 file, -1/+1)
    rounddown2 tends to produce longer lines than the original code and
    when the code has a high indentation level it was not really
    advantageous to do the replacement.
    This tries to strike a balance between readability using the macros
    and flexibility of having the expressions, so not everything is
    converted.
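    For context, the power-of-two rounding macros from sys/param.h referred
    to here (y must be a power of two):

        #define rounddown2(x, y)  ((x) & (~((y) - 1)))
        #define roundup2(x, y)    (((x) + ((y) - 1)) & (~((y) - 1)))

        /* Example: align a virtual address range to page boundaries. */
        start = rounddown2(va, PAGE_SIZE);
        end   = roundup2(va + len, PAGE_SIZE);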
* Remove dead code when the target processor has POPCNT instruction.  (jkim, 2016-01-13, 1 file, -1/+4)
* pmap_invalidate_range: For very large ranges, flush the whole TLB  (cem, 2015-12-06, 1 file, -0/+8)
    Typical TLBs have 40-512 entries available.  At some point, iterating
    every single page in a requested invalidation range and issuing invlpg
    on it is more expensive than flushing the TLB and allowing it to
    reload on demand.
    Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary
    number 4096 entries as a heuristic at which point we flush the TLB
    rather than invalidating every single potential page.
    Reviewed by:  alc
    Feedback from:  jhb, kib
    MFC notes:  Depends on r291688
    Sponsored by:  EMC / Isilon Storage Division
    Differential Revision:  https://reviews.freebsd.org/D4280
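    The heuristic reads roughly like the following sketch (the 4096-entry
    threshold is from the commit message; the committed code differs in the
    SMP shootdown details):

        /*
         * Sketch: for huge ranges it is cheaper to drop the whole TLB and
         * let it refill on demand than to issue one INVLPG per page.
         */
        if ((eva - sva) / PAGE_SIZE > 4096) {
                pmap_invalidate_all(pmap);
        } else {
                for (va = sva; va < eva; va += PAGE_SIZE)
                        invlpg(va);     /* single-page invalidation */
        }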
* For amd64 non-PCID machines, and for i386 machines with support for  (kib, 2015-12-03, 1 file, -3/+3)
    the PG_G global pte flag, pmap_invalidate_all() fails to flush global
    TLB entries [*].  This is because the TLB shootdown handler for such
    configs reloads CR3, and on i386 pmap_invalidate_all() does the same
    for the initiating CPU.
    Note that current code does not issue total invalidation requests for
    the kernel_pmap.
    Rename the amd64 function invltlb_globpcid() to invltlb_glob(); it has
    not been specific to PCID for quite some time.  Implement the same
    functionality for i386.  Use the function instead of invltlb() in
    shootdown handlers and in i386 pmap_invalidate_all(), but only for the
    kernel pmap (which maps pages with the PG_G attribute set), which
    takes care of PG_G TLB entries on flush.
    To detect the affected pmap in the i386 TLB shootdown handler, the
    pmap should be passed to the smp_masked_invltlb() function, which
    makes the amd64 and i386 TLB shootdown code almost identical.  Merge
    the code under x86/.
    Noted by:  jhb [*]
    Reviewed by:  cem, jhb, pho
    Tested by:  pho
    Sponsored by:  The FreeBSD Foundation
    Differential revision:  https://reviews.freebsd.org/D4346
* pmap_change_attr: Only fixup DMAP for DMAPed ranges  (cem, 2015-10-29, 1 file, -4/+7)
    pmap_change_attr must change the memory type of both the requested KVA
    and the corresponding DMAP mappings (if such mappings exist), to
    satisfy an Intel requirement that two or more mappings to the same
    physical pages must have the same memory type.
    However, not all kernel mapped pages have corresponding DMAP mappings
    -- for example, 64-bit BARs.  Skip fixing up the DMAP for
    out-of-bounds addresses.
    Submitted by:  Steve Wahl <steve_wahl@dell.com>
    Reviewed by:  alc, jhb
    Sponsored by:  Dell Compellent
    Differential Revision:  https://reviews.freebsd.org/D4030
* Intel SDM before revision 56 described the CLFLUSH instruction as only  (kib, 2015-10-24, 1 file, -8/+28)
    ordered with the MFENCE instruction.  Similar weak guarantees are also
    specified by the AMD APM vol. 3 rev. 3.22.  The x86 pmap methods
    pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced
    the CLFLUSH loop with MFENCE both before and after the loop.
    In revision 56 of the SDM, Intel stated that all existing
    implementations of CLFLUSH are strict: CLFLUSH instruction execution
    is ordered with respect to other CLFLUSH instructions and writes.
    Also, the strict behaviour was made architectural.
    A new instruction, CLFLUSHOPT (which was documented for some time in
    the Instruction Set Extensions Programming Reference), provides the
    weak behaviour which was previously attributed to CLFLUSH.
    Use CLFLUSHOPT when available.  When CLFLUSH is used on Intel CPUs, do
    not execute MFENCE before and after the flushing loop.
    Reviewed by:  alc
    Sponsored by:  The FreeBSD Foundation
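    A minimal sketch of the resulting flush loop when CLFLUSHOPT is
    available (simplified from pmap_invalidate_cache_range(); variable names
    assumed):

        /*
         * Sketch: CLFLUSHOPT is only weakly ordered, so fence around the
         * loop; strict CLFLUSH on Intel needs no MFENCE at all.
         */
        sfence();
        for (va = sva; va < eva; va += cpu_clflush_line_size)
                clflushopt(va);
        sfence();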
* Exploit r288122 to address a cosmetic issue.  Since PV chunk pages don't  (alc, 2015-09-26, 1 file, -1/+1)
    belong to a vm object, they can't be paged out.  Since they can't be
    paged out, they are never enqueued in a paging queue.  Nonetheless,
    passing PQ_INACTIVE to vm_page_unwire() creates the appearance that
    these pages are being enqueued in the inactive queue.  As of r288122,
    we can avoid this false impression by passing PQ_NONE.
    Submitted by:  kmacy (an earlier version)
    Differential Revision:  https://reviews.freebsd.org/D1674
* Add 24 more page table pages we allocate on boot-up.  16MB slop  (marcel, 2015-08-18, 1 file, -1/+7)
    is a little tight in and of itself, but severely insufficient when one
    needs to map a large frame buffer as part of console initialization.
    64MB slop should be enough for a while.
    As an example: a 15" MacBook Pro with retina display needs ~28MB of
    KVA for the frame buffer.
    PR:  193745
* XEN/amd64 may initiate i/o over the pages not mapped by the direct  (kib, 2015-08-17, 1 file, -2/+24)
    map.  Handle busdma bouncing and ata PIO accesses by using a global
    frame, used by the current CPU locally for the duration of
    pmap_quick_enter/remove_page().  A spin mutex protects the concurrent
    frame use and prevents thread migration.
    Noted by:  royger
    Reviewed by:  alc, jah, royger (previous version)
    Sponsored by:  The FreeBSD Foundation
* Better support memory mapped console devices, such as VGA and EFI  (marcel, 2015-08-12, 1 file, -15/+91)
    frame buffers and memory mapped UARTs.
    1.  Delay calling cninit() until after pmap_bootstrap().  This makes
        sure we have PMAP initialized enough to add translations.  Keep
        kdb_init() after cninit() so that we have a console when we need
        to break into the debugger on boot.
    2.  Unfortunately, the ATPIC code had to be moved as well so as to
        avoid a spurious trap #30.  The reason for which is not known at
        this time.
    3.  In pmap_mapdev_attr(), when we need to map a device prior to the
        VM system being initialized, use virtual_avail as the KVA to map
        the device at.  In particular, avoid using the direct map on amd64
        because we can't demote by virtue of not being able to allocate
        yet.  Keep track of the translation.  Re-use the translation after
        the VM has been initialized to not waste KVA and to satisfy the
        assumption in uart(4) that the handle returned for the low-level
        console is the same as later returned when the device is probed
        and attached.
    4.  In pmap_unmapdev() remove the mapping from the table when called
        pre-init.  Otherwise keep the mapping.  During bus probe and
        attach, device resources are mapped and unmapped multiple times,
        which would have us destroy the mapping used by the low-level
        console.
    5.  In pmap_init(), set pmap_initialized to signal that we're not
        pre-init anymore.  On amd64, bring the direct map in sync with the
        translations created at that time.
    6.  Implement bus_space_map() and bus_space_unmap() for real: when the
        tag corresponds to memory space, call the corresponding
        pmap_mapdev() and pmap_unmapdev() functions to construct an actual
        handle.
    7.  In efifb.c and vt_vga.c, remove the crutches and hacks and simply
        call pmap_mapdev_attr() or bus_space_map() as desired.
    Notes:
    1.  uart(4) already used bus_space_map() during low-level console
        setup, but since serial ports have traditionally been I/O port
        based, the lack of a proper implementation for said function was
        not a problem.  It has always supported memory mapped UARTs for
        low-level consoles by setting hw.uart.console accordingly.
    2.  The use of the direct map on amd64 without setting caching
        attributes has been a bigger problem than previously thought.
        This change has the fortunate (and unexpected) side-effect of
        fixing various EFI frame buffer problems (though not all).
    PR:  191564, 194952
    Special thanks to:
    1.  XipLink, Inc -- generously donated an Intel Bay Trail E3800 based
        eval board (ADLE3800PC).
    2.  The FreeBSD Foundation, in particular emaste@ -- for UEFI support
        in general and testing.
    3.  Everyone who tested the proposed fix for PR 191564.
    4.  jhb@ and kib@ for being a soundboard and applying a clue bat if
        so needed.
* Add two new pmap functions:  (jah, 2015-08-04, 1 file, -0/+12)
    vm_offset_t pmap_quick_enter_page(vm_page_t m)
    void pmap_quick_remove_page(vm_offset_t kva)
    These will create and destroy a temporary, CPU-local KVA mapping of a
    specified page.
    Guarantees:
    -- Will not sleep and will not fail.
    -- Safe to call under a non-sleepable lock or from an ithread.
    Restrictions:
    -- Not guaranteed to be safe to call from an interrupt filter or
       under a spin mutex on all platforms.
    -- Current implementation does not guarantee more than one page of
       mapping space across all platforms.  MI code should not make
       nested calls to pmap_quick_enter_page.
    -- MI code should not perform locking while holding onto a mapping
       created by pmap_quick_enter_page.
    The idea is to use this in busdma, for bounce buffer copies as well as
    virtually-indexed cache maintenance on mips and arm.
    NOTE: the non-i386, non-amd64 implementations of these functions still
    need review and testing.
    Reviewed by:  kib
    Approved by:  kib (mentor)
    Differential Revision:  http://reviews.freebsd.org/D3013
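    A short usage sketch of the new KPI (hypothetical caller, e.g. a busdma
    bounce-buffer copy):

        /*
         * Map the page into a temporary, CPU-private KVA slot, copy into
         * it, then unmap.  Safe under a non-sleepable lock or from an
         * ithread; the mapping never sleeps and never fails.
         */
        vm_offset_t kva;

        kva = pmap_quick_enter_page(m);
        memcpy((void *)kva, src, PAGE_SIZE);
        pmap_quick_remove_page(kva);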
* Account for superpage mappings that are created by pmap_copy().  (alc, 2015-06-09, 1 file, -0/+1)
* Remove several write-only variables, all reported by the gcc 4.9  (kib, 2015-05-29, 1 file, -4/+2)
    buildkernel run.
    Some of them were write-only under some kernel options, e.g. variables
    keeping values only used by CTR() macros.  It costs nothing to the
    code readability and correctness to eliminate the warnings in those
    cases too by removing the local cached values used only for
    single-access.
    Review:  https://reviews.freebsd.org/D2665
    Reviewed by:  rodrigc
    Looked at by:  bjk
    Sponsored by:  The FreeBSD Foundation
    MFC after:  1 week
* Enabled rewritten PCID support by default.  (kib, 2015-05-27, 1 file, -1/+1)
    Sponsored by:  The FreeBSD Foundation
    MFC after:  1 month
* On amd64, make proc0 pmap initialization slightly more correct.  In  (kib, 2015-05-15, 1 file, -1/+3)
    particular, switch to the proc0 pmap to have the expected %cr3 and
    PCID for thread0 during initialization, and an up to date pm_active
    mask.
    pmap_pinit0() should be done after proc0->p_vmspace is assigned so
    that the amd64 pmap_activate() finds the correct curproc pmap.
    Sponsored by:  The FreeBSD Foundation
    MFC after:  3 weeks
* Implement the support for PCID in UP kernels.  (kib, 2015-05-15, 1 file, -41/+55)
    Requested by:  alc
    Tested by:  pho
    Sponsored by:  The FreeBSD Foundation
    MFC after:  3 weeks