path: root/sys/amd64
Commit message | Author | Age | Files | Lines
...
* MFC r305539: work around AMD erratum 793 for family 16h, models 00h-0Fh (avg, 2016-10-27, 1 file, -0/+14)
* Merge r307936: (glebius, 2016-10-25, 1 file, -1/+4)
  The argument validation in r296956 was not enough to close all possible
  overflows in sysarch(2).
  Submitted by: Kun Yang <kun.yang chaitin.com>
  Patch by: kib
  Security: SA-16:15
* MFC r306680: (kib, 2016-10-24, 2 files, -3/+9)
  Reduce the cost of TLB invalidation on x86 by using per-CPU completion
  flags.
* MFC r303818, r303833, r303941, r304478, r304481, r304483, r304484, r304554, (ed, 2016-10-12, 3 files, -3/+236)
  r304555, r304556, r304557, r304558, r304559, r304561, r304563, r304564,
  r304565, r304615, r304742, r304743, r304744, r304745, r304748, r304886,
  r304991, r305928, r305938, r305987, r306185:
  Bring CloudABI support back in sync with HEAD.
  - Add support for running 32-bit executables on amd64, armv6 and i386.
  - As these new architectures require the use of the vDSO, merge back vDSO
    support for 64-bit executables running on amd64 and arm64 as well.
  This has the advantage that support for vDSO-less execution can be phased
  out when 11.0 becomes unsupported, as opposed to 11.x.
  This change has been tested by running the cloudlibc unit tests on all
  supported architectures, which seems to work fine.
* MFC r306097: (kib, 2016-10-05, 3 files, -0/+629)
  Add kernel interfaces to call EFI Runtime Services.
  MFC r306104:
  Fix build of the module outside the kernel tree.
  MFC r306209 (by imp):
  Change the efi_get_table interface to a void **.
  MFC r306351:
  Handle TLB shootdown IPI during the EFI runtime calls, on SandyBridges.
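As a rough illustration of the interface change noted above (the table pointer now returned through a void **), here is a hedged sketch of a consumer looking up an EFI configuration table. The efi_get_table() prototype and the EFI_TABLE_SMBIOS macro are assumptions for illustration, not copied from the header.

    /*
     * Hedged sketch: looking up the SMBIOS configuration table through the
     * EFI runtime support described above.  Prototype and macro names are
     * assumptions for illustration only.
     */
    #include <sys/types.h>
    #include <sys/uuid.h>
    #include <sys/efi.h>

    static struct uuid smbios_uuid = EFI_TABLE_SMBIOS;	/* assumed macro */

    static int
    example_find_smbios(void **tblp)
    {
        /* After r306209 the table pointer is returned through a void **. */
        return (efi_get_table(&smbios_uuid, tblp));
    }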
* MFC r306350: (kib, 2016-10-03, 1 file, -1/+29)
  For machines which support PCID but do not have the INVPCID instruction,
  i.e. SandyBridge and IvyBridge, correct a race between pmap_activate()
  and invltlb_pcid_handler().
* MFC r305213, r305319, r305398 (alc, 2016-10-01, 1 file, -1/+23)
  As an optimization to the machine-independent layer, change the
  machine-dependent pmap_ts_referenced() so that it updates the page's
  dirty field if a modified bit is found while counting reference bits.
  This opportunistic update can be performed at low cost and can eliminate
  the need for some future calls to pmap_is_modified() by the
  machine-independent layer.
  Replace the number 4 in sparc64's pmap_ts_referenced() by
  PMAP_TS_REFERENCED_MAX, like we've done elsewhere, e.g., amd64.
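In other words, while pmap_ts_referenced() walks a page's PV list counting accessed (PG_A) bits, it can also notice a set modified (PG_M) bit on a writable mapping and mark the page dirty on the spot. A much-simplified, hedged fragment of what that extra check looks like inside the loop (not the literal pmap code):

    /*
     * Hedged, simplified sketch of the opportunistic dirty update inside
     * the pmap_ts_referenced() PV-list walk.
     */
    pte = *ptep;
    if ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW))
        vm_page_dirty(m);	/* may save a later pmap_is_modified() call */
    if ((pte & PG_A) != 0) {
        atomic_clear_long(ptep, PG_A);
        pmap_invalidate_page(pmap, va);
        if (++rtval >= PMAP_TS_REFERENCED_MAX)
            break;		/* bound the work done per call */
    }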
* MFC 305502: Reset PCI pass through devices via PCI-e FLR during VM start/end. (jhb, 2016-09-30, 1 file, -0/+11)
  Add routines to trigger a function level reset (FLR) of a PCI-express
  device via the PCI-express device control register.  This also includes
  support routines to wait for pending transactions to complete as well as
  calculating the maximum completion timeout permitted by a device.
  Change the ppt(4) driver to reset pass through devices before attaching
  to a VM during startup and before detaching from a VM during shutdown.
  Sponsored by: Chelsio Communications
* MFC 304858,305485,305497: Fix various issues with PCI pass through and VT-d. (jhb, 2016-09-30, 4 files, -22/+59)
  304858:
  Enable I/O MMU when PCI pass through is first used.
  Rather than enabling the I/O MMU when the vmm module is loaded, defer
  initialization until the first attempt to pass a PCI device through to a
  guest.  If the I/O MMU fails to initialize or is not present, then fail
  the attempt to pass a PCI device through to a guest.
  The hw.vmm.force_iommu tunable has been removed since the I/O MMU is no
  longer enabled during boot.  However, the I/O MMU support can be disabled
  by setting the hw.vmm.iommu.enable tunable to 0 to prevent use of the
  I/O MMU on any systems where it is buggy.
  305485:
  Leave ppt devices in the host domain when they are not attached to a VM.
  This allows a pass through device to be reset to a normal device driver
  on the host and reused on the host.  ppt devices are now always active
  in some I/O MMU domain when the I/O MMU is active, either the host
  domain or the domain of a VM they are attached to.
  305497:
  Update the I/O MMU in bhyve when PCI devices are added and removed.
  When the I/O MMU is active in bhyve, all PCI devices need valid entries
  in the DMAR context tables.  The I/O MMU code does a single enumeration
  of the available PCI devices during initialization to add all existing
  devices to a domain representing the host.  The ppt(4) driver then moves
  pass through devices in and out of domains for virtual machines as
  needed.  However, when new PCI devices were added at runtime either via
  SR-IOV or HotPlug, the I/O MMU tables were not updated.
  This change adds a new set of EVENTHANDLERS that are invoked when PCI
  devices are added and deleted.  The I/O MMU driver in bhyve installs
  handlers for these events which it uses to add and remove devices to the
  "host" domain.
  Sponsored by: Chelsio Communications
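The EVENTHANDLERS mentioned in 305497 follow the usual EVENTHANDLER(9) pattern: the I/O MMU code registers callbacks that run when a PCI device is added or deleted and moves the device into or out of the host DMAR domain. A hedged sketch of what such a registration could look like; the event names pci_add_device and pci_delete_device and the handler signature are assumptions based on the description above, not verified against the tree.

    /*
     * Hedged sketch of EVENTHANDLER(9) hooks for PCI device arrival and
     * removal, as used to keep the bhyve "host" I/O MMU domain current.
     * Event and handler names are illustrative.
     */
    #include <sys/param.h>
    #include <sys/eventhandler.h>
    #include <sys/bus.h>

    static void
    iommu_pci_added(void *arg, device_t dev)
    {
        /* Add the new device to the host DMAR domain. */
    }

    static void
    iommu_pci_deleted(void *arg, device_t dev)
    {
        /* Drop the device's context-table entry. */
    }

    static void
    iommu_register_hooks(void)
    {
        EVENTHANDLER_REGISTER(pci_add_device, iommu_pci_added, NULL,
            EVENTHANDLER_PRI_ANY);
        EVENTHANDLER_REGISTER(pci_delete_device, iommu_pci_deleted, NULL,
            EVENTHANDLER_PRI_ANY);
    }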
* MFC r306092: (kib, 2016-09-28, 1 file, -2/+2)
  Rename efi_systbl to efi_systbl_phys.
* MFC r306091: (kib, 2016-09-28, 1 file, -0/+42)
  Add a way for the architecture to specify the calling ABI for methods in
  the EFI Runtime Services Table.  On amd64, the calling conventions are MS.
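On amd64 this boils down to tagging the EFI runtime function-pointer types with the Microsoft x64 calling convention instead of the System V one. A minimal sketch of the idea, assuming a GCC/Clang-style ms_abi attribute; the EFIABI_ATTR spelling is illustrative rather than a quote of the actual header.

    /*
     * Hedged sketch of an architecture-specific EFI ABI annotation.
     * On amd64, EFI Runtime Services expect the MS x64 calling convention.
     * The macro name is illustrative; the real header may differ.
     */
    #ifdef __amd64__
    #define EFIABI_ATTR	__attribute__((ms_abi))
    #else
    #define EFIABI_ATTR
    #endif

    /* A runtime-service function pointer is then declared like this: */
    typedef unsigned long efi_status;
    typedef efi_status (EFIABI_ATTR *efi_get_time_fn)(void *tm, void *tmcap);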
* MFC r306088: (kib, 2016-09-28, 1 file, -0/+33)
  Add amd64 functions to load/store the GDT register and store the IDT and
  TR registers.
* MFC r306087: (kib, 2016-09-28, 2 files, -15/+27)
  Export the pmap_cache_bits() and pmap_pinit_pml4() functions from the
  amd64 pmap.
* MFC r306020: (kib, 2016-09-27, 3 files, -34/+34)
  Move the pmap_p*e_index() inline functions from pmap.c to pmap.h.
* MFC r305942: (kib, 2016-09-25, 1 file, -3/+0)
  Consolidate four efi_next_descriptor() definitions.
* MFC r305692: (kib, 2016-09-25, 3 files, -14/+59)
  Add the FPU_KERN_NOCTX flag to the fpu_kern_enter() function on amd64.
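FPU_KERN_NOCTX lets a kernel thread use the FPU for a short stretch without allocating a save context. A hedged sketch of the enter/leave pairing that flag enables (an illustration, not code from the commit):

    /*
     * Hedged sketch: briefly using the FPU/SSE unit in the kernel on amd64
     * with FPU_KERN_NOCTX, so no struct fpu_kern_ctx has to be allocated.
     */
    #include <sys/param.h>
    #include <sys/proc.h>
    #include <machine/fpu.h>

    static void
    example_fpu_section(void)
    {
        fpu_kern_enter(curthread, NULL, FPU_KERN_NOCTX);
        /* ... short sequence of SSE/AVX instructions ... */
        fpu_kern_leave(curthread, NULL);
    }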
* MFC r305939: (kib, 2016-09-21, 1 file, -1/+1)
  Remove trailing space.
* MFC 303713: Correct assertion on vcpuid argument to vm_gpa_hold(). (jhb, 2016-09-09, 1 file, -1/+1)
  PR: 208168
* MFC 304637: Fix build for !SMP kernels after the Xen MSIX workaround. (jhb, 2016-09-09, 1 file, -1/+2)
  Move msix_disable_migration under #ifdef SMP since it doesn't make sense
  for !SMP kernels.
  PR: 212014
* MFC r302783: (badger, 2016-08-18, 1 file, -1/+1)
  Add explicit detection of the KVM hypervisor.
  Set vm_guest to a new enum value (VM_GUEST_KVM) when KVM is detected and
  use vm_guest in conditionals testing for KVM.
  Also, fix a conditional checking if we're running in a VM which caught
  only the generic VM case, but not more specific VMs (KVM, VMware, etc.).
  (Spotted by: vangyzen).
  Sponsored by: Dell Inc.
  Approved by: vangyzen (mentor)
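With the new enum value, code that needs a KVM-specific quirk can simply test the vm_guest global instead of re-parsing hypervisor CPUID leaves; a hedged example:

    /*
     * Hedged sketch: checking for the KVM hypervisor via the vm_guest
     * global after the explicit detection added above.
     */
    #include <sys/param.h>
    #include <sys/systm.h>

    static void
    example_guest_check(void)
    {
        if (vm_guest == VM_GUEST_KVM) {
            /* Apply a KVM-specific quirk. */
        } else if (vm_guest != VM_GUEST_NO) {
            /* Some other hypervisor (VMware, Xen, Hyper-V, ...). */
        }
    }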
* MFC r303913: (kib, 2016-08-17, 1 file, -2/+2)
  Unconditionally perform the checks that the FPU region was entered when
  the #NM exception is caught in kernel mode.
* MFC r302835: fix-up for configuration of AMD Family 10h processors (avg, 2016-08-15, 1 file, -0/+14)
  borrowed from Linux
* MFC r303958: (kib, 2016-08-14, 1 file, -3/+3)
  The pmap_delayed_invl_wait() function blocks on a turnstile, it does not
  spin, in the committed version.  Remove a stray '*' in the text.
* MFC r303583: (mjg, 2016-08-11, 1 file, -10/+3)
  amd64: implement pagezero using rep stos
  The current implementation uses non-temporal writes.  This turns out to
  be detrimental to performance if the page is used shortly after, which
  is the typical case with page faults.
  Switch to rep stos.
  Approved by: re (gjb)
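"rep stos" here means zeroing the page with a single string-store instruction, which keeps the freshly written lines in the cache for the access that typically follows a page fault, unlike the previous non-temporal stores. A hedged C-with-inline-assembly sketch of the instruction sequence; the real kernel routine lives in assembly.

    /*
     * Hedged sketch of zeroing a 4 KB page with "rep stosq" on amd64.
     * This only illustrates the instruction sequence the commit switches to.
     */
    #include <sys/param.h>	/* PAGE_SIZE */

    static void
    example_pagezero(void *page)
    {
        long cnt = PAGE_SIZE / sizeof(long);

        __asm __volatile("rep stosq"
            : "+D" (page), "+c" (cnt)	/* rdi = dest, rcx = count */
            : "a" (0L)			/* rax = value to store */
            : "memory");
    }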
* MFC r303712: (kib, 2016-08-10, 1 file, -210/+0)
  Merge i386 and amd64 variants of mp_watchdog.c into x86/.
  Approved by: re (gjb)
* MFC r302517: (dchagin, 2016-07-18, 4 files, -360/+7)
  Fix a copy/paste bug introduced during the X86_64 Linuxulator work.
  FreeBSD supports the NX bit on X86_64 processors out of the box; for i386
  emulation use the READ_IMPLIES_EXEC flag, introduced in r302515.
  While here, move the common part of the mmap() and mprotect() code to the
  files in compat/linux to reduce code duplication between the Linuxulator's
  Approved by: re (gjb)
* MFC r302516: (dchagin, 2016-07-18, 10 files, -14/+28)
  Regen for r302515 (Linux personality).
  Approved by: re (gjb)
* MFC r302515: (dchagin, 2016-07-18, 2 files, -2/+2)
  Implement the Linux personality() system call, mainly for the
  READ_IMPLIES_EXEC flag.  In Linux, if this flag is set, PROT_READ implies
  PROT_EXEC for mmap().  Linux/i386 sets this flag automatically if the
  binary requires an executable stack.
  The READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.
  Approved by: re (gjb)
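The READ_IMPLIES_EXEC semantics are simple to state in code: when the flag is recorded for a process, any mmap() request asking for PROT_READ is widened to also include PROT_EXEC. A hedged sketch follows; the flag name and value are illustrative, not the exact Linuxulator identifiers.

    /*
     * Hedged sketch of the READ_IMPLIES_EXEC behaviour described above.
     * The LINUX_READ_IMPLIES_EXEC name and value are illustrative only.
     */
    #include <sys/mman.h>

    #define LINUX_READ_IMPLIES_EXEC	0x0400000	/* illustrative value */

    static int
    linux_fixup_mmap_prot(int personality_flags, int prot)
    {
        if ((personality_flags & LINUX_READ_IMPLIES_EXEC) != 0 &&
            (prot & PROT_READ) != 0)
            prot |= PROT_EXEC;
        return (prot);
    }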
* MFC r302635: (royger, 2016-07-15, 1 file, -0/+2)
  xen: automatically disable MSI-X interrupt migration
  Approved by: re (kib)
* Remove GENERIC-NODEBUG kernel configurations, missed during the stable/11 branch. (gjb, 2016-07-14, 1 file, -38/+0)
  This is a direct commit to stable/11.
  Approved by: re (kib)
  Sponsored by: The FreeBSD Foundation
* MFC r302448: (ed, 2016-07-12, 1 file, -0/+1)
  Don't forget to set sa->narg for CloudABI system calls.
  It turns out that this value is not used within the system call code
  under normal conditions, except when using tracing tools like ktrace.
  If we forget to set this value, it is set to random garbage.  This may
  cause ktrace to hang indefinitely, making it impossible to kill.
  Approved by: re@
  Reported by: Michael Plass
  PR: 210800
* - Remove debugging from GENERIC* kernel configurations (gjb, 2016-07-08, 1 file, -9/+0)
  - Enable MALLOC_PRODUCTION
  - Default dumpdev=NO
  - Remove UPDATING entry regarding debugging features
  Approved by: re (implicit)
  Sponsored by: The FreeBSD Foundation
* Replace a number of conflations of mp_ncpus and mp_maxid with either (nwhitehorn, 2016-07-06, 1 file, -1/+1)
  mp_maxid or CPU_FOREACH() as appropriate.  This fixes a number of places
  in the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would
  try, for example, to run tasks on CPUs that did not exist or to allocate
  too few buffers on systems with sparse CPU IDs in which there are holes
  in the range and mp_maxid > mp_ncpus.  Such circumstances generally occur
  on systems with SMT, but on which SMT is disabled.  This patch restores
  system operation at least on POWER8 systems configured in this way.
  There are a number of other places in the kernel with potential problems
  in these situations, but where sparse CPU IDs are not currently known to
  occur, mostly in the ARM machine-dependent code.  These will be fixed in
  a follow-up commit after the stable/11 branch.
  PR: kern/210106
  Reviewed by: jhb
  Approved by: re (glebius)
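Concretely, the fix replaces loops that assume dense CPU IDs with CPU_FOREACH(), which skips absent IDs up to mp_maxid. A hedged sketch of the two patterns:

    /*
     * Hedged sketch: iterating CPUs safely when CPU IDs are sparse.
     */
    #include <sys/param.h>
    #include <sys/smp.h>

    static void
    example_percpu_walk(void)
    {
        int i;

        /*
         * Wrong when IDs are sparse: assumes CPUs 0..mp_ncpus-1 all exist,
         * which fails when SMT is present in hardware but disabled.
         */
        for (i = 0; i < mp_ncpus; i++) {
            /* may reference an absent CPU id */
        }

        /* Correct: visits only present CPUs, whose ids go up to mp_maxid. */
        CPU_FOREACH(i) {
            /* per-CPU work */
        }
    }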
* Update comments for the MD functions managing contexts for new (kib, 2016-06-16, 2 files, -25/+15)
  threads, to make them less confusing and to use modern kernel terms.
  Rename the functions to reflect their current use, instead of the
  historic KSE conventions:
  cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
  cpu_set_upcall -> cpu_copy_thread (for forks)
  cpu_set_upcall_kse -> cpu_set_upcall (for new thread creation)
  Reviewed by: jhb (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Approved by: re (hrs)
  Differential revision: https://reviews.freebsd.org/D6731
* Do not access the pv_table array for fictitious pages, since the array (kib, 2016-06-13, 1 file, -15/+10)
  does not cover the dynamically registered fictitious ranges, and
  fictitious page mappings are not promoted.  Offer a dummy struct md_page
  to fetch the constant superpage pv list generation to satisfy the logic.
  Also, by initializing the pv_dummy pv_list to empty, we can remove
  several explicit PG_FICTITIOUS tests.
  Reported and tested by: Michael Butler <imb@protected-networks.net>
  (previous version)
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D6728
  Approved by: re (hrs)
* Avoid spurious EINVAL in amd64 pmap_change_attr(). (kib, 2016-06-05, 1 file, -4/+7)
  Do not try to change attributes for the DMAP when working on a mapping
  which is not covered by the DMAP.  This was reported on a real system
  where a BAR of a device (NTB) was mapped outside the PCI window.
  Reported and tested by: mav
  Reviewed by: jhb, mav
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D6668
* In pmap_advise(), avoid leaking the DI start for EPT pmaps, which need (kib, 2016-05-27, 2 files, -1/+5)
  A/D emulation.  Assert that syscalls do not leak DI.
  Reported by: gjb
  Sponsored by: The FreeBSD Foundation
* Both Clang and GCC fail to generate an efficient reserve_pv_entries(). (jkim, 2016-05-25, 1 file, -16/+15)
  http://docs.freebsd.org/cgi/mid.cgi?552BFEB2.8040407
  Re-implement it entirely in inline assembly so that compilers cannot do
  silly spilling to memory.  For the non-POPCNT case, use the newly added
  bit_count(3).
  Reported by: alc
  Reviewed by: alc, kib
  Differential Revision: https://reviews.freebsd.org/D6541
* Document POPCNT erratum for 6th Generation Intel Core processors. (jkim, 2016-05-23, 1 file, -0/+1)
* Add macro to convert errno and use it when appropriate. (dchagin, 2016-05-22, 1 file, -7/+1)
  MFC after: 1 week
* Regen after r300359 (struct l_sched_param removal). (dchagin, 2016-05-21, 10 files, -26/+26)
  MFC after: 1 week
* Correct an argument parameter of the linux_sched_* system calls, as (dchagin, 2016-05-21, 2 files, -6/+6)
  struct l_sched_param is not defined there due to its nature.
  MFC after: 1 week
* Check for overflow and return EINVAL if detected.  Backport this and (kib, 2016-05-20, 1 file, -1/+2)
  r300305 to i386.
  PR: 209661
  Reported and reviewed by: cturt
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 days
* Use unsigned type for the loop index to make overflow checks effective. (kib, 2016-05-20, 1 file, -1/+2)
  PR: 209661
  Reported by: cturt
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 days
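The reason an unsigned index matters is that overflow of a signed index is undefined behavior, so a compiler may delete the very comparison meant to catch it; unsigned arithmetic wraps predictably and keeps the check meaningful. A small, hedged illustration of the idea (not the sysarch(2) code itself):

    /*
     * Hedged illustration of why the loop index should be unsigned when an
     * overflow check is part of the loop logic.
     */
    static int
    example_fill(unsigned int start, unsigned int num)
    {
        unsigned int i;

        /* Reject requests whose end would wrap past the type's maximum... */
        if (start + num < start)
            return (-1);	/* the kernel code returns EINVAL here */
        /* ...so the loop below cannot run away on a wrapped bound. */
        for (i = start; i < start + num; i++) {
            /* process descriptor i */
        }
        return (0);
    }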
* Don't repeat the the word 'the' (eadler, 2016-05-17, 1 file, -1/+1)
  (one manual change to fix grammar)
  Confirmed With: db
  Approved by: secteam (not really, but this is a comment typo fix)
* atomic: Add testandclear on i386/amd64 (sephe, 2016-05-16, 1 file, -0/+38)
  Reviewed by: kib
  Sponsored by: Microsoft OSTC
  Differential Revision: https://reviews.freebsd.org/D6381
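atomic_testandclear_*() atomically clears a single bit and reports whether it was previously set, which maps onto a single locked btr instruction on x86. A hedged usage sketch:

    /*
     * Hedged sketch: using the new atomic test-and-clear primitive to drain
     * a bitmask of pending events without losing concurrent setters.
     */
    #include <sys/types.h>
    #include <machine/atomic.h>

    static volatile u_long pending;

    static void
    example_drain_pending(void)
    {
        int bit;

        for (bit = 0; bit < 64; bit++) {
            /* Returns non-zero iff the bit was set; clears it atomically. */
            if (atomic_testandclear_long(&pending, bit) != 0) {
                /* handle event 'bit' */
            }
        }
    }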
* Eliminate pvh_global_lock from the amd64 pmap. (kib, 2016-05-14, 3 files, -124/+246)
  The only current purpose of the pvh lock was explained there:
  On Wed, Jan 09, 2013 at 11:46:13PM -0600, Alan Cox wrote:
  > Let me lay out one example for you in detail.  Suppose that we have
  > three processors and two of these processors are actively using the same
  > pmap.  Now, one of the two processors sharing the pmap performs a
  > pmap_remove().  Suppose that one of the removed mappings is to a
  > physical page P.  Moreover, suppose that the other processor sharing
  > that pmap has this mapping cached with write access in its TLB.  Here's
  > where the trouble might begin.  As you might expect, the processor
  > performing the pmap_remove() will acquire the fine-grained lock on the
  > PV list for page P before destroying the mapping to page P.  Moreover,
  > this processor will ensure that the vm_page's dirty field is updated
  > before releasing that PV list lock.  However, the TLB shootdown for this
  > mapping may not be initiated until after the PV list lock is released.
  > The processor performing the pmap_remove() is not problematic, because
  > the code being executed by that processor won't presume that the mapping
  > is destroyed until the TLB shootdown has completed and pmap_remove() has
  > returned.  However, the other processor sharing the pmap could be
  > problematic.  Specifically, suppose that the third processor is
  > executing the page daemon and concurrently trying to reclaim page P.
  > This processor performs a pmap_remove_all() on page P in preparation for
  > reclaiming the page.  At this instant, the PV list for page P may
  > already be empty but our second processor still has a stale TLB entry
  > mapping page P.  So, changes might still occur to the page after the
  > page daemon believes that all mappings have been destroyed.  (If the PV
  > entry had still existed, then the pmap lock would have ensured that the
  > TLB shootdown completed before the pmap_remove_all() finished.)  Note,
  > however, the page daemon will know that the page is dirty.  It can't
  > possibly mistake a dirty page for a clean one.  However, without the
  > current pvh global locking, I don't think anything is stopping the page
  > daemon from starting the laundering process before the TLB shootdown has
  > completed.
  >
  > I believe that a similar example could be constructed with a clean page
  > P' and a stale read-only TLB entry.  In this case, the page P' could be
  > "cached" in the cache/free queues and recycled before the stale TLB
  > entry is flushed.
  TLBs for addresses with updated PTEs are always flushed before the pmap
  lock is unlocked.  On the other hand, the amd64 pmap code does not always
  flush TLBs before PV list locks are unlocked, if PTEs were previously
  cleared and PV entries removed.
  To handle the situations where a thread might notice an empty PV list
  while a third thread still has access to the page because a TLB
  invalidation has not finished yet, introduce delayed invalidation.
  Compared with the pvh_global_lock, DI does not block the entering thread
  when pmap_remove_all() or pmap_remove_write() (callers of
  pmap_delayed_invl_wait()) are executed in parallel.  But _invl_wait()
  callers are blocked until all previously noted DI blocks are left, thus
  ensuring that the necessary TLB invalidations were performed before
  returning from pmap_remove_all() or pmap_remove_write().
  See the comments for a detailed description of the mechanism, and also
  for the explanations of why several pmap methods, most importantly
  pmap_enter(), do not need DI protection.
  Reviewed by: alc, jhb (turnstile KPI usage)
  Tested by: pho (previous version)
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D5747
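Stripped of detail, the delayed-invalidation scheme is a pair of markers around PTE-destroying operations plus a wait primitive for the functions that must not return while stale TLB entries may remain. A much-simplified, hedged sketch of the pattern using the pmap_delayed_invl_*() names from the commit; it is illustrative only, since these are internals of the amd64 pmap.

    /*
     * Hedged, much-simplified sketch of the delayed-invalidation (DI)
     * pattern described above.  The pmap_delayed_invl_*() functions are
     * pmap.c-internal in the real tree; they are declared here only to
     * keep the sketch self-contained.
     */
    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/pmap.h>
    #include <vm/vm_page.h>

    void pmap_delayed_invl_started(void);
    void pmap_delayed_invl_finished(void);
    void pmap_delayed_invl_wait(vm_page_t m);

    static void
    example_remove_mapping(pmap_t pmap, vm_offset_t va, vm_page_t m)
    {
        pmap_delayed_invl_started();	/* open a DI block */
        /* Clear the PTE and unlink the PV entry under the PV list lock. */
        /* A TLB shootdown for this mapping may still be outstanding here. */
        pmap_delayed_invl_finished();	/* close the DI block */
    }

    static void
    example_remove_all(vm_page_t m)
    {
        /* Destroy every remaining mapping of the page, then... */
        /*
         * ...block until every DI block that was open when the PV list was
         * examined has finished, so no CPU still holds a stale TLB entry.
         */
        pmap_delayed_invl_wait(m);
    }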
* Eliminate an unused #include.  For a brief period of time, _unrhdr.h was (alc, 2016-05-13, 1 file, -1/+0)
  used to implement PCID support on amd64.
  Reviewed by: kib
* Add locking annotations to amd64 struct md_page members. (kib, 2016-05-10, 1 file, -2/+6)
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* Add a new bus method to fetch device-specific CPU sets. (jhb, 2016-05-09, 1 file, -0/+3)
  bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
  an enum for the second parameter that indicates the type of cpuset to
  request.  Currently two values are supported:
  - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
    the device when DEVICE_NUMA is enabled)
  - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
  For systems that do not support NUMA (or if it is not enabled in the
  kernel config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to
  'all_cpus' by default.  The idea is that INTR_CPUS should always return
  a valid set.
  Device drivers which want to use per-CPU interrupts should start using
  INTR_CPUS instead of simply assigning interrupts to all available CPUs.
  In the future we may wish to add tunables to control the policy of
  INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT
  threads or not).
  The x86 nexus driver exposes the internal set of interrupt CPUs from the
  x86 interrupt code via INTR_CPUS.  The ACPI bus driver and PCI bridge
  drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists
  and DEVICE_NUMA is enabled.  They also AND the global INTR_CPUS set from
  the nexus driver with the per-domain set from _PXM to generate a local
  INTR_CPUS set for child devices.
  Compared to r298933, this version uses 'struct _cpuset' in <sys/bus.h>
  instead of 'cpuset_t' to avoid requiring <sys/param.h> (<sys/_cpuset.h>
  still requires <sys/param.h> for MAXCPU even though <sys/_bitset.h> does
  not after recent changes).
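A driver that wants per-CPU interrupts would ask its parent bus for the INTR_CPUS set and fall back to all_cpus when the request fails. A hedged sketch of that call; the exact bus_get_cpus() argument list shown here is an assumption based on the description above, not a copy of the header.

    /*
     * Hedged sketch: fetching the preferred interrupt CPU set for a device
     * via the new bus_get_cpus() method, falling back to every CPU.
     */
    #include <sys/param.h>
    #include <sys/bus.h>
    #include <sys/cpuset.h>
    #include <sys/smp.h>

    static void
    example_pick_intr_cpus(device_t dev, cpuset_t *set)
    {
        if (bus_get_cpus(dev, INTR_CPUS, sizeof(*set), set) != 0) {
            /* No NUMA info (or method unsupported): use every CPU. */
            CPU_COPY(&all_cpus, set);
        }
    }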