path: root/sys/x86
Commit message | Author | Age | Files | Lines
* Fix multiple vulnerabilities of OpenSSL. [SA-17:02] | delphij | 2017-02-23 | 1 | -0/+12
  Fix system hang when booting when PCI-express HotPlug is enabled. [EN-17:01]
  Fix NIS master updates are not pushed to NIS slave. [EN-17:02]
  Fix compatibility with Hyper-V/storage after KB3172614 or KB3179574. [EN-17:03]
  Make makewhatis output reproducible. [EN-17:04]
  Approved by: so
* MFC r303712: | kib | 2016-08-10 | 1 | -0/+210
  Merge i386 and amd64 variants of mp_watchdog.c into x86/.
  Approved by: re (gjb)
* MFC r303490, r303491: | royger | 2016-08-03 | 2 | -24/+10
  xen-intr: fix removal of event channels during resume
  Revert r291022: x86/intr: allow mutex recursion in intr_remove_handler
  Approved by: re (kib)
* MFC r302635: | royger | 2016-07-15 | 2 | -5/+34
  xen: automatically disable MSI-X interrupt migration
  Approved by: re (kib)
* Add a tunable to disable migration of MSI-X interrupts. | jhb | 2016-06-24 | 1 | -0/+14
  The new 'machdep.disable_msix_migration' tunable can be set to 1 to disable migration of MSI-X interrupts. Xen versions prior to 4.6.0 do not properly handle updates to MSI-X table entries after the initial write. In particular, the operation to unmask a table entry after updating it during migration is not propagated to the "real" table for passthrough devices, causing the interrupt to remain masked. At least some systems in EC2 are affected by this bug when using SR-IOV. The tunable can be set in loader.conf as a workaround.
  Submitted by: Jeremiah Lott <jlott@averesystems.com> (original patch)
  Approved by: re (marius)
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D6947
* Use M_NOWAIT when allocating memory for the ACPI wakeup handler. | markj | 2016-06-23 | 1 | -1/+1
  If the allocation attempt fails, we may otherwise VM_WAIT after a failed attempt to reclaim contiguous memory in the requested range. After r297466, this results in the thread going to sleep, causing a hang during boot.
  Reviewed by: jkim, kib
  Approved by: re (gjb)
  Sponsored by: EMC / Isilon Storage Division
  Differential Revision: https://reviews.freebsd.org/D6945
* Trim some spaces to record correct commit message for the r301278. | kib | 2016-06-03 | 1 | -2/+2
  Reduce number of iterations used for calibrating ICR read loop. The new number of iterations still gives the same ICR latency as before, tested on Intel SandyBridge and Haswell machines, and on AMD. But it significantly reduces the unneeded pause on boot in some VMs, from ~10 secs to less than 1 sec. It was reported to occur in bhyve on AMD host.
  Reported and tested by: avg
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c | kib | 2016-06-03 | 1 | -1/+1
  index d8bda77..bb15df0 100644
  --- a/sys/x86/x86/local_apic.c
  +++ b/sys/x86/x86/local_apic.c
  @@ -511,7 +511,7 @@ native_lapic_init(vm_paddr_t addr)
   }
  
   #ifdef SMP
  -#define LOOPS 1000000
  +#define LOOPS 100000
   /*
    * Calibrate the busy loop waiting for IPI ack in xAPIC mode.
    * lapic_ipi_wait_mult contains the number of iterations which
* Implement _ALIGN() using internal integer types. | ed | 2016-05-31 | 1 | -2/+2
  The existing version depends on register_t and uintptr_t, which are only available when including headers such as <sys/types.h>. As this macro is used by <sys/socket.h>, for example, it should be written in such a way that it doesn't depend on those types.
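A minimal sketch of the idea. It uses stand-in typedefs: the `sketch_` names are hypothetical, not FreeBSD's internal `__`-prefixed types. The macro is written only in terms of types the header can see itself, so a consumer such as <sys/socket.h> needs nothing from <sys/types.h>. The sketch assumes x86, where `long` and `unsigned long` are register- and pointer-sized on both i386 and amd64.

```c
/* Hypothetical stand-ins for machine-internal types in the style of
 * __uintptr_t and __register_t; on i386 and amd64 these widths match. */
typedef unsigned long sketch_uintptr_t;
typedef long sketch_register_t;

/* Round an address up to the next register-size boundary. */
#define SKETCH_ALIGNBYTES (sizeof(sketch_register_t) - 1)
#define SKETCH_ALIGN(p) \
    (((sketch_uintptr_t)(p) + SKETCH_ALIGNBYTES) & ~SKETCH_ALIGNBYTES)
```

For example, SKETCH_ALIGN(13) evaluates to 16 on amd64, where the register size is 8 bytes.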
* Add missing dependency on <machine/_limits.h>. | ed | 2016-05-31 | 1 | -2/+4
  In r227474, this header file was changed to define SIG_ATOMIC_{MIN,MAX} in terms of LONG_{MIN,MAX}. Unlike the other definitions in this header file, LONG_{MIN,MAX} are provided by <limits.h>. Remove the dependency on <limits.h> by using __LONG_{MIN,MAX} instead and including <machine/_limits.h>. This change is needed to make SIG_ATOMIC_{MIN,MAX} work without including any other header files.
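The pattern described above, sketched with hypothetical names. The public limit macros are defined only in terms of reserved-namespace constants of the kind <machine/_limits.h> provides, so no user-visible <limits.h> name is needed. The values assume amd64, where `long` is 64 bits.

```c
/* Stand-ins for reserved-namespace constants in the style of
 * <machine/_limits.h>; amd64 (LP64) values assumed. */
#define SKETCH__LONG_MAX 0x7fffffffffffffffL
#define SKETCH__LONG_MIN (-SKETCH__LONG_MAX - 1)

/* Header-style definitions that need neither <limits.h> nor LONG_MIN. */
#define SKETCH_SIG_ATOMIC_MIN SKETCH__LONG_MIN
#define SKETCH_SIG_ATOMIC_MAX SKETCH__LONG_MAX
```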
* Add missing dependency on <machine/_limits.h>. | ed | 2016-05-31 | 1 | -0/+2
  This header uses __INT_MIN and __INT_MAX, which are provided by <machine/_limits.h>. This is needed to make <stdint.h>'s WCHAR_MIN and WCHAR_MAX work without including other headers as well.
* hyperv/vmbus: Rename ISR functions | sephe | 2016-05-31 | 1 | -1/+0
  MFC after: 1 week
  Sponsored by: Microsoft OSTC
  Differential Revision: https://reviews.freebsd.org/D6601
* Only calibrate ICR read loop when not in x2APIC mode. Run-time | kib | 2016-05-26 | 1 | -13/+15
  switching between LAPIC modes is not supported, and there is no need to wait for IPI ack in x2APIC mode. So the calibrated delay is only needed for !x2APIC. This saves around a second of boot time on the real hardware for x2APIC.
  Sponsored by: The FreeBSD Foundation
* Implement support for RF_UNMAPPED and bus_map/unmap_resource on x86. | jhb | 2016-05-20 | 1 | -30/+114
  Add implementations of bus_map/unmap_resource to the x86 nexus driver. Change bus_activate/deactivate_resource to honor RF_UNMAPPED and to use bus_map/unmap_resource to create/destroy the implicit mapping when RF_UNMAPPED is not set.
  Reviewed by: cem
  Differential Revision: https://reviews.freebsd.org/D5237
* Add an EARLY_AP_STARTUP option to start APs earlier during boot. | jhb | 2016-05-14 | 5 | -1/+42
  Currently, Application Processors (non-boot CPUs) are started by MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until SI_SUB_SMP, at which point they are released to run kernel threads. SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter the scheduler and start running threads until fairly late in the boot.
  This change moves SI_SUB_SMP up to just before software interrupt threads are created, allowing the APs to start executing kernel threads much sooner (before any devices are probed). This allows several initialization routines that need to perform initialization on all CPUs to now perform that initialization in one step rather than having to defer the AP initialization to a second SYSINIT run at SI_SUB_SMP. It also permits all CPUs to be available for handling interrupts before any devices are probed.
  This last feature fixes a problem with interrupt vector exhaustion. Specifically, in the old model all device interrupts were routed onto the boot CPU during boot. Later, after the APs were released at SI_SUB_SMP, interrupts were redistributed across all CPUs. However, several drivers for multiqueue hardware allocate N interrupts per CPU in the system. In a system with many CPUs, just a few drivers doing this could exhaust the available pool of interrupt vectors on the boot CPU, as each driver was allocating N * mp_ncpu vectors on the boot CPU. Now, drivers will allocate interrupts on their desired CPUs during boot, meaning that only N interrupts are allocated from the boot CPU instead of N * mp_ncpu.
  Some other bits of code can also be simplified as smp_started is now true much earlier and will now always be true for these bits of code. This removes the need to treat the single-CPU boot environment as a special case.
  As a transition aid, the new behavior is available under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I plan to enable this on x86 by default in a followup commit in the next few days and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code.
  These changes have only been tested on x86. Other platform maintainers are encouraged to port their architectures over as well. The main things to check for are any uses of smp_started in MD code that can be simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).
  PR: kern/199321
  Reviewed by: markj, gnn, kib
  Sponsored by: Netflix
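A worked example of the arithmetic in the vector-exhaustion paragraph, as a small C sketch. The driver count, the per-CPU count N and the CPU count are hypothetical inputs. The function only encodes the two allocation models the commit message describes.

```c
/* Vectors taken from the boot CPU's pool while devices attach.
 * drivers:   number of multiqueue drivers (hypothetical input)
 * n_per_cpu: the "N interrupts per CPU" each driver allocates
 * mp_ncpu:   number of CPUs in the system */
static unsigned long
boot_cpu_vectors(unsigned drivers, unsigned n_per_cpu, unsigned mp_ncpu,
    int early_ap_startup)
{
    if (!early_ap_startup)
        /* Old model: every driver's N * mp_ncpu vectors land on the BSP. */
        return (unsigned long)drivers * n_per_cpu * mp_ncpu;
    /* EARLY_AP_STARTUP: each driver takes only its N vectors from the BSP. */
    return (unsigned long)drivers * n_per_cpu;
}
```

With 4 such drivers, N = 2 and 64 CPUs, the old model needs 512 vectors on the boot CPU. That is more than the 256 entries an x86 IDT holds in total. The new model needs 8.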
* Remove the extra _RD as _RDTUN already includes it. | bz | 2016-05-13 | 1 | -1/+1
  Submitted by: emaste
  MFC after: 2 weeks
* We already turn the AMD erratum383 workaround on for certain VM_GUEST_VM | bz | 2016-05-13 | 1 | -1/+2
  if specific CPU features are not present. Some simulation environments, e.g. gem5, have been found to require more TLB management from the kernel in certain setups. It is currently unclear why. Turning on the workaround_erratum383 seems to help and make problems (panics) go away. Given this is a fairly uncommon environment so far, allowing the workaround to be manually enabled from loader, in order to make debugging and comparing traces easier but also to allow gem5 to run FreeBSD in X86 timing mode, seems to be the least intrusive option for now until the issue is fully understood.
  Sponsored by: DARPA/AFRL
  Reviewed by: kib, alc (earlier)
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D6206
* Allow orm(4) to be disabled from probing/attaching by a hints entry: | bz | 2016-05-10 | 1 | -0/+3
  hint.orm.0.disabled=1
  Suggested by: jhb
  Reviewed by: jhb
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D6307
* Remove misc NULL checks after M_WAITOK allocations. | trasz | 2016-05-10 | 1 | -2/+0
  MFC after: 1 month
  Sponsored by: The FreeBSD Foundation
* Add a new bus method to fetch device-specific CPU sets. | jhb | 2016-05-09 | 2 | -1/+22
  bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two values are supported:
  - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled)
  - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
  For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set.
  Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not).
  The x86 nexus driver exposes the internal set of interrupt CPUs from the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also AND the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices.
  Compared to r298933, this version uses 'struct _cpuset' in <sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h> (<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though <sys/_bitset.h> does not after recent changes).
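A hypothetical sketch of the "only 1 SMT thread for each core" reduction behind INTR_CPUS. It uses a 64-bit mask instead of the kernel's cpuset type. It also assumes SMT siblings are numbered consecutively (core = cpu / threads_per_core). Real CPU numbering comes from the topology code and need not follow this layout.

```c
#include <stdint.h>

/* Keep the lowest-numbered CPU of each core present in local_cpus.
 * Assumes threads_per_core >= 1 and adjacent numbering of siblings. */
static uint64_t
intr_cpus_sketch(uint64_t local_cpus, unsigned threads_per_core)
{
    uint64_t intr = 0;
    long last_core = -1;

    for (unsigned cpu = 0; cpu < 64; cpu++) {
        if ((local_cpus & (UINT64_C(1) << cpu)) == 0)
            continue;
        long core = (long)(cpu / threads_per_core);
        if (core != last_core) {    /* first thread seen for this core */
            intr |= UINT64_C(1) << cpu;
            last_core = core;
        }
    }
    return intr;
}
```

For example, with 2 threads per core, a local set of CPUs 0-7 (0xff) reduces to 0x55 (CPUs 0, 2, 4 and 6).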
* Work around (ignore) broken SRAT tables | vangyzen | 2016-05-03 | 1 | -2/+6
  Instead of panicking when parsing an invalid ACPI SRAT table, just ignore it, effectively disabling NUMA.
  https://lists.freebsd.org/pipermail/freebsd-current/2016-May/060984.html
  Reported and tested by: Bill O'Hanlon (bill.ohanlon at gmail.com)
  Reviewed by: jhb
  MFC after: 1 week
  Relnotes: If dmesg shows "SRAT: Duplicate local APIC ID", try updating your BIOS to fix NUMA support.
  Sponsored by: Dell Inc.
* Revert bus_get_cpus() for now. | jhb | 2016-05-03 | 3 | -23/+2
  I really thought I had run this through the tinderbox before committing, but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.
* Add a new bus method to fetch device-specific CPU sets. | jhb | 2016-05-02 | 3 | -2/+23
  bus_get_cpus() returns a specified set of CPUs for a device. It accepts an enum for the second parameter that indicates the type of cpuset to request. Currently two values are supported:
  - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the device when DEVICE_NUMA is enabled)
  - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
  For systems that do not support NUMA (or if it is not enabled in the kernel config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus' by default. The idea is that INTR_CPUS should always return a valid set.
  Device drivers which want to use per-CPU interrupts should start using INTR_CPUS instead of simply assigning interrupts to all available CPUs. In the future we may wish to add tunables to control the policy of INTR_CPUS (e.g. should it be local-only or global, should it ignore SMT threads or not).
  The x86 nexus driver exposes the internal set of interrupt CPUs from the x86 interrupt code via INTR_CPUS. The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also AND the global INTR_CPUS set from the nexus driver with the per-domain set from _PXM to generate a local INTR_CPUS set for child devices.
  Reviewed by: wblock (manpage)
  Differential Revision: https://reviews.freebsd.org/D5519
* atrtc: export function to set RTC | royger | 2016-05-02 | 1 | -21/+28
  This is going to be used by the Xen clock on Dom0 in order to set the RTC of the host. The current logic in atrtc_settime is moved to atrtc_set and the unused device_t parameter is removed from the atrtc_set function call so it can be safely used by other callers.
  Sponsored by: Citrix Systems R&D
  Reviewed by: kib, jhb
  Differential revision: https://reviews.freebsd.org/D6067
* sys: use our roundup2/rounddown2() macros when param.h is available. | pfg | 2016-04-21 | 1 | -2/+2
  rounddown2 tends to produce longer lines than the original code, and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.
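For reference, here is the masking idiom behind these macros, sketched under hypothetical names so the example stands alone. The real definitions live in <sys/param.h>. Both macros assume y is a power of two, so that y - 1 is a mask of the low bits.

```c
/* Valid only when y is a power of two: y - 1 is then a low-bit mask. */
#define sketch_rounddown2(x, y) ((x) & ~((y) - 1))
#define sketch_roundup2(x, y)   (((x) + ((y) - 1)) & ~((y) - 1))
```

For example, sketch_roundup2(5000, 4096) yields 8192 and sketch_rounddown2(5000, 4096) yields 4096.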
* SRAT: Don't overflow domain_pxm table | cem | 2016-04-20 | 1 | -5/+6
  If we reached MAXMEMDOM, we would previously try to insert an additional element and only detect overflow after causing (probably trivial) memory overflow. Instead, detect the ndomain > MAXMEMDOM case before we write past the end.
  Reported by: Coverity
  CID: 1354783
  Sponsored by: EMC / Isilon Storage Division
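A hypothetical sketch of the corrected ordering: check capacity first, then store. The table size, array and function names are invented for illustration and do not mirror the real SRAT parser.

```c
#define SKETCH_MAXMEMDOM 8    /* hypothetical table capacity */

static int domain_pxm[SKETCH_MAXMEMDOM];
static int ndomain;

/* Map an ACPI proximity domain to a dense domain index, adding it if new.
 * Returns -1 when the table is full; nothing is written in that case. */
static int
pxm_domain_add(int pxm)
{
    for (int i = 0; i < ndomain; i++)
        if (domain_pxm[i] == pxm)
            return i;
    if (ndomain >= SKETCH_MAXMEMDOM)    /* bounds check before the store */
        return -1;
    domain_pxm[ndomain] = pxm;
    return ndomain++;
}
```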
* X86: use our nitems() macro when it is available through param.h. | pfg | 2016-04-19 | 3 | -4/+4
  No functional change; only trivial cases are done in this sweep.
  Discussed in: freebsd-current
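The kind of trivial case this sweep converts, sketched with the usual element-count definition. The macro is reproduced under a hypothetical name so the example stands alone, and the table contents are arbitrary example data. The macro only works on true arrays: applied to a pointer it silently computes the wrong value.

```c
#define sketch_nitems(x) (sizeof((x)) / sizeof((x)[0]))

static const int table[] = { 32, 33, 34, 48, 49 };    /* example data */

/* Before: i < sizeof(table) / sizeof(table[0]); after: sketch_nitems(). */
static int
sum_table(void)
{
    int sum = 0;
    for (unsigned i = 0; i < sketch_nitems(table); i++)
        sum += table[i];
    return sum;
}
```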
* Add hw.dmar.batch_coalesce tunable/sysctl, which specifies rate at | kib | 2016-04-17 | 3 | -2/+20
  which queued invalidation completion interrupt is requested with regard to the queued invalidation requests. In other words, setting the value of the knob to N requests completion interrupt after N items are processed. Existing behaviour is restored by setting hw.dmar.batch_coalesce=1. The knob significantly decreases the DMAR qi interrupt rate at the cost of slightly longer DMAR map entries recycling.
  Sponsored by: The FreeBSD Foundation
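A simplified model of the knob, with hypothetical function names. With hw.dmar.batch_coalesce set to n, only every n-th queued invalidation request asks for a completion interrupt. The sketch ignores how the real driver handles a trailing partial batch.

```c
/* Does request number seq (1-based) ask for a completion interrupt
 * when the coalescing value is n?  n of 0 or 1 means every request. */
static int
wants_completion_intr(unsigned long seq, unsigned n)
{
    return n <= 1 || seq % n == 0;
}

/* Number of completion interrupts raised while processing 'items' requests. */
static unsigned long
intr_count(unsigned long items, unsigned n)
{
    unsigned long c = 0;
    for (unsigned long s = 1; s <= items; s++)
        c += wants_completion_intr(s, n);
    return c;
}
```

Processing 100 requests then raises 100 completion interrupts with the value 1 (the old behaviour) and 10 with the value 10.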
* Add x86 CPU features definitions published in the Intel SDM rev. 58. | kib | 2016-04-16 | 2 | -2/+26
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* Always calculate divisor for the counter mode of LAPIC timer. Even if | kib | 2016-04-15 | 1 | -15/+34
  initially configured in the TSC deadline mode, the eventtimer subsystem can be switched to periodic mode, and then the DCR register is loaded with an uninitialized value. Reset the LAPIC eventtimer frequency and min/max periods when changing between deadline and counted periodic modes.
  Reported and tested by: Vladimir Zakharov <zakharov.vv@gmail.com>
  Sponsored by: The FreeBSD Foundation
* busdma/bounce: revert r292255 | royger | 2016-04-15 | 1 | -12/+44
  Revert r292255 because it can create bounced regions without contiguous page offsets, which are needed for USB devices. Another solution would be to force bouncing the full buffer always (even when only one page requires bouncing), but this seems overly complicated and unnecessary, and it will probably involve using more bounce pages than the current code.
  Reported by: phk
* x86: for pointers replace 0 with NULL. | pfg | 2016-04-14 | 1 | -2/+2
  These are mostly cosmetic, no functional change.
  Found with devel/coccinelle.
* Deprecate using hints.acpi.0.rsdp to communicate the RSDP to the | imp | 2016-04-14 | 1 | -0/+11
  system. This uses the hints mechanism. This mostly works today because when there are no static hints (the default), this value can be fetched from the hint. When there is a static hints file, the hint passed from the boot loader to the kernel is ignored, but for the BIOS case we're able to find it anyway. However, with UEFI, the fallback doesn't work, so we get a panic instead. Switch to acpi.rsdp and use TUNABLE_ULONG_FETCH instead. Continue to generate the old values to allow for transitions. In addition, fall back to the old method if the new method isn't present. Add comments about all this.
  Differential Revision: https://reviews.freebsd.org/D5866
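A sketch of the lookup order described above: prefer the new acpi.rsdp tunable and fall back to the legacy hint. The lookup callback is a hypothetical stand-in for the kernel environment; the kernel itself uses TUNABLE_ULONG_FETCH and the hints mechanism. The legacy name assumes the usual hint.<driver>.<unit>.<name> form, as in hint.orm.0.disabled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel-environment lookup. */
typedef const char *(*env_lookup_fn)(const char *name);

/* Returns the RSDP physical address, or 0 if neither name is set. */
static unsigned long
rsdp_lookup(env_lookup_fn lookup)
{
    const char *v = lookup("acpi.rsdp");           /* new tunable first */
    if (v == NULL)
        v = lookup("hint.acpi.0.rsdp");            /* legacy fallback */
    return v != NULL ? strtoul(v, NULL, 0) : 0;    /* base 0 accepts 0x... */
}
```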
* re-enable AMD Topology extension on certain models if disabled by BIOS | avg | 2016-04-12 | 3 | -13/+29
  Some BIOSes disable AMD Topology extension on AMD Family 15h notebook processors. We re-enable the extension, so that we can properly discover core and cache topology. Linux seems to do the same.
  Reported by: Johannes Dieterich <dieterich.joh@gmail.com>
  Reviewed by: jhb, kib
  Tested by: Johannes Dieterich <dieterich.joh@gmail.com> (earlier version)
  MFC after: 3 weeks
  Differential Revision: https://reviews.freebsd.org/D5883
* Cleanup unnecessary semicolons from the kernel. | pfg | 2016-04-10 | 1 | -1/+1
  Found with devel/coccinelle.
* Add more fine-grained kernel options for NUMA support. | jhb | 2016-04-09 | 1 | -20/+35
  VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in the virtual memory system. DEVICE_NUMA is used to enable affinity reporting for devices such as bus_get_domain(). MAXMEMDOM must still be set to a value greater than 1 for any NUMA support to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is enabled and the system supports NUMA.
  Reviewed by: kib
  Differential Revision: https://reviews.freebsd.org/D5782
* xen: Set ipi_{alloc,free} even for UP | sephe | 2016-04-07 | 1 | -2/+2
  This keeps XEN apic_ops aligned w/ x86's.
  Suggested by: kib, jhb
  Reviewed by: jhb, royger
  Sponsored by: Microsoft OSTC
  Differential Revision: https://reviews.freebsd.org/D5871
* x86: Allow interrupt vector allocation/free even on UP | sephe | 2016-04-07 | 1 | -4/+4
  It is needed by the hypervisor FreeBSD guest to allocate/free private interrupt vectors.
  Reviewed by: kib, jhb, Dexuan Cui <decui microsoft com>
  Sponsored by: Microsoft OSTC
  Differential Revision: https://reviews.freebsd.org/D5849
* x86 topo: add some comments, descriptions and references to documentation | avg | 2016-04-05 | 1 | -3/+72
  Plus a minor cosmetic change.
  MFC after: 1 month
* new x86 smp topology detection code | avg | 2016-04-04 | 1 | -275/+490
  Previously, the code determined a topology of processing units (hardware threads, cores, packages) and then deduced a cache topology using certain assumptions. The new code builds a topology that includes both processing units and caches using the information provided by the hardware. At the moment, the discovered full topology is used only to create a scheduling topology for SCHED_ULE. There is no KPI for other kernel uses.
  Summary:
  - based on APIC ID derivation rules for Intel and AMD CPUs
  - can handle non-uniform topologies
  - requires homogeneous APIC ID assignment (same bit widths for ID components)
  - topology for dual-node AMD CPUs may not be optimal
  - topology for latest AMD CPU models may not be optimal as the code is several years old
  - supports only thread/package/core/cache nodes
  Todo:
  - AMD dual-node processors
  - latest AMD processors
  - NUMA nodes
  - checking for homogeneity of the APIC ID assignment across packages
  - more flexible cache placement within topology
  - expose topology to userland, e.g., via sysctl nodes
  Long term todo:
  - KPI for CPU sharing and affinity with respect to various resources (e.g., two logical processors may share the same FPU, etc)
  Reviewed by: mav
  Tested by: mav
  MFC after: 1 month
  Differential Revision: https://reviews.freebsd.org/D2728
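The core of APIC-ID-based topology detection, sketched. Under homogeneous assignment, every APIC ID splits into thread, core and package fields of fixed bit widths. Here the widths are hypothetical parameters; the real code derives them from CPUID. The struct and function names are also illustrative.

```c
#include <stdint.h>

struct apic_id_parts {
    uint32_t pkg, core, thread;
};

/* Split an APIC ID using per-level bit widths that are the same on every
 * package (the "homogeneous APIC ID assignment" requirement above). */
static struct apic_id_parts
split_apic_id(uint32_t id, unsigned smt_bits, unsigned core_bits)
{
    struct apic_id_parts p;

    p.thread = id & ((1u << smt_bits) - 1);
    p.core = (id >> smt_bits) & ((1u << core_bits) - 1);
    p.pkg = id >> (smt_bits + core_bits);
    return p;
}
```

With 1 SMT bit and 3 core bits, APIC ID 27 (binary 11011) decodes to package 1, core 5, thread 1.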
* Move i386/i386/autoconf.c to sys/x86/x86 and use it on both amd64 and i386. | jhb | 2016-04-03 | 1 | -0/+164
* Style(9), use tabs for the #define LOOPS line. | kib | 2016-04-01 | 1 | -5/+4
  Print unsigned values with %u. Make code slightly more compact by inlining loop limit.
  Noted by: bde
  Sponsored by: The FreeBSD Foundation
* Type of the interrupt handlers on x86 cannot be expressed in C. | kib | 2016-03-29 | 1 | -0/+7
  Simplify and unify placeholder type definitions.
  Reviewed by: jhb
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D5771
* Fix several bugs in r297374: | kib | 2016-03-29 | 1 | -5/+13
  - fix UP build [1]
  - do not obliterate initial reading of rdtsc by the loop counter [2]
  - restore the meaning of the argument -1 to native_lapic_ipi_wait(): wait until the LAPIC acknowledges, without timeout
  - correct the formula for calculating the loop iteration count for 1us, which was inverted, and ensure that even on unlikely slow CPUs at least one check for ack is performed.
  Reported by: Michael Butler <imb@protected-networks.net> [1], rpokala [2], jhb [3]
  Tested by: Michael Butler
  Pointy hat to: kib
  Sponsored by: The FreeBSD Foundation
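A sketch of the corrected calibration and wait logic. The names are hypothetical, and a callback stands in for reading the delivery status in APIC_ICR. The measured loop count is divided by the elapsed microseconds (not the other way round), the multiplier is clamped to at least 1, and a delay of -1 means waiting without a timeout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Iterations of the ack-polling loop per microsecond.  Clamped to >= 1 so
 * that even a very slow CPU checks at least once per requested microsecond. */
static uint64_t
loops_per_us(uint64_t loops, uint64_t elapsed_us)
{
    uint64_t mult = elapsed_us != 0 ? loops / elapsed_us : loops;
    return mult != 0 ? mult : 1;
}

/* Poll pending() until it reports false.  delay_us is >= 0, or -1 for no
 * timeout.  Returns true if the IPI was acknowledged in time. */
static bool
ipi_wait_sketch(int delay_us, uint64_t mult, bool (*pending)(void *), void *arg)
{
    if (delay_us == -1) {
        while (pending(arg))
            ;    /* real code would add a CPU pause hint here */
        return true;
    }
    for (uint64_t i = 0; i < (uint64_t)delay_us * mult; i++)
        if (!pending(arg))
            return true;
    return false;
}
```

If 100000 iterations take 25 microseconds, loops_per_us() returns 4000. On a CPU so slow that they take 200000 microseconds, the clamp still yields 1.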
* Calibrate the frequency of the native_lapic_ipi_wait() loop, | kib | 2016-03-29 | 1 | -15/+40
  and avoid a delay while waiting for IPI delivery acknowledgement in xAPIC mode. This makes the loop exit immediately after the delivery bit in APIC_ICR register is set, instead of waiting for some microseconds. We only need to ensure that some amount of time is allowed for the LAPIC to react to the command, and we need the wait time to be finite and reasonable. For that reason, it is irrelevant if the CPU frequency or throttling decrease the speed and make the loop, calibrated for full CPU speed at boot time, execute somewhat slower.
  Discussed with: bde, jhb
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
* Use ANSI function definition. | kib | 2016-03-29 | 1 | -1/+1
  Sponsored by: The FreeBSD Foundation
* Do not load LAPIC_DCR_TIMER with an undefined value. If we are in the | kib | 2016-03-28 | 1 | -3/+6
  deadline mode the divide configuration is not used and lapic_timer_divisor is not set.
  Reported by: dhw, mav
  Tested by: mav
  Sponsored by: The FreeBSD Foundation
* Use TSC deadline mode for LAPIC timer, when available. The mode fires | kib | 2016-03-28 | 1 | -58/+150
  LAPIC timer interrupt when TSC reaches the value written to the IA32_TSC_DEADLINE MSR. To arm or reset the timer in deadline mode, a single non-serializing MSR write is enough. This is an advance from the one-shot mode of LAPIC, where the timer operated with the FSB frequency and required two (serialized in case of xAPIC) writes to the APIC registers.
  The LVT_TIMER register value is cached to avoid unneeded writes in the deadline mode.
  Unused arguments to specify period (which is passed in struct lapic as la_timer_period) and interrupt enable (which is always enabled) are removed from lapic_timer_{oneshot,periodic,deadline} functions. Instead, special lapic_timer_oneshot_nointr() function for interrupt-less one-shot calibration is added.
  Reviewed by: mav (previous version)
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D5738
* Add defines for the LAPIC TSC deadline timer mode. The LVT timer mode | kib | 2016-03-28 | 2 | -3/+6
  field is two-bit; extend the mask. Also add comments about all MSRs whose writes are not serializing.
  Sponsored by: The FreeBSD Foundation
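A sketch of why the mask had to widen. It uses hypothetical macro names with the LVT Timer layout from the Intel SDM: bits 18:17 select one-shot (00), periodic (01) or TSC-deadline (10), and 11 is reserved. If only bit 17 were cleared before switching from deadline to periodic, bit 18 would stay set and produce the reserved encoding.

```c
#include <stdint.h>

#define LVTT_MODE_MASK       0x00060000u  /* bits 18:17, two-bit field */
#define LVTT_MODE_ONESHOT    0x00000000u
#define LVTT_MODE_PERIODIC   0x00020000u
#define LVTT_MODE_TSCDLT     0x00040000u
#define LVTT_MODE_BIT17_ONLY 0x00020000u  /* a too-narrow one-bit mask */

/* Replace the timer mode, preserving the vector and all other fields. */
static uint32_t
lvtt_set_mode(uint32_t lvtt, uint32_t mode, uint32_t mode_mask)
{
    return (lvtt & ~mode_mask) | mode;
}
```

Starting from a deadline-mode value of 0x000400ef, the two-bit mask gives 0x000200ef (periodic), while the one-bit mask gives 0x000600ef.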
* Enable interrupts on the BSP once all PICs are initialized. | jhb | 2016-03-24 | 1 | -0/+15
  This moves the enabling of interrupts slightly earlier (the old location was still before devices were enumerated and probed) and does it in the interrupt code (rather than in the device configuration code). This also avoids tripping over an assertion on the first TLB shootdown with earlier AP startup.
  Reviewed by: kib
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D5710