path: root/sys/amd64/amd64/mp_machdep.c
...
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
  (ed, 2011-11-07; 1 file, -1/+1)

  The SYSCTL_NODE macro defines a list that stores all child elements of
  that node. If there's no SYSCTL_DECL macro anywhere else, there's no
  reason why it shouldn't be static.
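
  A minimal sketch of the pattern this change enforces; the node name and
  description are made up for illustration:

      /*
       * No SYSCTL_DECL(_debug_example) exists anywhere else, so the
       * child list this macro defines can be given internal linkage.
       */
      static SYSCTL_NODE(_debug, OID_AUTO, example, CTLFLAG_RD, NULL,
          "Example sysctl node");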
* With the retirement of cpumask_t and usage of cpuset_t for representing
  a mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.
  (attilio, 2011-07-04; 1 file, -46/+25)

  Remove them and replace their usage with custom pc_cpuid magic (as, at
  the moment, pc_cpumask can be easily represented by (1 << pc_cpuid) and
  pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).

  This change is not targeted for MFC because of the struct pcpu member
  removal and the dependency on cpumask_t retirement.

  MD review by:  marcel, marius, alc
  Tested by:     pluknet
  MD testing by: marcel, marius, gonzo, andreast
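
  A sketch of the pc_cpuid idiom described above, written with the
  cpuset_t macros; the variable names are illustrative:

      cpuset_t self, others;

      CPU_ZERO(&self);
      CPU_SET(PCPU_GET(cpuid), &self);      /* replaces pc_cpumask */
      others = all_cpus;
      CPU_CLR(PCPU_GET(cpuid), &others);    /* replaces pc_other_cpus */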
* Remove code for dynamic offlining/onlining of CPUs on x86.
  (avg, 2011-06-08; 1 file, -165/+6)

  The code has definitely been broken for SCHED_ULE, which is the default
  scheduler. It may have been broken for SCHED_4BSD in more subtle ways,
  e.g. with manually configured CPU affinities and for interrupt delivery
  purposes. We still provide a way to disable individual CPUs or all
  hyperthreading "twin" CPUs before SMP startup; see the UPDATING entry
  for details.

  Interaction between building the CPU topology and disabling CPUs still
  remains fuzzy: the topology is first built using all available CPUs and
  then the disabled CPUs should be "subtracted" from it. That doesn't
  work well if the resulting topology becomes non-uniform.

  This work was done in cooperation with Attilio Rao, who in addition to
  reviewing also provided parts of the code.

  PR:             kern/145385
  Discussed with: gcooper, ambrisko, mdf, sbruno
  Reviewed by:    attilio
  Tested by:      pho, pluknet
  X-MFC after:    never
* MFC (attilio, 2011-06-06; 1 file, -2/+5)
* Don't use cpuid level 4 in x86 CPU topology detection if it's not
  supported. (avg, 2011-06-06; 1 file, -2/+5)

  This regression was introduced in r213323. There are probably no Intel
  CPUs that support amd64 mode but do not support cpuid level 4, but it's
  better to keep the i386 and amd64 versions of this code in sync.

  Discovered by: pho
  Tested by:     pho
  MFC after:     2 weeks
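
  A hedged sketch of the guard this change restores; cpu_high is the
  highest supported basic CPUID leaf, and cores_per_pkg is an
  illustrative variable:

      u_int regs[4];

      if (cpu_high >= 4) {
          cpuid_count(4, 0, regs);
          /* CPUID leaf 4, EAX[31:26]: max core IDs per package - 1. */
          cores_per_pkg = ((regs[0] >> 26) & 0x3f) + 1;
      } else {
          cores_per_pkg = 1;    /* leaf 4 unsupported: assume one core */
      }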
* Prepare the code that does topology detection for AMD CPUs for
  Bulldozer. (avg, 2011-05-06; 1 file, -2/+25)

  This also introduces a new detection path for family 10h and newer
  pre-Bulldozer CPUs; pre-10h hardware should not be affected.

  Tested by: Gary Jennejohn <gljennjohn@googlemail.com> (with pre-10h
             hardware)
  MFC after: 2 weeks
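
  A sketch of the family 10h+ path described above; per AMD's CPUID
  specification, leaf 0x80000008 ECX[15:12] (ApicIdCoreIdSize) gives the
  width of the core-ID field within the APIC ID:

      u_int regs[4], core_id_bits;

      do_cpuid(0x80000008, regs);
      core_id_bits = (regs[2] >> 12) & 0xf;    /* 0 on pre-10h parts */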
* MFC (attilio, 2011-05-06; 1 file, -2/+25)
* Commit the support for removing cpumask_t and replacing it directly
  with cpuset_t objects. (attilio, 2011-05-05; 1 file, -103/+121)

  That is going to offer the underlying support for a simple bump of
  MAXCPU and then support for a number of CPUs > 32 (as it is today).

  Right now, cpumask_t is an int, 32 bits on all our supported
  architectures. cpuset_t, on the other side, is implemented as an array
  of longs, and is easily extensible by definition.

  The architectures touched by this commit are the following:
  - amd64
  - i386
  - pc98
  - arm
  - ia64
  - XEN

  while the others are still missing. Userland is believed to be fully
  converted with the changes contained here.

  Some technical notes:
  - This commit may be considered an ABI nop for all the architectures
    different from amd64 and ia64 (and sparc64 in the future).
  - Per-CPU members which are now converted to cpuset_t need to be
    accessed avoiding migration, because the size of cpuset_t should be
    considered unknown.
  - The size of cpuset_t objects differs between kernel and userland
    (this is primarily done in order to leave some more space in userland
    to cope with KBI extensions). If you need to access a kernel cpuset_t
    from userland, please refer to the examples in this patch on how to
    do that correctly (kgdb may be a good source, for example).
  - Support for other architectures is going to be added soon.
  - Only MAXCPU for amd64 is bumped now.

  The patch has been tested by sbruno and Nicholas Esborn on Opteron
  4 x 12-core packages. More testing on big SMP is expected to come soon.
  pluknet tested the patch with his 8-way machines on both amd64 and
  i386.

  Tested by:   pluknet, sbruno, gianni, Nicholas Esborn
  Reviewed by: jeff, jhb, sbruno
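
  With cpuset_t implemented as an array of longs, membership tests must
  go through the CPU_*() macros instead of integer bit arithmetic; a
  minimal sketch, where 'cpu' is an arbitrary CPU id:

      cpuset_t mask;

      mask = all_cpus;
      if (CPU_ISSET(cpu, &mask))    /* was: all_cpus & (1 << cpu) */
          /* cpu participates */ ;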
* Revert md_assert_preempt() introduction.
  (attilio, 2011-05-04; 1 file, -16/+0)

  Discussed with: jeff, jhb
* Add the function md_assert_nopreempt(), which asserts that the current
  thread cannot be preempted. (attilio, 2011-04-30; 1 file, -0/+16)

  As this function is very tied to x86 (it checks that interrupts are
  disabled), it is not intended to be used in MI code.
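
  A hedged sketch of what such an assertion can look like on amd64, not
  necessarily the committed body: preemption is impossible while
  interrupts are disabled:

      void
      md_assert_nopreempt(void)
      {

          KASSERT((read_rflags() & PSL_I) == 0,
              ("md_assert_nopreempt: interrupts are enabled"));
      }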
* Amd64 doesn't have a lazypmap IPI. (alc, 2011-03-27; 1 file, -3/+0)
* Fix up a few more sysctl(9) mis-typings found in various LINT builds.
  (mdf, 2011-01-13; 1 file, -9/+9)
* Reinitialize the PAT MSR via pmap_init_pat() while resuming.
  (jkim, 2010-11-23; 1 file, -0/+1)

  This function does a better job since r215703 and it is safer now.
* A few small style and whitespace fixes. (jhb, 2010-11-08; 1 file,
  -2/+2)
* x86 topo_probe: do not probe SMP topology if only one CPU is visible.
  (avg, 2010-11-04; 1 file, -5/+8)

  This could lead to a division by zero if the hardware is multi-core
  and/or multi-threaded but, for some (quite unusual) reason, FreeBSD
  sees only one logical processor. This could happen, for example, if
  neither MADT nor MP Table is presented by the BIOS.

  Also:
  - assert in topo_probe_0x4 that the BSP is accounted for
  - neither cpu_cores nor cpu_logical should be zero after successful
    probing, so either being zero is an indication of failed probing

  Reported by: vwe, Dan Allen <danallen46@airwired.net>
  Tested by:   Dan Allen <danallen46@airwired.net>
  MFC after:   3 days
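
  A sketch of the guard described above; the real code may differ in
  detail:

      if (mp_ncpus <= 1) {
          cpu_cores = 1;
          cpu_logical = 1;    /* never left zero: avoids divide-by-zero */
          return;
      }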
* Move <machine/apicreg.h> to <x86/apicreg.h>. (jhb, 2010-11-01; 1 file,
  -1/+1)
* Move the <machine/mca.h> header to <x86/mca.h>. (jhb, 2010-11-01;
  1 file, -1/+1)
* i386 and amd64 mp_machdep: improve topology detection for Intel CPUs.
  (avg, 2010-10-01; 1 file, -86/+125)

  This patch is significantly based on previous work by jkim.

  List of changes:
  - added comments that describe the topology uniformity assumption
  - added a reference to the Intel Processor Topology Enumeration article
  - documented a few global variables that describe topology
  - retired the weirdly set and used logical_cpus variable
  - changed the fallback code for the mp_ncpus > 0 case, so that CPUs are
    treated as being in different packages rather than cores in a single
    package
  - moved AMD-specific code to topo_probe_amd [jkim]
  - in topo_probe_0x4(), follow the Intel-prescribed procedure of
    deriving SMT and core masks and match APIC IDs against those masks
    [started by jkim]
  - in topo_probe_0x4(), drop code for double-checking topology
    parameters by looking at L1 cache properties [jkim]
  - in topo_probe_0xb(), add a fallback path to topo_probe_0x4() as
    prescribed by Intel [jkim]

  Still to do:
  - prepare for upcoming AMD CPUs by using the new mechanism of uniform
    topology description [pointed out by jkim]
  - probe cache topology in addition to CPU topology and probably use
    that for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2
    have the same CPU topology, but Athlon cores do not share L2 cache
    while Core2's do (no L3 cache in both cases)
  - think of supporting non-uniform topologies if they are ever
    implemented for the platforms in question
  - think about how to better describe the old HTT vs new HTT
    distinction; HTT vs SMT can be confusing as SMT is a generic term
  - more robust code for marking CPUs as "logical" and/or
    "hyperthreaded"; use the HTT mask instead of a modulo operation
  - correct support for halting logical and/or hyperthreaded CPUs; let
    the scheduler know that it shouldn't schedule any threads on those
    CPUs

  PR:                    kern/145385 (related)
  In collaboration with: jkim
  Tested by: Sergey Kandaurov <pluknet@gmail.com>,
             Jeremy Chadwick <freebsd@jdc.parodius.com>,
             Chip Camden <sterling@camdensoftware.com>,
             Steve Wills <steve@mouf.net>,
             Olivier Smedts <olivier@gid0.org>,
             Florian Smeets <flo@smeets.im>
  MFC after: 1 month
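
  A hedged sketch of the Intel-prescribed mask matching mentioned above;
  mask_width() is a hypothetical helper computing ceil(log2(n)), and the
  other names are illustrative:

      static u_int
      mask_width(u_int n)
      {

          return (n <= 1 ? 0 : fls(n - 1));
      }

      /*
       * Two APIC IDs name SMT siblings of one core if they differ
       * only in the low SMT bits.
       */
      smt_mask = (1 << mask_width(logical_per_core)) - 1;
      if ((apic_id & ~smt_mask) == (bsp_apic_id & ~smt_mask))
          /* same physical core as the BSP */ ;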
* Refactor timer management code with priority to one-shot operation
  mode. (mav, 2010-09-13; 1 file, -12/+12)

  The main goal of this is to generate timer interrupts only when there
  is some work to do. When the CPU is busy, interrupts are generated at
  the full rate of hz + stathz to fulfill scheduler and timekeeping
  requirements. But when the CPU is idle, only the minimum set of
  interrupts needed to handle scheduled callouts is executed (down to 8
  interrupts per second per CPU now). This allows a significant increase
  in idle CPU sleep time, increasing the effect of static power-saving
  technologies. It should also reduce host CPU load on virtualized
  systems when the guest system is idle.

  There is a set of tunables, also available as writable sysctls, that
  control the event timer subsystem behavior:

  kern.eventtimer.timer - chooses the event timer hardware to use. On x86
  there are up to 4 different kinds of timers. Depending on whether the
  chosen timer is per-CPU, the behavior of the other options differs
  slightly.

  kern.eventtimer.periodic - chooses between periodic and one-shot
  operation mode. In periodic mode, the current timer hardware is taken
  as the only source of time for time events. This mode is quite similar
  to the previous kernel behavior. One-shot mode instead uses the
  currently selected time counter hardware to schedule all needed events
  one by one and programs the timer to generate an interrupt at exactly
  the specified time. The default value depends on the chosen timer's
  capabilities, but one-shot mode is preferred, unless another mode is
  forced by the user or hardware.

  kern.eventtimer.singlemul - in periodic mode, specifies how many times
  higher the timer frequency should be, so as not to strictly alias
  hardclock() and statclock() events. Default values are 2 and 4, but
  they could be reduced to 1 if extra interrupts are unwanted.

  kern.eventtimer.idletick - makes each CPU receive every timer interrupt
  independently of whether it is busy or not. By default this option is
  disabled. If the chosen timer is per-CPU and runs in periodic mode,
  this option has no effect - all interrupts are generated anyway.

  As this patch modifies cpu_idle() on some platforms, I have also
  refactored the x86 one. It now makes use of the MONITOR/MWAIT
  instructions (if supported) under a high sleep/wakeup rate, as a fast
  alternative to the other methods. It allows the SMP scheduler to wake
  up sleeping CPUs much faster without using an IPI, significantly
  increasing performance on some highly task-switching loads.

  Tested by:      many (on i386, amd64, sparc64 and powerpc)
  H/W donated by: Gheorghe Ardelean
  Sponsored by:   iXsystems, Inc.
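
  A hedged sketch of the MONITOR/MWAIT idle pattern described in the last
  paragraph, using the cpu_monitor()/cpu_mwait() inlines; the wakeup-flag
  protocol is simplified and the variable name is illustrative:

      cpu_monitor(&wake_flag, 0, 0);    /* arm the monitor on this line */
      if (!sched_runnable())
          cpu_mwait(0, 0);              /* wake on store or interrupt */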
* Revert part of r211149, as I erroneously ported logical_cpus from the
  Yahoo! patchset as a mask (and manipulated the corresponding variables
  accordingly), while it is actually a CPU count.
  (attilio, 2010-08-19; 1 file, -3/+3)

  Submitted by: neel
  MFC after:    1 month
  X-MFC:        211149
* Reset switchtime to zero rather than the current CPU ticker (TSC)
  value. (jkim, 2010-08-13; 1 file, -1/+1)

  It is more appropriate in this context because the TSC MSR is reset to
  zero when the CPU is restarted from S3 and above. Move
  acpi_resync_clock() back to where it was before r211202; it does not
  make a difference any more.
* Revert r211176. (attilio, 2010-08-12; 1 file, -9/+1)

  As long as interrupts are disabled and there is no explicit call to
  sched_add(), there can't be any preemption there, thus the calls may be
  considered consistent.

  Reported by: kib, jhb
* Reset switchtime and switchticks after resynchronizing the system
  clock. (jkim, 2010-08-12; 1 file, -0/+3)

  This should fix a weird runtime problem after resume on amd64. It also
  fixes "calcru: runtime went backwards" warnings with bootverbose.
* Update various places that store or manipulate CPU masks to use
  cpumask_t instead of int or u_int. (jhb, 2010-08-11; 1 file, -4/+6)

  Since cpumask_t is currently u_int on all platforms, this should just
  be a cosmetic change.
* IPI handlers may generally run with interrupts disabled because they
  are served via an interrupt gate. (attilio, 2010-08-11; 1 file, -1/+9)

  However, that doesn't explicitly prevent preemption and thread
  migration, thus scheduler pinning may be necessary in some handlers.
  Fix that.

  Tested by: gianni
  MFC after: 1 month
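
  A sketch of the pinning pattern the fix introduces:

      sched_pin();    /* prevent migration across this window */
      /* ... access per-CPU state consistently ... */
      sched_unpin();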
* Fix a typo due to a stale version of the patch.
  (attilio, 2010-08-10; 1 file, -1/+1)

  Reported by: gianni, rdivacky
  MFC after:   1 month
  X-MFC:       211149
* Fix some places that may use cpumask_t while they still use 'int'
  types. (attilio, 2010-08-10; 1 file, -9/+17)

  While there, also fix some places assuming the cpu type is 'int' while
  u_int is really meant.

  Note: this will also fix some possible races in per-cpu data accesses,
  to be addressed in further commits.

  In collaboration with: Yahoo! Incorporated (via sbruno and peter)
  Tested by:             gianni
  MFC after:             1 month
* Simplify the logic for handling ipi_selected() and ipi_cpu() in the
  amd64/i386 case. (attilio, 2010-08-09; 1 file, -42/+26)

  Reviewed by: jhb
  Tested by:   gianni
  MFC after:   1 month
  X-MFC:       210939
* Add a new ipi_cpu() function to the MI IPI API that can be used to
  send an IPI to a specific CPU by its cpuid.
  (jhb, 2010-08-06; 1 file, -3/+39)

  Replace calls to ipi_selected() that constructed a mask for a single
  CPU with calls to ipi_cpu() instead. This will matter more in the
  future when we transition from cpumask_t to cpuset_t for CPU masks, in
  which case building a CPU mask is more expensive.

  Submitted by:  peter, sbruno
  Reviewed by:   rookie
  Obtained from: Yahoo! (x86)
  MFC after:     1 month
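
  The replacement pattern this commit describes, shown schematically with
  IPI_AST as an arbitrary example IPI:

      ipi_selected(1 << cpu, IPI_AST);    /* before: build a 1-bit mask */
      ipi_cpu(cpu, IPI_AST);              /* after: target the CPU by id */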
* Merge savectx2() with savectx() and struct xpcb with struct pcb. [1]
  (jkim, 2010-08-02; 1 file, -2/+2)

  - savectx() is only used for the panic dump (dumppcb) and kdb
    (stoppcbs). Thus, saving additional information does not hurt and it
    may even be beneficial. Unfortunately, struct pcb has grown larger to
    accommodate more data. Move the 512-byte long pcb_user_save to the
    end of struct pcb while I am here.
  - savectx() now saves the FPU state unconditionally and copies it to
    the PCB of the FPU thread if necessary. This gives the panic dump and
    kdb a chance to take a look at the current FPU state even if the FPU
    is "supposedly" not used.
  - The resuming CPU now unconditionally reinitializes the FPU. If the
    saved FPU state was irrelevant, it could be in an unknown state.

  Suggested by: bde [1]
* Re-implement FPU suspend/resume for amd64. (jkim, 2010-07-26; 1 file,
  -5/+2)

  This removes superfluous uses of critical_enter(9) and critical_exit(9)
  by fpugetregs() and fpusetregs(). Also, we do not touch PCB flags any
  more.

  MFC after: 1 month
* Some style fixes for r209371. (mav, 2010-06-22; 1 file, -1/+3)

  Submitted by: jhb@
* Implement the new event timers infrastructure. (mav, 2010-06-20;
  1 file, -1/+2)

  It provides unified APIs for writing event timer drivers, for choosing
  the best possible drivers by machine-independent code, and for
  operating them to supply the kernel with hardclock(), statclock() and
  profclock() events in a unified fashion on various hardware.

  The infrastructure provides support for both per-CPU (independent for
  every CPU core) and global timers, in periodic and one-shot modes. The
  MI management code at this moment uses only periodic mode, but one-shot
  mode use is planned for later, as part of the tickless kernel project.

  At this moment the infrastructure is used on the i386 and amd64
  architectures. Other archs are welcome to follow, while their current
  operation should not be affected.

  This patch updates the existing drivers (i8254, RTC and LAPIC) for the
  new order, and adds event timers support into the HPET driver. These
  drivers have different capabilities:
  LAPIC - per-CPU timer, supports periodic and one-shot operation, may
          freeze in C3 state, calibrated on first use, so may be not
          exactly precise.
  HPET  - depending on hardware, can work as per-CPU or global, supports
          periodic and one-shot operation, usually provides several event
          timers.
  i8254 - global, limited to periodic mode, because the same hardware is
          also used as the time counter.
  RTC   - global, supports only periodic mode, set of frequencies in Hz
          limited to powers of 2.

  Depending on hardware capabilities, drivers are preferred in the
  following order: either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC,
  i8254, RTC. The user may explicitly specify wanted timers via loader
  tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2.
  If the requested driver is unavailable or unoperational, the system
  will try to replace it. If no more timers are available, or "NONE" is
  specified for the second, the system will operate using only one timer,
  multiplying its frequency a few times and using respective dividers to
  honor the hz, stathz and profhz values set during initial setup.
* Merge the COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386
  to amd64. (mav, 2010-06-17; 1 file, -8/+129)

  This information can be very valuable for CPU sleep-time (and
  respectively idle power consumption) optimization. Add counters for
  timer-related IPIs.

  Reviewed by: jhb@ (previous version)
* Restore the machine check register banks on resume. (jhb, 2010-06-15;
  1 file, -0/+1)

  For banks being monitored via CMCI, reset the interrupt threshold to 1
  on resume.

  Reviewed by: jkim
  MFC after:   2 weeks
* Fix ACPI suspend/resume on amd64, which was broken since r208833.
  (jkim, 2010-06-14; 1 file, -1/+1)

  We need actual storage for the FPU state to save and restore.
* Introduce the x86 kernel interfaces to allow kernel code to use
  FPU/SSE hardware. (kib, 2010-06-05; 1 file, -1/+1)

  The caller should provide a save area that is chained into the stack of
  the areas; the pcb save_area for usermode FPU state is on top. The pcb
  now contains a pointer to the current FPU saved area, used during
  FPUDNA handling and context switches. There is also a facility to allow
  a kernel thread to use the pcb save_area.

  Change the dreaded warnings "npxdna in kernel mode!" into panics when
  FPU usage is not registered.

  KPI discussed with:   fabient
  Tested by:            pho, fabient
  Hardware provided by: Sentex Communications
  MFC after:            1 month
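
  A hedged usage sketch of this KPI, with names following the fpu_kern(9)
  interface as it later stabilized:

      struct fpu_kern_ctx *ctx;

      ctx = fpu_kern_alloc_ctx(0);    /* or FPU_KERN_NOWAIT */
      fpu_kern_enter(curthread, ctx, FPU_KERN_NORMAL);
      /* ... kernel code may now use FPU/SSE registers ... */
      fpu_kern_leave(curthread, ctx);
      fpu_kern_free_ctx(ctx);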
* Implement MI helper functions, dividing one or two timer interrupts
  with arbitrary frequencies into hardclock(), statclock() and
  profclock() calls. (mav, 2010-05-24; 1 file, -3/+0)

  The same code, with minor variations, was duplicated several times over
  the tree for different timer drivers and architectures.

  Switch all x86 archs to the new functions, simplifying the code and
  removing extra logic from timer drivers. Other archs are also welcome.
* Eliminate unused declarations. (alc, 2010-01-10; 1 file, -6/+0)
* Tweak memory allocation for the amd64 suspend/resume CPU context.
  (jkim, 2009-11-04; 1 file, -3/+3)
* Completely remove the option STOP_NMI from the kernel.
  (attilio, 2009-08-13; 1 file, -69/+31)

  * This option has proven to have a good effect when entering KDB by
    using an NMI, but it completely violates all the good rules about
    interrupts disabled while holding a spinlock in other occasions. This
    can be the cause of deadlocks on events where a normal IPI_STOP is
    expected.
  * Add a new IPI called IPI_STOP_HARD on all the supported
    architectures. This IPI is responsible for sending a stop message
    among CPUs using a privileged channel when available. In other cases
    it just matches a normal IPI_STOP. Right now the IPI_STOP_HARD
    functionality uses an NMI on the ia32 and amd64 architectures, while
    on the others it has a normal IPI_STOP effect. It is the
    responsibility of maintainers to eventually implement a hard stop
    when necessary and possible.
  * Use the new IPI facility in order to implement a new SMP kernel
    function called stop_cpus_hard(). It mirrors stop_cpus() but uses the
    privileged channel for the stopping facility.
  * Let KDB use the newly introduced function stop_cpus_hard() and leave
    stop_cpus() for all the other cases.
  * Disable interrupts on CPU0 when starting the process of AP
    suspension.
  * Style cleanups and comment additions.

  This patch should fix the reboot/shutdown deadlocks many users have
  been constantly reporting on the mailing lists. Please don't forget to
  remove the STOP_NMI option from your config file.

  Reviewed by: jhb
  Tested by:   pho, bz, rink
  Approved by: re (kib)
* Implement a facility for dynamic per-CPU variables.
  (jeff, 2009-06-23; 1 file, -1/+4)

  - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(),
    DPCPU_SET(), etc., akin to the statically defined PCPU_*. This
    requires only one extra instruction more than PCPU_* and is virtually
    the same as __thread for the builtin case, and much faster for shared
    objects. DPCPU variables can be initialized when defined.
  - Modules are supported by relocating the module's per-cpu linker set
    over space reserved in the kernel. Modules may fail to load if there
    is insufficient space available.
  - Track space available for modules with a one-off extent allocator.
    Free may block for memory to allocate space for an extent.

  Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
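
  A minimal sketch of the API summarized above; the counter name is
  illustrative:

      DPCPU_DEFINE(int, example_counter);    /* may carry an initializer */

      static void
      bump_counter(void)
      {

          DPCPU_SET(example_counter, DPCPU_GET(example_counter) + 1);
      }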
* FreeBSD right now supports up to 32 CPUs on all architectures.
  (attilio, 2009-05-14; 1 file, -10/+10)

  With the arrival of 128+ cores it is necessary to handle more than
  that. One of the first things to change is the support for cpumask_t,
  which needs to handle more than 32 bits of masking (which happens now).
  Some places, however, still assume that cpumask_t is a 32-bit mask. Fix
  that situation by always using cpumask_t correctly where needed.

  While here, remove the part under STOP_NMI for the Xen support, as it
  is broken in any case. Additionally, make ipi_nmi_pending static.

  Reviewed by: jhb, kmacy
  Tested by:   Giovanni Trematerra
               <giovanni dot trematerra at gmail dot com>
* Implement simple machine check support for amd64 and i386.
  (jhb, 2009-05-13; 1 file, -0/+3)

  - For CPUs that only support MCE (the machine check exception) but not
    MCA (i.e. Pentium), all this does is print out the value of the
    machine check registers and then panic when a machine check exception
    occurs.
  - For CPUs that support MCA (the machine check architecture), the
    support is a bit more involved.
    - First, there is limited support for decoding the CPU-independent
      MCA error codes in the kernel, and the kernel uses this to output a
      short description of any machine check events that occur.
    - When a machine check exception occurs, all of the MCx banks on the
      current CPU are scanned and any events are reported to the console
      before panic'ing.
    - To catch events for correctable errors, a periodic timer kicks off
      a task which scans the MCx banks on all CPUs. The frequency of
      these checks is controlled via the "hw.mca.interval" sysctl.
    - Userland can request an immediate scan of the MCx banks by writing
      a non-zero value to "hw.mca.force_scan".
    - If any correctable events are encountered, the appropriate details
      are stored in a 'struct mca_record' (defined in <machine/mca.h>).
      "hw.mca.count" is a count of such records, and each record may be
      queried via the "hw.mca.records" tree by specifying the record
      index (0 .. count - 1) as the next name in the MIB, similar to
      using PIDs with the kern.proc.* sysctls. The idea is to export
      machine check events to userland for more detailed processing.
    - The periodic timer and hw.mca sysctls are only present if the CPU
      supports MCA.

  Discussed with: emaste (briefly)
  MFC after:      1 month
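
  A userland sketch of the hw.mca.force_scan knob described above (error
  handling omitted):

      #include <sys/types.h>
      #include <sys/sysctl.h>

      int one = 1;

      /* Request an immediate scan of the MCx banks. */
      sysctlbyname("hw.mca.force_scan", NULL, NULL, &one, sizeof(one));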
* Add support for using the i8254 and RTC timers as event sources for
  amd64 SMP systems. (mav, 2009-05-02; 1 file, -0/+10)

  Redistribute hard-/stat-/profclock events to other CPUs using IPIs.
* Fix a divide-by-zero panic when an SMP kernel is used on a UP
  system [1]. (jkim, 2009-04-30; 1 file, -1/+7)

  Also avoid a possible divide-by-zero panic on SMP systems when the
  CPUID is disabled, unsupported, or buggy.

  Submitted by: pluknet (pluknet at gmail dot com) [1]
* Add support for cpuid leaf 0xb. (jeff, 2009-04-29; 1 file, -53/+148)

  - This allows us to determine the topology of Nehalem/Core i7 based
    systems.
  - Remove the cpu_cores/cpu_logical detection from identcpu.
  - Describe the layout of the system in cpu_mp_announce().

  Sponsored by: Nokia
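
  A hedged sketch of leaf 0xb enumeration; per Intel's documentation,
  each sub-leaf reports EAX[4:0] as the APIC-ID shift for one topology
  level and ECX[15:8] as the level type (1 = SMT, 2 = core, 0 = end):

      u_int p[4], level, type, shift;

      for (level = 0;; level++) {
          cpuid_count(0xb, level, p);
          type = (p[2] >> 8) & 0xff;
          if (type == 0)
              break;                /* no more levels */
          shift = p[0] & 0x1f;      /* APIC-ID bits below this level */
          /* record shift for the SMT (type 1) or core (type 2) level */
      }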
* Adjust the way we number CPUs on x86 so that we attempt to "group" all
  logical CPUs in a package. (jhb, 2009-04-22; 1 file, -21/+29)

  We do this by numbering the non-boot CPUs starting with the first CPU
  whose APIC ID is after the boot CPU, wrapping back around to APIC ID 0
  if needed, rather than always starting at APIC ID 0. While here, adjust
  the cpu_mp_announce() routine to list CPUs based on the mapping
  established by assign_cpu_ids() rather than making assumptions about
  the algorithm assign_cpu_ids() uses.

  MFC after: 1 month
* Save and restore segment registers on amd64 when entering and leaving
  the kernel. (kib, 2009-04-01; 1 file, -7/+13)

  Fill and read segment registers for mcontext and signals. Handle traps
  caused by restoration of the invalidated selectors.

  Implement user-mode creation and manipulation of the process-specific
  LDT descriptors for amd64; see sysarch(2).

  Implement support for the TSS I/O port access permission bitmap for
  amd64. Context-switch the LDT and TSS.

  Do not save and restore segment registers on the context switch; that
  is handled by the kernel enter/leave trampolines now. Remove the
  segment restore code from the signal trampolines for freebsd/amd64,
  freebsd/ia32 and linux/i386 for the same reason.

  Implement amd64-specific compat shims for sysarch.

  The Linuxolator has (temporarily?) switched to using gsbase for the
  thread_area pointer.

  TODO: Currently, gdb is not adapted to show segment registers from
  struct reg. Also, no machine-dependent ptrace command has been added to
  set segment registers for a debugged process.

  In collaboration with: pho
  Discussed with:        peter
  Reviewed by:           jhb
  Linuxolator tested by: dchagin
* Initial suspend/resume support for amd64. (jkim, 2009-03-17; 1 file,
  -0/+40)

  This code is heavily inspired by Takanori Watanabe's experimental SMP
  patch for i386, and a large portion was shamelessly cut and pasted from
  Peter Wemm's AP boot code.