| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Reviewed by: jhb
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding
This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.
Please don't forget to update your config file with the STOP_NMI
option removal
Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)
|
|
|
|
|
|
|
|
|
|
|
| |
and it only optimized out an ipi or mwait in very few cases.
- Skip the adaptive idle code when running on SMT or HTT cores. This
just wastes cpu time that could be used on a busy thread on the same
core.
- Rename CG_FLAG_THREAD to CG_FLAG_SMT to be more descriptive. Re-use
CG_FLAG_THREAD to mean SMT or HTT.
Sponsored by: Nokia
|
|
|
|
|
|
| |
This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.
|
|
|
|
|
|
| |
Make the API work in the non-smp case too so that a kernel module
can work the same regardless of whether or not it is loaded on a SMP
kernel or not.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a seperate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.
Sponsored by: Nokia
|
|
|
|
|
|
|
|
| |
lock optimized for almost exclusive reader access. (see also rmlock.9)
TODO:
Convert to per cpu variables linkerset as soon as it is available.
Optimize UP (single processor) case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
enabled if an attempt is made to send an IPI_STOP IPI. If the kernel
option is enabled, there is also a sysctl to change the behavior at
runtime (debug.stop_cpus_with_nmi which defaults to enabled). This
includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
that is restarted making use of atomic_readandclear() rather than
assuming that the BSP is always included in the set of restarted CPUs.
Also, the NMI handler didn't clear the function pointer meaning that
subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
of stoppedpcbs[] and always enable it for i386 and amd64 instead of
being dependent on KDB_STOP_NMI. It works fine in both the NMI and
non-NMI cases.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.
KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.
Submitted by: ups
|
| |
|
|
|
|
| |
moved the variables but not the declarations.
|
|
|
|
| |
Move the sysctls into kern.sched
|
|
|
|
|
|
| |
when there is new work to be done.
MFC after: 5 days
|
|
|
|
| |
Requested by: jhb
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were obtaining different spin mutexes (which disable interrupts after
aquisition) and spin waiting for delivery. For example, KSE processes
do LDT operations which use smp_rendezvous, while other parts of the
system are doing things like tlb shootdowns with a different mutex.
This patch uses the common smp_rendezvous mutex for all MD home-grown
IPIs that spinwait for delivery. Having the single mutex means that
the spinloop to aquire it will enable interrupts periodically, thus
avoiding the cross-ipi deadlock.
Obtained from: dwhite, alc
Reviewed by: jhb
|
|
|
|
|
|
|
| |
long as there are still explicit uses of int, whether in types or
in function names (such as atomic_set_int() in sched_ule.c), we can
not change cpumask_t to be anything other than u_int. See also the
commit log for sys/sys/types.h, revision 1.84.
|
|
|
|
|
|
|
|
| |
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.
Approved by: re (scottl)
Tested on: i386, amd64, alpha
|
|
|
|
|
|
|
|
|
|
| |
aid other kernel code, especially code which can be in a module such as
the acpi_cpu(4) driver, to work properly with both SMP and UP kernels.
The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and
the smp_rendezvous() function. This also means that CPU_ABSENT() is now
always implemented the same on all kernels.
Approved by: re (scottl)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.
Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)
|
|
|
|
|
| |
smp_topology may be left NULL by architectures which have vanilla SMP
setups.
|
| |
|
|
|
|
| |
so that inlines in machine/smp.h can use variables declared in sys/smp.h.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
be allocated as arrays indexed by the cpu id. Previously the only reliable
way to know the max cpu id was through MAXCPU. mp_ncpus isn't useful here
because cpu ids may be sparsely mapped, although x86 and alpha do not do this.
Also, call cpu_mp_probe much earlier so the max cpu id is known before the VM
starts up. This is intended to help support per cpu queues for the new
allocator, but may be useful elsewhere.
Reviewed by: jake
Approved by: jake
|
|
|
|
|
|
|
| |
declaration in subr_smp.c. This solves a compile problem with
gcc 3.0.1 (ia64 cross-build).
Reviewed: jhb
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
|
|
|
|
|
|
|
|
|
|
| |
defined to 0 in the non-SMP case, which very much makes sense as it
permits its usage in per-CPU initialization loops (for an example, check
out subr_mbuf.c).
Further, on a UP system, make mb_alloc always use the first per-CPU
container, regardless of cpuid (i.e. remove reliability on cpuid in the
UP case).
Requested by: alfred
|
|
|
|
|
|
|
| |
have per-CPU data and gdb on the i386 at least needs access to it.
- Clean up includes in kern_idle.c and subr_smp.c.
Reviewed by: jake
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
been made machine independent and various other adjustments have been made
to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter
Looked over by: eivind
|
|
|
|
|
|
| |
to stay for the foreseeable future.
OK'd by: peter (the idea)
|
|
|
|
|
| |
described in the MP table until something asks for the interrupt number
later on.
|
|
|
|
|
|
| |
mpapic.c. This gives us the benefit of C type checking. These functions
are not called in any critical paths and are not used by the interrupt
routines.
|
| |
|
|
|
|
|
|
|
|
| |
Also, while here, run up to 32 interrupt sources on APIC systems.
Normalize INTREN/INTRDIS so they are the same on both UP and SMP systems
rather than sometimes a macro, and sometimes a function.
Reviewed by: jhb, jakeb
|
|
|
|
|
| |
with !SMP kernels. Also, replace NCPUS with MAXCPU since they are
redundant.
|
|
|
|
|
|
| |
NBUS, NINTR dynamic and set NCPU to a maximum of 16 under SMP.
Reviewed by: peter
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
include:
* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)
* Per-CPU idle processes.
* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).
Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
|
|
|
|
| |
these dynamically (ie. typically you shouldn't have to set NAPIC at all)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't allow cpu entries in the MP table to contain APIC IDs out of range.
Don't write outside array boundaries if an IO APIC entry in the MP table
contains an APIC ID out of range.
Assign APIC IDs for all IO APICs according to section 3.6.6 in the
Intel MP spec:
- If the current APIC ID on an IO APIC doesn't conflict with other
IO APICs or CPUs, that APIC ID should be used. The copy of the MP
table must be updated if the corresponding APIC ID in the MP table
is different.
- If the current APIC ID was in conflict with other units, the
corresponding APIC ID specified in the MP table is checked for conflict.
- If a conflict is still found then fall back to using a new unique ID.
The copy of the MP table must be updated.
- IDs out of range is considered to be in conflict.
During these operations, the IO_TO_ID array cannot be used, since any
conflict would have caused information loss. The array is then corrected,
since all APIC ID conflicts should have been resolved.
PR: 20312, 18919
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Further experimentation showed that some Dell 2450 machines with the
prevention kludge installed still got T_RESERVED traps. CPU interrupt
vector 0x7A was observed to be triggered. This might have been the
bitwise OR of two different vectors sent from each of the IOAPICs at
the same time.
IOAPIC #0: 0x68 --> irq 8: RTC timer interrupt
IOAPIC #1: 0x32 --> irq 18: scsi host adapter or network interface
----
0x7a --> T_RESERVED
Both IOAPICs had ID 0.
Appendix B.3 in the MP spec indicates that the operating system is
responsible for assigning unique IDs to the IOAPICs.
The enclosed patch programs the IOAPIC IDs according to the IOAPIC
entries in the MP table.
Submitted by: tegge
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the low level interrupt handler number should be used. Change
setup_apic_irq_mapping() to allocate low level interrupt handler X (Xintr${X})
for any ISA interrupt X mentioned in the MP table.
Remove an assumption in the driver for the system clock (clock.c) that
interrupts mentioned in the MP table as delivered to IOAPIC #0 intpin Y
is handled by low level interrupt handler Y (Xintr${Y}) but don't assume
that low level interrupt handler 0 (Xintr0) is used.
Don't allocate two low level interrupt handlers for the system clock.
Reviewed by: NOKUBI Hirotaka <hnokubi@yyy.or.jp>
|
|
|
|
|
|
| |
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.
|
|
|
|
|
|
| |
SMP (other CPUs stopped but SMP mode not really started).
Obtained from:Tor.Egge@fast.no
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.
The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.
Reviewed by: peter (partially)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.
Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
and use this when masking/unmasking interrupts.
Maintain a mapping from (iopaic number, int pin) tuple to irq number,
and use this when configuring devices and programming the ioapics.
Previous code assumed that irq number was equal to int pin number, and
that the ioapic number was 0.
Don't let an AP enter _cpu_switch before all local apics are initialized.
|
|
|
|
|
|
|
|
|
|
|
| |
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).
Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.
Remove some spurious forward_statclock/forward_hardclock warnings.
|