op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata	Muli Ben-Yehuda	2007-07-21	3	-23/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces struct pci_sysdata to x86 and x86-64, and converts the existing two users (NUMA, Calgary) to use it. This lays the groundwork for having other users of sysdata, such as the PCI domains work. The Calgary bits are tested, the NUMA bits just look ok. Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: make k8topology multi-core aware	Joachim Deguara	2007-07-21	1	-6/+7
\| \| \| \| \| \| \| \| \|	This makes k8topology multicore aware instead of limited to signle- and dual-core CPUs. It uses the CPUID to be more future proof. Signed-off-by: Joachim Deguara <joachim.deguara@amd.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: remove __smp_alt* sections	Jan Beulich	2007-07-21	1	-9/+0
\| \| \| \| \| \| \| \| \|	Leftovers from the removal of the more general (but abandoned) SMP alternatives. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: arch/x86_64/kernel/e820.c lower printk severity	Dan Aloni	2007-07-21	1	-1/+1
\| \| \| \| \| \| \|	Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: arch/x86_64/kernel/aperture.c lower printk severity	Dan Aloni	2007-07-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Users that use kernel log filtering (e.g. via syslogd or a proprietry method) wouldn't like to see warning prints that are not really warnings. Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: move iommu declaration from proto to iommu.h	Yinghai Lu	2007-07-21	8	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	[akpm@linux-foundation.org: build fix] Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: disable srat when numa emulation succeeds	David Rientjes	2007-07-21	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When NUMA emulation succeeds, acpi_numa needs to be set to -1 so that srat_disabled() will always return true. We won't be calling acpi_scan_nodes() or registering the true nodes we've found. [hugh@veritas.com: Fix x86_64 CONFIG_NUMA_EMU build: acpi_numa needs CONFIG_ACPI_NUMA] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: fix e820_hole_size based on address ranges	David Rientjes	2007-07-21	2	-37/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	e820_hole_size() now uses the newly extracted helper function, e820_find_active_region(), to determine the size of usable RAM in a range of PFN's. This was previously broken because of two reasons: - The start and end PFN's of each e820 entry were not properly rounded prior to excluding those entries in the range, and - Entries smaller than a page were not properly excluded from being accumulated. This resulted in emulated nodes being incorrectly mapped to ranges that were completely reserved and not candidates for being registered as active ranges. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: disable the GART in shutdown	Yinghai Lu	2007-07-21	3	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM installed. when using kexec to load second kernel. In the second kernel, when mem is allocated for GART, it will do the memset for clear, it will cause restart, because some device still used that for dma. solution will be: in second kernel: disable that at first before we try to allocate mem for it. or in the first kernel: do disable that before shutdown. Andi/Eric/Alan prefer to second one for clean shutdown in first kernel. Andi also point out need to consider to AGP enable but mem less 4G case too. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: change _map_single to static in pci_gart.c etc	Yinghai Lu	2007-07-21	2	-6/+6
\| \| \| \| \| \| \| \| \|	This function is called via dma_ops->.., so change it to static Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: lower printk severity	Dan Aloni	2007-07-21	1	-1/+1
\| \| \| \| \| \| \|	Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: use the global PIT lock	Thomas Gleixner	2007-07-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Replace the pcspkr private PIT lock by the global PIT lock to serialize the PIT access all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: During VM oom condition, kill all threads in process group	Will Schmidt	2007-07-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During a VM oom condition, kill all threads in the process group. We have had complaints where a threaded application is left in a bad state after one of it's threads is killed when we hit a VM: out_of_memory condition. Killing just one of the process threads can leave the application in a bad state, whereas killing the entire process group would allow for the application to restart, or otherwise handled, and makes it very obvious that something has gone wrong. This change allows the entire process group to be taken down, rather than just the one thread. Signed-off-by: Will <will_schmidt@vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: Move functions declarations to header file	Glauber de Oliveira Costa	2007-07-21	1	-18/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Some interrupt entry points are currently defined in i8259.c They probably belong in a header. Right now, their only user is init_IRQ, justifying their declaration in-file. But when virtualization comes in, we may be interested in using that functions in late initializations. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: Calgary - fold in redundant functions	Muli Ben-Yehuda	2007-07-21	1	-21/+9
\| \| \| \| \| \| \| \| \| \|	After the bitmap changes we can get rid of the unlocked versions of calgary_unmap_sg and iommu_free. Fold __calgary_unmap_sg and __iommu_free into their calgary_unmap_sg and iommu_free, respectively. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: Calgary - change _map_single, etc to static	Yinghai Lu	2007-07-21	1	-4/+4
\| \| \| \| \| \| \| \| \|	there function are called via dma_ops->.., so change them to static Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: Calgary - tighten up the bitmap locking	Muli Ben-Yehuda	2007-07-21	1	-23/+17
\| \| \| \| \| \| \| \| \| \| \| \|	Currently the IOMMU table's lock protects both the bitmap and access to the hardware's TCE table. Access to the TCE table is synchronized through the bitmap; therefore, only hold the lock while modifying the bitmap. This gives a yummy 10-15% reduction in CPU utilization for netperf on a large SMP machine. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: Calgary - fix few style problems pointed out by checkpatch.pl	Muli Ben-Yehuda	2007-07-21	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \|	No actual code was harmed in the production of this patch. Thanks to Andrew Morton <akpm@linux-foundation.org> for telling me about checkpatch.pl. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: tidy up debug printks	Muli Ben-Yehuda	2007-07-21	1	-10/+3
\| \| \| \| \| \| \|	Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: only reserve the first 1MB of IO space for CalIOC2	Muli Ben-Yehuda	2007-07-21	1	-2/+2
\| \| \| \| \| \| \|	Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: tabify and trim trailing whitespace	Muli Ben-Yehuda	2007-07-21	1	-20/+20
\| \| \| \| \| \| \|	Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: cleanup of unneeded macros	Guillaume Thouvenin	2007-07-21	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \|	Cleanup unneeded macros used for register space address calculation. Now we are using the EBDA to find the space address. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: reserve TCEs with the same address as MEM regions	Muli Ben-Yehuda	2007-07-21	1	-0/+67
\| \| \| \| \| \| \| \| \| \| \| \|	This works around a bug where DMAs that have the same addresses as some MEM regions do not go through. Not clear yet if this is due to a mis-configuration or something deeper. [akpm@linux-foundation.org: coding style fixlet] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: grab PLSSR too when a DMA error occurs	Muli Ben-Yehuda	2007-07-21	1	-3/+6
\| \| \| \| \| \| \|	Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: make dump_error_regs a chip op	Muli Ben-Yehuda	2007-07-21	1	-8/+34
\| \| \| \| \| \| \| \| \| \| \|	Provide seperate versions for Calgary and CalIOC2 Also print out the PCIe Root Complex Status on CalIOC2 errors Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: implement CalIOC2 TCE cache flush sequence	Muli Ben-Yehuda	2007-07-21	1	-1/+87
\| \| \| \| \| \| \|	Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: add chip_ops and a quirk function for CalIOC2	Muli Ben-Yehuda	2007-07-21	1	-4/+33
\| \| \| \| \| \| \| \|	[akpm@linux-foundation.org>: make calioc2_chip_ops static] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: introduce CalIOC2 support	Muli Ben-Yehuda	2007-07-21	1	-40/+156
\| \| \| \| \| \| \| \| \| \| \| \| \|	CalIOC2 is a PCI-e implementation of the Calgary logic. Most of the programming details are the same, but some differ, e.g., TCE cache flush. This patch introduces CalIOC2 support - detection and various support routines. It's not expected to work yet (but will with follow-on patches). Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: abstract how we find the iommu_table for a device	Muli Ben-Yehuda	2007-07-21	1	-7/+14
\| \| \| \| \| \| \| \| \| \|	... in preparation for doing it differently for CalIOC2. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: introduce chipset specific ops	Muli Ben-Yehuda	2007-07-21	1	-7/+17
\| \| \| \| \| \| \| \| \| \| \|	Calgary and CalIOC2 share most of the same logic. Introduce struct cal_chipset_ops for quirks and tce flush logic which are [akpm@linux-foundation.org: make calgary_chip_ops static] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: introduce handle_quirks() for various chipset quirks	Muli Ben-Yehuda	2007-07-21	1	-8/+17
\| \| \| \| \| \| \| \| \|	Move the aic94xx split completion timeout handling there. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: update copyright notice	Muli Ben-Yehuda	2007-07-21	1	-1/+1
\| \| \| \| \| \| \|	Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: generalize calgary_increase_split_completion_timeout	Muli Ben-Yehuda	2007-07-21	1	-4/+5
\| \| \| \| \| \| \| \| \|	... will be used by CalIOC2 later Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: check remote IRR bit before migrating level triggered irq	Eric W. Biederman	2007-07-21	1	-2/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On x86_64 kernel, level triggered irq migration gets initiated in the context of that interrupt(after executing the irq handler) and following steps are followed to do the irq migration. 1. mask IOAPIC RTE entry; // write to IOAPIC RTE 2. EOI; // processor EOI write 3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination and // and interrupt vector due to per cpu vector // allocation. 4. unmask IOAPIC RTE entry; // write to IOAPIC RTE Because of the per cpu vector allocation in x86_64 kernels, when the irq migrates to a different cpu, new vector(corresponding to the new cpu) will get allocated. An EOI write to local APIC has a side effect of generating an EOI write for level trigger interrupts (normally this is a broadcast to all IOAPICs). The EOI broadcast generated as a side effect of EOI write to processor may be delayed while the other IOAPIC writes (step 3 and 4) can go through. Normally, the EOI generated by local APIC for level trigger interrupt contains vector number. The IOAPIC will take this vector number and search the IOAPIC RTE entries for an entry with matching vector number and clear the remote IRR bit (indicate EOI). However, if the vector number is changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI is received later. This will cause the remote IRR to get stuck causing the interrupt hang (no more interrupt from this RTE). Current x86_64 kernel assumes that remote IRR bit is cleared by the time IOAPIC RTE is reprogrammed. Fix this assumption by checking for remote IRR bit and if it still set, delay the irq migration to the next interrupt arrival event(hopefully, next time remote IRR bit will get cleared before the IOAPIC RTE is reprogrammed). Initial analysis and patch from Nanhai. Clean up patch from Suresh. Rewritten to be less intrusive, and to contain a big fat comment by Eric. [akpm@linux-foundation.org: fix comments] Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nanhai Zou <nanhai.zou@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Keith Packard <keith.packard@intel.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86: round_jiffies() for i386 and x86-64 non-critical/corrected MCE polling	Venki Pallipadi	2007-07-21	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \|	This helps to reduce the frequency at which the CPU must be taken out of a lower-power state. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Tim Hockin <thockin@hockin.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86: Make Alt-SysRq-p display the debug register contents	Alan Stern	2007-07-21	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \|	This patch (as921) adds code to the show_regs() routine in i386 and x86_64 to print the contents of the debug registers along with all the others. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86: PM_TRACE support	Nigel Cunningham	2007-07-21	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: mcelog tolerant level cleanup	Tim Hockin	2007-07-21	1	-34/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Background: The MCE handler has several paths that it can take, depending on various conditions of the MCE status and the value of the 'tolerant' knob. The exact semantics are not well defined and the code is a bit twisty. Description: This patch makes the MCE handler's behavior more clear by documenting the behavior for various 'tolerant' levels. It also fixes or enhances several small things in the handler. Specifically: * If RIPV is set it is not safe to restart, so set the 'no way out' flag rather than the 'kill it' flag. * Don't panic() on correctable MCEs. * If the _OVER bit is set and the _UC bit is set (meaning possibly dropped uncorrected errors), set the 'no way out' flag. * Use EIPV for testing whether an app can be killed (SIGBUS) rather than RIPV. According to docs, EIPV indicates that the error is related to the IP, while RIPV simply means the IP is valid to restart from. * Don't clear the MCi_STATUS registers until after the panic() path. This leaves the status bits set after the panic() so clever BIOSes can find them (and dumb BIOSes can do nothing). This patch also calls nonseekable_open() in mce_open (as suggested by akpm). Result: Tolerant levels behave almost identically to how they always have, but not it's well defined. There's a slightly higher chance of panic()ing when multiple errors happen (a good thing, IMHO). If you take an MBE and panic(), the error status bits are not cleared. Alternatives: None. Testing: I used software to inject correctable and uncorrectable errors. With tolerant = 3, the system usually survives. With tolerant = 2, the system usually panic()s (PCC) but not always. With tolerant = 1, the system always panic()s. When the system panic()s, the BIOS is able to detect that the cause of death was an MC4. I was not able to reproduce the case of a non-PCC error in userspace, with EIPV, with (tolerant < 3). That will be rare at best. Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: support poll() on /dev/mcelog	Tim Hockin	2007-07-21	3	-40/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Background: /dev/mcelog is typically polled manually. This is less than optimal for situations where accurate accounting of MCEs is important. Calling poll() on /dev/mcelog does not work. Description: This patch adds support for poll() to /dev/mcelog. This results in immediate wakeup of user apps whenever the poller finds MCEs. Because the exception handler can not take any locks, it can not call the wakeup itself. Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is caught at the next return from interrupt or exit from idle, calling the mce_user_notify() routine. This patch also disables the "fake panic" path of the mce_panic(), because it results in printk()s in the exception handler and crashy systems. This patch also does some small cleanup for essentially unused variables, and moves the user notification into the body of the poller, so it is only called once per poll, rather than once per CPU. Result: Applications can now poll() on /dev/mcelog. When an error is logged (whether through the poller or through an exception) the applications are woken up promptly. This should not affect any previous behaviors. If no MCEs are being logged, there is no overhead. Alternatives: I considered simply supporting poll() through the poller and not using TIF_MCE_NOTIFY at all. However, the time between an uncorrectable error happening and the user application being notified is themost* critical window for us. Many uncorrectable errors can be logged to the network if given a chance. I also considered doing the MCE poll directly from the idle notifier, but decided that was overkill. Testing: I used an error-injecting DIMM to create lots of correctable DRAM errors and verified that my user app is woken up in sync with the polling interval. I also used the northbridge to inject uncorrectable ECC errors, and verified (printk() to the rescue) that the notify routine is called and the user app does wake up. I built with PREEMPT on and off, and verified that my machine survives MCEs. [wli@holomorphy.com: build fix] Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: William Irwin <bill.irwin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: O_EXCL on /dev/mcelog	Tim Hockin	2007-07-21	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Background: /dev/mcelog is a clear-on-read interface. It is currently possible for multiple users to open and read() the device. Users are protected from each other during any one read, but not across reads. Description: This patch adds support for O_EXCL to /dev/mcelog. If a user opens the device with O_EXCL, no other user may open the device (EBUSY). Likewise, any user that tries to open the device with O_EXCL while another user has the device will fail (EBUSY). Result: Applications can get exclusive access to /dev/mcelog. Applications that do not care will be unchanged. Alternatives: A simpler choice would be to only allow one open() at all, regardless of O_EXCL. Testing: I wrote an application that opens /dev/mcelog with O_EXCL and observed that any other app that tried to open /dev/mcelog would fail until the exclusive app had closed the device. Caveats: None. Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: fake apicid_to_node mapping for fake numa	David Rientjes	2007-07-21	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we are in the emulated NUMA case, we need to make sure that all existing apicid_to_node mappings that point to real node ID's now point to the equivalent fake node ID's. If we simply iterate over all apicid_to_node[] members for each node, we risk remapping an entry if it shares a node ID with a real node. Since apicid's may not be consecutive, we're forced to create an automatic array of apicid_to_node mappings and then copy it over once we have finished remapping fake to real nodes. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: fake pxm-to-node mapping for fake numa	David Rientjes	2007-07-21	2	-3/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For NUMA emulation, our SLIT should represent the true NUMA topology of the system but our proximity domain to node ID mapping needs to reflect the emulated state. When NUMA emulation has successfully setup fake nodes on the system, a new function, acpi_fake_nodes() is called. This function determines the proximity domain (_PXM) for each true node found on the system. It then finds which emulated nodes have been allocated on this true node as determined by its starting address. The node ID to PXM mapping is changed so that each fake node ID points to the PXM of the true node that it is located on. If the machine failed to register a SLIT, then we assume there is no special requirement for emulated node affinity so we use the default LOCAL_DISTANCE, which is newly exported to this code, as our measurement if the emulated nodes appear in the same PXM. Otherwise, we use REMOTE_DISTANCE. PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we can compare node_to_pxm() results in generic code (in this case, the SRAT code). Cc: Len Brown <lenb@kernel.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: extract helper function from e820_register_active_regions	David Rientjes	2007-07-21	1	-34/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The logic in e820_find_active_regions() for determining the true active regions for an e820 entry given a range of PFN's is needed for e820_hole_size() as well. e820_hole_size() is called from the NUMA emulation code to determine the reserved area within an address range on a per-node basis. Its logic should duplicate that of finding active regions in an e820 entry because these are the only true ranges we may register anyway. [akpm@linux-foundation.org: cleanup] Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: Quicklist support for x86_64	Christoph Lameter	2007-07-21	3	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds caching of pgds and puds, pmds, pte. That way we can avoid costly zeroing and initialization of special mappings in the pgd. A second quicklist is useful to separate out PGD handling. We can carry the initialized pgds over to the next process needing them. Also clean up the pgd_list handling to use regular list macros. There is no need anymore to avoid the lru field. Move the add/removal of the pgds to the pgdlist into the constructor / destructor. That way the implementation is congruent with i386. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Luck, Tony" <tony.luck@intel.com> Acked-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: remove unused variable maxcpus	Jan Beulich	2007-07-21	1	-1/+0
\| \| \| \| \| \| \| \| \|	.. and adjust documentation to properly reflect options that are x86-64 specific. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: minor exception trace variables cleanup	Jan Beulich	2007-07-21	2	-3/+1
\| \| \| \| \| \|	Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: ia32entry adjustments	Jan Beulich	2007-07-21	1	-2/+3
\| \| \| \| \| \| \| \| \|	Consolidate the three 32-bit system call entry points so that they all treat registers in similar ways. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: time.c white space wreckage cleanup	Thomas Gleixner	2007-07-21	1	-45/+45
\| \| \| \| \| \| \| \|	Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: apic.c coding style janitor work	Thomas Gleixner	2007-07-21	1	-43/+30
\| \| \| \| \| \| \| \| \| \|	Fix coding style, white space wreckage and remove unused code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86_64: fiuxp pt_reqs leftovers	Thomas Gleixner	2007-07-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The hpet_rtc_interrupt handler still uses pt_regs. Fix it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>