op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	x86: add cpu codenames for Kconfig.cpu	Oliver Pinter	2007-10-17	2	-12/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	add cpu core name for arch/i386/Kconfig.cpu:Pentium 4 sections help add Pentium D for arch/i386/Kconfig.cpu add Pentium D for arch/x86_64/Kconfig AK: Clarified some of the descriptions [ tglx: arch/x86 adaptation ] Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Sam Ravnborg <sam@ravnborg.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: Clean up duplicate includes in arch/i386/xen/	Jesper Juhl	2007-10-17	2	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch cleans up duplicate includes in arch/i386/xen/ [ tglx: arch/x86 adaptation ] Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: make some variables static	Adrian Bunk	2007-10-17	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	This patch makes some needlessly global variables static. [ tglx: arch/x86 adaptation ] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: hide cond_syscall behind __KERNEL__	Mike Frysinger	2007-10-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	This brings x86_64 into line with all other architectures by only defining cond_syscall() when __KERNEL__ is defined. [ tglx: arch/x86 adaptation ] Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: make struct apic_probe static	Adrian Bunk	2007-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This patch makes the needlessly global struct apic_probe static. [ tglx: arch/x86 adaptation ] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: 0 -> NULL, for arch/x86_64	Yoann Padioleau	2007-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When comparing a pointer, it's clearer to compare it to NULL than to 0. [ tglx: arch/x86 adaptation ] Signed-off-by: Yoann Padioleau <padator@wanadoo.fr> Signed-off-by: Andi Kleen <ak@suse.de> Cc: ak@suse.de Cc: discuss@x86-64.org Cc: akpm@linux-foundation.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: Clean up duplicate includes in arch/i386/kernel/	Jesper Juhl	2007-10-17	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \|	This patch cleans up duplicate includes in arch/i386/kernel/ [ tglx: arch/x86 adaptation ] Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: remove rogue default m in drivers/video/Kconfig	Andi Kleen	2007-10-17	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove rogue default m in drivers/video/Kconfig default m is near always wrong, like here. For some reason ACPI likes to reintroduce these and I like to immediately squash them again before they pollute too many .configs. Cc: len.brown@intel.com Cc: luming.yu@gmail.com Acked-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: intel_cacheinfo misc section annotation fixes	Satyam Sharma	2007-10-17	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cache_shared_cpu_map_setup() and cache_remove_shared_cpu_map() are functions called from another function that is __cpuinit. But the !CONFIG_SMP empty-body stubs of these functions are unconditionally marked __init, which is actively wrong, and will lead to oops. But we never saw this oops, because they always managed to get inlined in their callsites, by virtue of being empty-body stubs! They should still be __cpuinit, of course. assocs[], levels[] and types[] are only referenced from function that is __cpuinit. So these are candidates for being marked __cpuinitdata. [akpm@linux-foundation.org: build fix] Signed-off-by: Satyam Sharma <satyam@infradead.org> Cc: Andi Kleen <ak@suse.de> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: use dev_to_node() to get node for device in dma_alloc_pages()	Yinghai Lu	2007-10-17	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	use dev_to_node() to get node for device in dma_alloc_pages(). Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: pci use pci=bfsort for HP DL385 G2 and DL585 G2	Michal Schmidt	2007-10-17	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	HP ProLiant systems DL385 G2 and DL585 G2 need pci=bfsort to enumerate PCI devices in the expected order. Matt sayeth: biosdevname is a userspace app I wrote to help solve this so we don't need to patch the kernel for future systems. It's not integrated into any distributions properly yet, but is included in openSUSE 10.3 and Fedora 8 for people who want to download and install it there. It acts as a udev helper. For the time being, patching the kernel is necessary. I really hope biosdevname eliminates that need in future distributions. http://linux.dell.com/biosdevname/ Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Cc: <john.cagle@hp.com> Cc: Matt Domsch <Matt_Domsch@dell.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Greg KH <greg@kroah.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: Calgary: fix disable busnum for CalIOC2	Muli Ben-Yehuda	2007-10-17	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old check we used based on dev->bus->number is wrong for devices on CalIOC2. Instead look whether we have an IOMMU table for that bus - if not, translation is disabled. Thanks to Murillo Fernandes Bernardes <bernarde@br.ibm.com> for spotting, suggesting a fix and testing. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Acked-by: Murillo Fernandes Bernardes <bernarde@br.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: NX bit handling in change_page_attr()	Huang, Ying	2007-10-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a bug of change_page_attr/change_page_attr_addr on Intel x86_64 CPUs. After changing page attribute to be executable with these functions, the page remains un-executable on Intel x86_64 CPU. Because on Intel x86_64 CPU, only if the "NX" bits of all four level page tables are cleared, the corresponding page is executable (refer to section 4.13.2 of Intel 64 and IA-32 Architectures Software Developer's Manual). So, the bug is fixed through clearing the "NX" bit of PMD when splitting the huge PMD. Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: voyager don't try to support uniprocessor builds	James Bottomley	2007-10-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	A while ago Randy Dunlap and Adrian Bunk suggested we simply prevent UP voyager building. I resisted this on the grounds that the nagging was the only thing that was going to cause me to look at this. However, now I think we should probably take this course. Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: stop nmi softlockup warnings in show_mem()	Prarit Bhargava	2007-10-17	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \|	When dumping memory via sysrq-m it is possible to take a bogus NMI watchdog or softlockup watchdog because the dump can take a long time on big memory systems. Occasionally tickle the watchdog when doing the dump. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
*	x86: fix CONFIG_PAGEALLOC related boot hangs/OOMs	Ingo Molnar	2007-10-17	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	if CONFIG_PAGEALLOC is enabled then X86_FEATURE_PSE is disabled and all the kernel physical RAM pagetables are set up as 4K pages. This is needed so that CONFIG_PAGEALLOC can do finegrained mapping and unmapping of pages. as a side-effect though, the total size of memory allocated as kernel pagetables increases significantly. All these pagetables are allocated via alloc_bootmem_low_pages(), straight out of the lowmem DMA pool. If the system has enough RAM and a large kernel image then almost all of the 16 MB lowmem DMA pool is allocated to the image and to pagetables - leaving no space for __GFP_DMA allocations. this results in drivers failing and the bootup hanging: swapper invoked oom-killer: gfp_mask=0x80d1, order=0, oomkilladj=0 [<4015059f>] out_of_memory+0x17f/0x1c0 [<40151f3c>] __alloc_pages+0x37c/0x3a0 [<40168cd7>] slob_new_page+0x37/0x50 [<40168dff>] slob_alloc+0x10f/0x190 [<40169010>] __kmalloc_node+0x80/0x90 [<405a17e3>] scsi_host_alloc+0x33/0x2c0 [<405a1a82>] scsi_register+0x12/0x60 [<40d5889e>] aha1542_detect+0x9e/0x940 [<405c5ba5>] ultrastor_detect+0x265/0x5f0 [<401352f5>] getnstimeofday+0x35/0xf0 [<40d58751>] init_this_scsi_driver+0x41/0xf0 [<40d0b856>] kernel_init+0x136/0x310 [<40d58710>] init_this_scsi_driver+0x0/0xf0 [<40d0b720>] kernel_init+0x0/0x310 [<40105547>] kernel_thread_helper+0x7/0x10 ======================= the fix is to first allocate from above the DMA pool, and if that fails (for example due to it being a machine with less than 16 MB of RAM), allocate from the DMA pool as a fallback. With this fix applied i was able to boot a PAGEALLOC=y kernel that would hang before. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: prepare page allocator for high allocations on PAGEALLOC=y	Ingo Molnar	2007-10-17	1	-0/+9
\| \| \| \| \| \| \| \| \|	To preserve the DMA pool in CONFIG_DEBUG_PAGEALLOC=y kernels, we'll allocate pagetables from above the 16MB DMA limit, so we'll have to set up boot pagetables to cover 16MB more RAM (worst-case). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: do not crash on non-Geode PCs in TSC probe	Ingo Molnar	2007-10-17	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	with this fix Geode kernels can be booted (and QA-ed) on generic PCs. otherwise it crashes and burns during early bootup: Detected 2160.212 MHz processor. general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 EIP: 0060:[<c09071f6>] Not tainted VLI EFLAGS: 00010002 (2.6.23-rc9 #90) EIP is at tsc_init+0xa6/0x150 eax: 00000001 ebx: c1dce000 ecx: 00001900 edx: 00000001 esi: 00051000 edi: 00051000 ebp: c08fdfc4 esp: c08fdfa4 ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068 Process swapper (pid: 0, ti=c08fc000 task=c082a180 task.ti=c08fc000) Stack: c076b870 00000870 000000d4 0000001d c0831e80 c1dce000 00051000 00051000 c08fdfcc c09053f8 c08fdff8 c09045ff 000001e2 c09040a0 00051000 00000020 0004e500 c0932140 00020800 00099800 c08ed000 01409007 00000000 Call Trace: [<c010517a>] show_trace_log_lvl+0x1a/0x30 [<c0105246>] show_stack_log_lvl+0xb6/0x100 [<c0105732>] show_registers+0x212/0x3a0 [<c0105aa4>] die+0x104/0x220 [<c0105f5f>] do_general_protection+0x1ef/0x2b0 [<c06699f2>] error_code+0x72/0x78 [<c09053f8>] time_init+0x8/0x20 [<c09045ff>] start_kernel+0x1af/0x320 [<00000000>] 0x0 ======================= Code: 31 d2 b8 00 00 09 3d f7 35 2c 70 9b c0 a3 04 95 8f c0 e8 ce 4e 99 ff b8 e0 45 93 c0 e8 94 b1 c5 ff e8 7f 3d 80 ff b9 00 19 00 00 <0f> 32 f6 c4 01 74 07 83 25 24 ce 82 c0 fd 8b 0d 20 ce 82 c0 b8 EIP: [<c09071f6>] tsc_init+0xa6/0x150 SS:ESP 0068:c08fdfa4 Kernel panic - not syncing: Attempted to kill the idle task! Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: enable NMI watchdog on nosmp	Ingo Molnar	2007-10-17	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \|	if nosmp has been passed as a boot option, but nmi_watchdog=2 has also been enabled then keep minimal local APIC functionality around to make the watchdog work. this allowed me to debug a hard hang that would only occur with a nosmp bootup. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86_64: Fix compat emulation of PTRACE_GET/SET_THREAD_AREA	Andi Kleen	2007-10-17	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Since the 64bit kernel has different indexes for this TLS segments the address needs to be adjusted in the ptrace 32bit emulation. [ tglx: arch/x86 adaptation ] Reported-by: Amnon Shiloh Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: call free_init_pages() with irqs enabled in alternative_instructions()	Fengguang Wu	2007-10-17	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In alternative_instructions(), call free_init_pages() with irqs enabled. It fixes the warning message in smp_call_function*(), which should not be called with irqs disabled. [ 0.310000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) [ 0.310000] CPU: L2 Cache: 512K (64 bytes/line) [ 0.310000] CPU 0/0 -> Node 0 [ 0.310000] SMP alternatives: switching to UP code [ 0.310000] Freeing SMP alternatives: 25k freed [ 0.310000] WARNING: at arch/x86_64/kernel/smp.c:397 smp_call_function_mask() [ 0.310000] [ 0.310000] Call Trace: [ 0.310000] [<ffffffff8100dbde>] dump_trace+0x3ee/0x4a0 [ 0.310000] [<ffffffff8100dcd3>] show_trace+0x43/0x70 [ 0.310000] [<ffffffff8100dd15>] dump_stack+0x15/0x20 [ 0.310000] [<ffffffff8101cd44>] smp_call_function_mask+0x94/0xa0 [ 0.310000] [<ffffffff8101d0b2>] smp_call_function+0x32/0x40 [ 0.310000] [<ffffffff8104277f>] on_each_cpu+0x1f/0x50 [ 0.310000] [<ffffffff81026eac>] global_flush_tlb+0x8c/0x110 [ 0.310000] [<ffffffff81025c85>] free_init_pages+0xe5/0xf0 [ 0.310000] [<ffffffff81549b5e>] alternative_instructions+0x7e/0x150 [ 0.310000] [<ffffffff8154a2ea>] check_bugs+0x1a/0x20 [ 0.310000] [<ffffffff81540c4a>] start_kernel+0x2da/0x380 [ 0.310000] [<ffffffff81540132>] _sinittext+0x132/0x140 [ 0.310000] [ 0.320000] ACPI: Core revision 20070126 [ 0.560000] Using local APIC timer interrupts. [ 0.590000] Detected 62.496 MHz APIC timer. [ 0.590000] Brought up 1 CPUs [ tglx: arch/x86 adaptation ] Cc: Laurent Vivier <Laurent.Vivier@bull.net> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: mark read_crX() asm code as volatile	Kirill Korotaev	2007-10-17	2	-5/+5
\| \| \| \| \| \| \| \| \|	Some gcc versions (I checked at least 4.1.1 from RHEL5 & 4.1.2 from gentoo) can generate incorrect code with read_crX()/write_crX() functions mix up, due to cached results of read_crX(). The small app for x8664 below compiled with -O2 demonstrates this (i686 does the same thing):
*	x86: initialize 64bit registers for a.out executables	Andi Kleen	2007-10-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	Previously the data from before the exec was kept in there. Zero them instead. [ tglx: arch/x86 adaptation ] Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: return correct error code from child_rip in x86_64 entry.S	Andrey Mirkin	2007-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now register edi is just cleared before calling do_exit. That is wrong because correct return value will be ignored. Value from rax should be copied to rdi instead of clearing edi. AK: changed to 32bit move because it's strictly an int [ tglx: arch/x86 adaptation ] Signed-off-by: Andrey Mirkin <major@openvz.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: avoid temporarily inconsistent pte-s	Jan Beulich	2007-10-17	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	One more of these issues (which were considered fixed a few releases back): other than on x86-64, i386 allows set_fixmap() to replace already present mappings. Consequently, on PAE, care must be taken to not update the high half of a pte while the low half is still holding the old value. [ tglx: arch/x86 adaptation ] Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> arch/x86/mm/pgtable_32.c \| 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
*	i386: fix section mismatch warning in intel.c	Sam Ravnborg	2007-10-17	2	-16/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix following section mismatch warning: WARNING: vmlinux.o(.text+0xc88c): Section mismatch: reference to .init.text:trap_init_f00f_bug (between 'init_intel' and 'cpuid4_cache_lookup') init_intel are __cpuint where trap_init_f00f_bug is __init. Fixed by declaring trap_init_f00f_bug __cpuinit. Moved the defintion of trap_init_f00f_bug to the sole user in init.c so the ugly prototype in intel.c could get killed. Frank van Maarseveen <frankvm@frankvm.com> supplied the .config used to reproduce the warning. [ tglx: arch/x86 adaptation ] Cc: Frank van Maarseveen <frankvm@frankvm.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: Fix section mismatch	Satyam Sharma	2007-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix bugzilla #8679 WARNING: arch/i386/kernel/built-in.o(.data+0x2148): Section mismatch: reference to .init.text: (between 'thermal_throttle_cpu_notifier' and 'mtrr_mutex') comes because struct notifier_block thermal_throttle_cpu_notifier in arch/i386/kernel/cpu/mcheck/therm_throt.c goes in .data section but the notifier callback function itself has been marked __cpuinit which becomes __init == .init.text when HOTPLUG_CPU=n. The warning is bogus because the callback will never be called out if HOTPLUG_CPU=n in the first place (as one can see from kernel/cpu.c, the cpu_chain itself is __cpuinitdata :-) So, let's mark thermal_throttle_cpu_notifier as __cpuinitdata to fix the section mismatch warning. [ tglx: arch/x86 adaptation ] Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: fix 4 bit apicid assumption of mach-default	Siddha, Suresh B	2007-10-17	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix get_apic_id() in mach-default, so that it uses 8 bits incase of xAPIC case and 4 bits for legacy APIC case. This fixes the i386 kernel assumption that apic id is less than 16 for xAPIC platforms with 8 cpus or less and makes the kernel boot on such platforms. [ tglx: arch/x86 adaptation ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: fix off-by-one in find_next_zero_string	Andrew Hastings	2007-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Fix an off-by-one error in find_next_zero_string which prevents allocating the last bit. [ tglx: arch/x86 adaptation ] Signed-off-by: Andrew Hastings <abh@cray.com> on behalf of Cray Inc. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: export i386 smp_call_function_mask() to modules	Laurent Vivier	2007-10-17	2	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch export i386 smp_call_function_mask() with EXPORT_SYMBOL(). This function is needed by KVM to call a function on a set of CPUs. [ tglx: arch/x86 adaptation ] Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: Install unstripped copy of 64bit vdso to disk	Roland McGrath	2007-10-17	2	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This keeps an unstripped copy of the 64bit vDSO images built before they are stripped and embedded in the kernel. The unstripped copies get installed in $(MODLIB)/vdso/ by "make install" (or you can explicitly use the subtarget "make vdso_install"). These files can be useful when they contain source-level debugging information. [ tglx: arch/x86 adaptation ] Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86_64: install unstripped copies of compat vdso on disk	Roland McGrath	2007-10-17	2	-5/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This keeps an unstripped copy of the vDSO images built before they are stripped and embedded in the kernel. The unstripped copies get installed in $(MODLIB)/vdso/ by "make install" (or you can explicitly use the subtarget "make vdso_install"). These files can be useful when they contain source-level debugging information. [ tglx: arch/x86 adaptation ] Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: setup_trampoline() must be __cpuinit	Adrian Bunk	2007-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	WARNING: arch/i386/kernel/built-in.o(.text+0xf201): Section mismatch: reference to .init.data:trampoline_end (between 'setup_trampoline' and 'cpu_coregroup_map') WARNING: arch/i386/kernel/built-in.o(.text+0xf207): Section mismatch: reference to .init.data:trampoline_data (between 'setup_trampoline' and 'cpu_coregroup_map') WARNING: arch/i386/kernel/built-in.o(.text+0xf21a): Section mismatch: reference to .init.data:trampoline_data (between 'setup_trampoline' and 'cpu_coregroup_map') Harmless but annoying warnings present when building an i386 SMP kernel with CONFIG_HOTPLUG_CPU=n and gcc < 4.0 . [ tglx: arch/x86 adaptation ] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: clean up apicid_to_node declaration	Andrew Morton	2007-10-17	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the correct #define in the declaration of apicid_to_node[], to match the definition. [ tglx: arch/x86 adaptation ] Cc: Andi Kleen <ak@suse.de> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	i386: make Oprofile call shutdown() only once per session	Stephane Eranian	2007-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Oprofile: call model->shutdown() only once to avoid calling release_ev*() multiple times [ tglx: arch/x86 adaptation ] Signed-off-by: Stephane Eranian <eranian@hpl.hp.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: C1E late detection fix. Really switch off lapic timer	Thomas Gleixner	2007-10-17	2	-17/+4
\| \| \| \| \| \| \| \| \| \| \|	Doh, I completely missed that devices marked DUMMY are not running the set_mode function. So we force broadcasting, but we keep the local APIC timer running. Let the clock event layer mark the device _after_ switching it off. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched	Linus Torvalds	2007-10-17	4	-20/+49
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: fix new task startup crash sched: fix !SYSFS build breakage sched: fix improper load balance across sched domain sched: more robust sd-sysctl entry freeing
\| *	sched: fix new task startup crash	Srivatsa Vaddagiri	2007-10-17	2	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Child task may be added on a different cpu that the one on which parent is running. In which case, task_new_fair() should check whether the new born task's parent entity should be added as well on the cfs_rq. Patch below fixes the problem in task_new_fair. This could fix the put_prev_task_fair() crashes reported. Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Reported-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	sched: fix !SYSFS build breakage	Dhaval Giani	2007-10-17	2	-8/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When CONFIG_SYSFS is not set, CONFIG_FAIR_USER_SCHED fails to build with kernel/built-in.o: In function `uids_kobject_init': (.init.text+0x1488): undefined reference to `kernel_subsys' kernel/built-in.o: In function `uids_kobject_init': (.init.text+0x1490): undefined reference to `kernel_subsys' kernel/built-in.o: In function `uids_kobject_init': (.init.text+0x1480): undefined reference to `kernel_subsys' kernel/built-in.o: In function `uids_kobject_init': (.init.text+0x1494): undefined reference to `kernel_subsys' This patch fixes this build error. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	sched: fix improper load balance across sched domain	Ken Chen	2007-10-17	1	-4/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We recently discovered a nasty performance bug in the kernel CPU load balancer where we were hit by 50% performance regression. When tasks are assigned to a subset of CPUs that span across sched_domains (either ccNUMA node or the new multi-core domain) via cpu affinity, kernel fails to perform proper load balance at these domains, due to several logic in find_busiest_group() miss identified busiest sched group within a given domain. This leads to inadequate load balance and causes 50% performance hit. To give you a concrete example, on a dual-core, 2 socket numa system, there are 4 logical cpu, organized as: CPU0 attaching sched-domain: domain 0: span 0003 groups: 0001 0002 domain 1: span 000f groups: 0003 000c CPU1 attaching sched-domain: domain 0: span 0003 groups: 0002 0001 domain 1: span 000f groups: 0003 000c CPU2 attaching sched-domain: domain 0: span 000c groups: 0004 0008 domain 1: span 000f groups: 000c 0003 CPU3 attaching sched-domain: domain 0: span 000c groups: 0008 0004 domain 1: span 000f groups: 000c 0003 If I run 2 tasks with CPU affinity set to 0x5. There are situation where cpu0 has run queue length of 2, and cpu2 will be idle. The kernel load balancer is unable to balance out these two tasks over cpu0 and cpu2 due to at least three logics in find_busiest_group() that heavily bias load balance towards power saving mode. e.g. while determining "busiest" variable, kernel only set it when "sum_nr_running > group_capacity". This test is flawed that "sum_nr_running" is not necessary same as sum-tasks-allowed-to-run-within-the sched-group. The end result is that kernel "think" everything is balanced, but in reality we have an imbalance and thus causing one CPU to be over-subscribed and leaving other idle. There are two other logic in the same function will also causing similar effect. The nastiness of this bug is that kernel not be able to get unstuck in this unfortunate broken state. From what we've seen in our environment, kernel will stuck in imbalanced state for extended period of time and it is also very easy for the kernel to stuck into that state (it's pretty much 100% reproducible for us). So proposing the following fix: add addition logic in find_busiest_group to detect intrinsic imbalance within the busiest group. When such condition is detected, load balance goes into spread mode instead of default grouping mode. Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	sched: more robust sd-sysctl entry freeing	Milton Miller	2007-10-17	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It occurred to me this morning that the procname field was dynamically allocated and needed to be freed. I started to put in break statements when allocation failed but it was approaching 50% error handling code. I came up with this alternative of looping while entry->mode is set and checking proc_handler instead of ->table. Alternatively, the string version of the domain name and cpu number could be stored the structs. I verified by compiling CONFIG_DEBUG_SLAB and checking the allocation counts after taking a cpuset exclusive and back. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* \|	Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block	Linus Torvalds	2007-10-17	10	-47/+62
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: [SCSI] Remove full sg table memset() [SCSI] ide-scsi: remove usage of sg_last() Fix loop terminating conditions in fill_sg(). [BLOCK] Clear sg entry before filling in blk_rq_map_sg() IA64: iommu uses sg_next with an invalid sg element cciss: disable DMA refetch on Smart Array P600 swiotlb: fix map_sg failure handling SPARC64: fix iommu sg chaining [SCSI] ide-scsi: use scsi_sg_count() instead of ->use_sg
\| * \|	[SCSI] Remove full sg table memset()	FUJITA Tomonori	2007-10-17	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't need to do that anymore, since blk_rq_map_sg() clears individual entries. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| * \|	[SCSI] ide-scsi: remove usage of sg_last()	Jens Axboe	2007-10-17	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We want to remove sg_last(), it's a very expensive interface. So keep track of number of sg entries in the sg list, instead of comparing with the last entry. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| * \|	Fix loop terminating conditions in fill_sg().	David S. Miller	2007-10-17	2	-12/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| * \|	[BLOCK] Clear sg entry before filling in blk_rq_map_sg()	Jens Axboe	2007-10-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The memset() of the sg entry was originally removed, because it could overwrite a chain pointer. But it's quite OK to memset() it when we know it's a valid entry, since it can't contain a chain pointer. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| * \|	IA64: iommu uses sg_next with an invalid sg element	FUJITA Tomonori	2007-10-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sg list elements might not be continuous. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| * \|	cciss: disable DMA refetch on Smart Array P600	Mike Miller (OS Dev)	2007-10-17	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch disables DMA refetch in the PCI bridge. We have disabled DMA prefetch for quite some time. Testing with XEN revealed another ASIC bug. If dom0 resides on a P600 the board can can an MCA bi accessing invalid memory addresses. Apparently, we need to disable both prefetch and refetch. My understanding is a refetch operation should not occur but it is a valid thing to do if prefetched data is no longer available for whatever reason. Please consider this patch for inclusion. Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Alex Chiang <achiang@hp.com> -------------------------------------------------------------------------------- Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| * \|	swiotlb: fix map_sg failure handling	FUJITA Tomonori	2007-10-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sg list elements might not be continuous. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
\| * \|	SPARC64: fix iommu sg chaining	FUJITA Tomonori	2007-10-17	4	-23/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2c941a204070ab32d92d40318a3196a7fb994c00 looks incomplete. The helper functions like prepare_sg() need to support sg chaining too. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>