op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	[PATCH] madvise MADV_DONTFORK/MADV_DOFORK	Michael S. Tsirkin	2006-02-14	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, copy-on-write may change the physical address of a page even if the user requested that the page is pinned in memory (either by mlock or by get_user_pages). This happens if the process forks meanwhile, and the parent writes to that page. As a result, the page is orphaned: in case of get_user_pages, the application will never see any data hardware DMA's into this page after the COW. In case of mlock'd memory, the parent is not getting the realtime/security benefits of mlock. In particular, this affects the Infiniband modules which do DMA from and into user pages all the time. This patch adds madvise options to control whether memory range is inherited across fork. Useful e.g. for when hardware is doing DMA from/into these pages. Could also be useful to an application wanting to speed up its forks by cutting large areas out of consideration. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
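As a rough userspace illustration of the options described in the entry above, the sketch below maps an anonymous region and marks it MADV_DONTFORK so that a child created by fork() does not inherit it, then shows how MADV_DOFORK reverses the advice. The helper names `demo_alloc_nofork` and `demo_allow_fork` are hypothetical and do not come from the patch; only `mmap`, `madvise` and `munmap` are real calls.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MADV_* in glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch): map an anonymous region and
 * mark it MADV_DONTFORK so a child created by fork() does not inherit it.
 * Returns NULL on failure.
 */
static void *demo_alloc_nofork(size_t len)
{
	void *buf;

	assert(len > 0);
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;
	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}

/* Undo the advice so the range is inherited across fork() again. */
static int demo_allow_fork(void *buf, size_t len)
{
	return madvise(buf, len, MADV_DOFORK);
}
```

A caller that hands the buffer to hardware for DMA would keep the MADV_DONTFORK advice in place for as long as the device may touch the pages, and would only call `demo_allow_fork` once the transfer is finished.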
*	[PATCH] powerpc: unshare system call registration	JANAK DESAI	2006-02-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Registers system call for the powerpc architecture. Signed-off-by: Janak Desai <janak@us.ibm.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bird	Linus Torvalds	2006-02-08	2	-1/+6
\|\
\| *	[PATCH] __user annotations in powerpc thread_info	Al Viro	2006-02-08	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	[PATCH] powerpc signal __user annotations	Al Viro	2006-02-08	1	-0/+5
\| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	[PATCH] powerpc: Thermal control for dual core G5s	Benjamin Herrenschmidt	2006-02-07	1	-0/+5
\|/ \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a windfarm module, windfarm_pm112, for the dual core G5s (both 2 and 4 core models), keeping the machine from getting into vacuum-cleaner mode ;) For proper credits, the patch was initially written by Paul Mackerras, and slightly reworked by me to add overtemp handling among others. The patch also removes the sysfs attributes from windfarm_pm81 and windfarm_pm91 and instead adds code to the windfarm core to automagically expose attributes for sensor & controls. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge	Linus Torvalds	2006-02-07	1	-0/+2
\|\
\| *	[PATCH] powerpc: Don't overwrite flat device tree with kdump kernel	Michael Ellerman	2006-02-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's possible for prom_init to allocate the flat device tree inside the kdump crash kernel region. If this happens, when we load the kdump kernel we overwrite the flattened device tree, which is bad. We could make prom_init try and avoid allocating inside the crash kernel region, but then we run into issues if the crash kernel region uses all the space inside the RMO. The easiest solution is to move the flat device tree once we're running in the kernel. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* \|	[PATCH] remove bogus asm/bug.h includes.	Al Viro	2006-02-07	1	-1/+0
\|/ \| \| \| \| \| \|	A bunch of asm/bug.h includes are both not needed (since it will get pulled anyway) and bogus (since they are done too early). Removed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] powerpc: fix for kexec ppc32	Albert Herranz	2006-02-01	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- kexec.h is included from assembly code, thus C code must be properly protected. - (embedded) ppc32 systems use machine_kexec_simple whose declaration vanished during a recent powerpc merge change. Signed-off-by: Albert Herranz <albert_herranz@yahoo.es> Cc: <fastboot@osdl.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] powerpc: enable irq's for platform functions.	Ben Collins	2006-02-01	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Make the platform function interrupt functions actually work. Calls irq_enable() for the first in the list, and irq_disable() for the last. Added *func to struct irq_client so that the user can pass just that to pmf_unregister_irq_client(). Signed-off-by: Ben Collins <bcollins@ubuntu.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] TIF_RESTORE_SIGMASK support for arch/powerpc	David Woodhouse	2006-01-18	2	-2/+7
\| \| \| \| \| \| \| \| \| \| \|	Implement the TIF_RESTORE_SIGMASK flag in the new arch/powerpc kernel, for both 32-bit and 64-bit system call paths. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Generic sys_rt_sigsuspend()	David Woodhouse	2006-01-18	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of sys_rt_sigsuspend() instead of duplicating it for each architecture. This provides such an implementation and makes arch/powerpc use it. It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Fix sparse parse error in lppaca.h	Bryan O'Sullivan	2006-01-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sparse can't parse a struct definition in include/asm-powerpc/lppaca.h, even though gcc can accept it. The form looks like this: struct __attribute__((whatever)) foo { }; An equivalent that both gcc and sparse can handle is struct foo { } __attribute__((whatever)); This is the only definition of this type in the tree, and fixing it is easier than fixing sparse. Signed-off-by: Bryan O'Sullivan <bos@serpentine.com> [ Side note: fixing sparse wouldn't be hard, but the "attribute at the end" version is the canonical one, and the one that makes sense. So let's just fix the kernel instead. Luc Van Oostenryck already sent out a sparse patch to the sparse mailing list in case anybody cares. -- Linus ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
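The two attribute placements discussed in the entry above can be tried out with a small GCC-specific sketch. The structure below is a hypothetical stand-in: its fields and the 128-byte alignment are assumptions chosen for illustration, not the contents of lppaca.h. It uses the trailing form that the message describes as canonical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the structure in lppaca.h: the fields and
 * the 128-byte alignment are illustrative, not copied from the kernel.
 * The attribute follows the closing brace, the placement the commit
 * message says both gcc and sparse handle.
 */
struct demo_lppaca {
	unsigned int desc;
	unsigned short size;
} __attribute__((aligned(128)));

/* Compile-time check that the trailing attribute applied to the type. */
static_assert(__alignof__(struct demo_lppaca) == 128,
	      "trailing attribute should set the type's alignment");

/* Runtime accessor so callers can inspect the padded size. */
static size_t demo_lppaca_size(void)
{
	return sizeof(struct demo_lppaca);
}
```

Because the alignment applies to the type itself, the structure is padded out to a multiple of 128 bytes, which is why the size check is a useful sanity test after moving an attribute.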
*	[PATCH] powerpc: Fix kdump copy regs and dynamic allocate per-cpu crash notes	Haren Myneni	2006-01-15	1	-12/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- This contains the arch-specific changes for the following kdump generic fixes, which were already accepted upstream:
  . Capturing CPU registers (for the case of 'panic' and invoking the dump using 'sysrq-trigger') from a function (stack frame) which will not be available during the kdump boot, and hence might result in an invalid stack trace.
  . Dynamically allocating the per-cpu ELF notes section instead of statically for NR_CPUS.
- Fix the compiler warning in prom_init.c.
Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	[PATCH] powerpc: oprofile cpu type names clash with other code	Andy Whitcroft	2006-01-14	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In 2.6.15-git6 a change was committed in the oprofile support in the powerpc architecture. It introduced the powerpc_oprofile_type which contains the define G4. This causes a name clash with the existing wacom usb tablet driver.

  CC [M] drivers/usb/input/wacom.o
drivers/usb/input/wacom.c:98: error: conflicting types for `G4'
include/asm/cputable.h:37: error: previous declaration of `G4'
  CC [M] drivers/usb/mon/mon_text.o
make[3]: *** [drivers/usb/input/wacom.o] Error 1
make[2]: *** [drivers/usb/input] Error 2

The elements of an enum declared in global scope are effectively global identifiers themselves. As such we need to ensure the names are unique. This patch updates the later oprofile support to use unique names.
Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
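The scoping rule behind this clash can be shown in a few lines of C. Both enums below are hypothetical; the `DEMO_` names are invented for this sketch and are not the identifiers the patch introduced. The point is only that enumerators live in the enclosing scope, so a prefix is what keeps two unrelated enums from colliding.

```c
#include <assert.h>

/* Hypothetical driver-side enum that already owns the bare name G4. */
enum demo_tablet_model { DEMO_PENPARTNER, DEMO_GRAPHIRE, G4 };

/*
 * Hypothetical arch-side enum. Declaring a bare G4 here as well would
 * fail to compile, because enumerators share the enclosing scope.
 * A prefix keeps every enumerator unique.
 */
enum demo_oprofile_type {
	DEMO_OPROFILE_INVALID = 0,
	DEMO_OPROFILE_RS64,
	DEMO_OPROFILE_POWER4,
	DEMO_OPROFILE_G4
};

/* Enumerators without an initializer count up from the previous one. */
static_assert(DEMO_OPROFILE_G4 == 3, "fourth enumerator after an explicit 0");

/* Both names can now be used in one translation unit. */
static int demo_is_g4_class(enum demo_tablet_model m,
			    enum demo_oprofile_type t)
{
	return m == G4 && t == DEMO_OPROFILE_G4;
}
```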
*	powerpc: Provide a suitable AT_PLATFORM value	Paul Mackerras	2006-01-14	2	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The glibc folks want to use AT_PLATFORM to select between possible alternative versions of shared libraries. This commit makes the kernel supply an AT_PLATFORM string that indicates what class of processor we are running on. Processors with the same set of user-level instructions and roughly the same instruction scheduling characteristics are given the same AT_PLATFORM value; for example, 821, 823 and 860 are all reported as "ppc823", and 7447, 7447A, 7448, 7450, 7451, 7455 are all called "ppc7450". The intention is that the AT_PLATFORM values match the values that gcc accepts for the -mcpu= option. For values which are numeric (e.g. -mcpu=750), "ppc" has been prepended. This also adds a PPC_FEATURE_BOOKE bit to the AT_HWCAP value and sets it for the 440 family and the Freescale 85xx family. Signed-off-by: Paul Mackerras <paulus@samba.org>
*	[PATCH] powerpc: reformat atomic_add_unless	Anton Blanchard	2006-01-13	1	-13/+13
\| \| \| \| \| \| \|	It makes my eyes hurt. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	[PATCH] powerpc: use lwsync in atomics, bitops, lock functions	Anton Blanchard	2006-01-13	6	-46/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	eieio only provides store - store ordering. When it is used to order an unlock operation, loads may leak out of the critical region. This is potentially buggy; one example is if a user wants to atomically read a couple of values. We can solve this with an lwsync, which orders everything except store - load. I removed the (now unused) EIEIO_ON_SMP macros and the C versions isync_on_smp and eieio_on_smp now that we don't use them. I also removed some old comments that were used to identify inline spinlocks in assembly; they don't make sense now that our locks are out of line. Another interesting thing was that read_unlock was using an eieio even though the rest of the spinlock code had already been converted to use lwsync. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	[PATCH] powerpc: Remove lppaca structure from the PACA	David Gibson	2006-01-13	4	-18/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At present the lppaca - the structure shared with the iSeries hypervisor and phyp - is contained within the PACA, our own low-level per-cpu structure. This doesn't have to be so, the patch below removes it, making a separate array of lppaca structures. This saves approximately 500*NR_CPUS bytes of image size and kernel memory, because we don't need aligning gap between the Linux and hypervisor portions of every PACA. On the other hand it means an extra level of dereference in many accesses to the lppaca. The patch also gets rid of several places where we assign the paca address to a local variable for no particular reason. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	[PATCH] powerpc: Cleanup LOADADDR etc. asm macros	David Gibson	2006-01-13	1	-36/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch consolidates the variety of macros used for loading 32 or 64-bit constants in assembler (LOADADDR, LOADBASE, SET_REG_TO_*). The idea is to make the set of macros consistent across 32 and 64 bit and to make it more obvious which is the appropriate one to use in a given situation. The new macros and their semantics are described in the comments in ppc_asm.h. In the process, we change several places that were unnecessarily using immediate loads on ppc64 to use the GOT/TOC. Likewise we cleanup a couple of places where we were clumsily subtracting PAGE_OFFSET with asm instructions to use assemble-time arithmetic or the toreal() macro instead. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	[PATCH] powerpc: Add of_find_property function	Dave C Boutcher	2006-01-13	1	-0/+3
\| \| \| \| \| \| \| \| \|	Add an of_find_property function that returns a struct property given a property name. Then change the get_property function to use that routine internally. Signed-off-by: Dave Boutcher <sleddog@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
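The layering described in the entry above, a lookup that returns the property record and a getter built on top of it, might look roughly like the sketch below. Every type and function name here (`demo_property`, `demo_node`, `demo_find_property`, `demo_get_property`) is a hypothetical userspace stand-in; the kernel's real structures, signatures and locking are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified property record kept in a singly linked list. */
struct demo_property {
	const char *name;
	int length;
	const void *value;
	struct demo_property *next;
};

/* Hypothetical device node that owns a list of properties. */
struct demo_node {
	struct demo_property *properties;
};

/* Return the property record itself, or NULL if the name is not found. */
static struct demo_property *demo_find_property(const struct demo_node *np,
						const char *name, int *lenp)
{
	assert(np != NULL && name != NULL);
	for (struct demo_property *pp = np->properties; pp; pp = pp->next) {
		if (strcmp(pp->name, name) == 0) {
			if (lenp)
				*lenp = pp->length;
			return pp;
		}
	}
	return NULL;
}

/* The value getter becomes a thin wrapper around the lookup. */
static const void *demo_get_property(const struct demo_node *np,
				     const char *name, int *lenp)
{
	struct demo_property *pp = demo_find_property(np, name, lenp);

	return pp ? pp->value : NULL;
}
```

Returning the record rather than only the value is what lets later code update or unlink a property; this sketch leaves out any locking around the list walk.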
*	[PATCH] powerpc: Add/remove/update properties in firmware device tree	Dave C Boutcher	2006-01-13	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for updating and removing device tree properties. Since we hand out pointers to properties with gay abandon, we can't just free the property storage. Instead we move deleted properties, or the old copy of an updated property, to a "dead properties" list. Also note, it's not feasible to kref device tree properties: we call get_property() all over the kernel in a wild variety of contexts. One consequence of this change is that we now take a read_lock(&devtree_lock) when doing get_property(). Signed-off-by: Dave Boutcher <sleddog@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	[PATCH] powerpc: Add some more pSeries hypervisor call constants	Dave C Boutcher	2006-01-13	1	-0/+5
\| \| \| \| \| \| \|	Adds a few more hypervisor call constants. Signed-off-by: Dave Boutcher <sleddog@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge	Linus Torvalds	2006-01-12	17	-155/+101
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	Fix up delete/modify conflict of arch/ppc/kernel/process.c by hand (it's gone, gone, gone). Signed-off-by: Linus Torvalds <torvalds@osdl.org>
\| *	[PATCH] powerpc: small pci cleanups	Stephen Rothwell	2006-01-12	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pcibios_claim_one_bus is not needed on iSeries and phbs_remap_io can be made static. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc: clean up iommu.h a bit	Stephen Rothwell	2006-01-12	1	-19/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was a function declared for CONFIG_PSERIES which no longer exists and the two function declarations for CONFIG_ISERIES have been moved into an include file in platforms/iseries since they are defined and used only there. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc: iSeries fixes for build with no PCI	Stephen Rothwell	2006-01-12	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts part of "ppc64 iSeries: allow build with no PCI" (145d01e4287b8cbf50f87c3283e33bf5c84e8468) which affected generic code and applies a fix in the arch specific code. Commit "partly merge iseries do_IRQ" (5fee9b3b39eb55c7e3619a3b36ceeabffeb8f144) introduced iSeries_get_irq which was only available if CONFIG_PCI is set. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powercp: iSeries include file comment cleanups	Stephen Rothwell	2006-01-12	13	-16/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mainly just removing file names from the comments. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc: eliminate bitfields from ItLpNaca	Stephen Rothwell	2006-01-12	1	-10/+11
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc: remove bitfields from HvLpEvent	Stephen Rothwell	2006-01-12	1	-10/+31
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc: remove bitfields from hv_call_event.h	Stephen Rothwell	2006-01-12	1	-98/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also does some comment cleanups and removal of unnecessary variables. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc: Avoid potential FP corruption with preempt and UP	Paul Mackerras	2006-01-12	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Heikki Lindholm pointed out that there was a potential race with the lazy CPU state (FP, VR, EVR) stuff if preempt is enabled. The race is that in the process of restoring FP state on sigreturn, the task gets preempted by a user task that wants to use the FPU. It will take an FP unavailable exception, which will write the current FPU state to the thread_struct, overwriting the values which sigreturn has stored. Note that this can only happen on UP since we don't implement lazy CPU state on SMP. The fix is to flush the lazy CPU state before updating the thread_struct. To do this we re-use the flush_lazy_cpu_state() function from process.c. Signed-off-by: Paul Mackerras <paulus@samba.org>
* \|	[PATCH] death of get_thread_info/put_thread_info	Al Viro	2006-01-12	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	{get,put}_thread_info() were introduced in 2.5.4 and never had been called by anything in the tree. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* \|	[PATCH] scheduler cache-hot-autodetect	akpm@osdl.org	2006-01-12	1	-1/+0
\| \|	From: Ingo Molnar <mingo@elte.hu>

This is the latest version of the scheduler cache-hot-auto-tune patch.

The first problem was that detection time scaled with O(N^2), which is unacceptable on larger SMP and NUMA systems. To solve this:

- I've added a 'domain distance' function, which is used to cache measurement results. Each distance is only measured once. This means that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT distances 0 and 1, and on SMP distance 0 is measured. The code walks the domain tree to determine the distance, so it automatically follows whatever hierarchy an architecture sets up. This cuts down on the boot time significantly and removes the O(N^2) limit. The only assumption is that migration costs can be expressed as a function of domain distance - this covers the overwhelming majority of existing systems, and is a good guess even for more asymmetric systems.

[ People hacking systems that have asymmetries that break this assumption (e.g. different CPU speeds) should experiment a bit with the cpu_distance() function. Adding a ->migration_distance factor to the domain structure would be one possible solution - but let's first see the problem systems, if they exist at all. Let's not overdesign. ]

Another problem was that only a single cache-size was used for measuring the cost of migration, and most architectures didn't set that variable up. Furthermore, a single cache-size does not fit NUMA hierarchies with L3 caches and does not fit HT setups, where different CPUs will often have different 'effective cache sizes'. To solve this problem:

- Instead of relying on a single cache-size provided by the platform and sticking to it, the code now auto-detects the 'effective migration cost' between two measured CPUs, via iterating through a wide range of cachesizes. The code searches for the maximum migration cost, which occurs when the working set of the test-workload falls just below the 'effective cache size'. I.e. a real-life optimized search is done for the maximum migration cost, between two real CPUs.

This, amongst other things, has the positive effect that if e.g. two CPUs share a L2/L3 cache, a different (and accurate) migration cost will be found than between two CPUs on the same system that don't share any caches.

(The reliable measurement of migration costs is tricky - see the source for details.)

Furthermore I've added various boot-time options to override/tune migration behavior.

Firstly, there's a blanket override for autodetection:

    migration_cost=1000,2000,3000

will override the depth 0/1/2 values with 1msec/2msec/3msec values.

Secondly, there's a global factor that can be used to increase (or decrease) the autodetected values:

    migration_factor=120

will increase the autodetected values by 20%. This option is useful to tune things in a workload-dependent way - e.g. if a workload is cache-insensitive then CPU utilization can be maximized by specifying migration_factor=0.

I've tested the autodetection code quite extensively on x86, on 3 P3/Xeon/2MB, and the autodetected values look pretty good:

Dual Celeron (128K L2 cache):

---------------------
migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
---------------------
          [00]    [01]
[00]:       -    1.7(1)
[01]:    1.7(1)     -
---------------------
cacheflush times [2]: 0.0 (0) 1.7 (1784008)
---------------------

Here the slow memory subsystem dominates system performance, and even though caches are small, the migration cost is 1.7 msecs.

Dual HT P4 (512K L2 cache):

---------------------
migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
---------------------
          [00]    [01]    [02]    [03]
[00]:       -    0.4(1)  0.0(0)  0.4(1)
[01]:    0.4(1)     -    0.4(1)  0.0(0)
[02]:    0.0(0)  0.4(1)     -    0.4(1)
[03]:    0.4(1)  0.0(0)  0.4(1)     -
---------------------
cacheflush times [2]: 0.0 (33900) 0.4 (448514)
---------------------

Here it can be seen that there is no migration cost between two HT siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.

8-way P3/Xeon [2MB L2 cache]:

---------------------
migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
---------------------
          [00]     [01]     [02]     [03]     [04]     [05]     [06]     [07]
[00]:       -    19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)
[01]:   19.2(1)     -     19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)
[02]:   19.2(1)  19.2(1)     -     19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)
[03]:   19.2(1)  19.2(1)  19.2(1)     -     19.2(1)  19.2(1)  19.2(1)  19.2(1)
[04]:   19.2(1)  19.2(1)  19.2(1)  19.2(1)     -     19.2(1)  19.2(1)  19.2(1)
[05]:   19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)     -     19.2(1)  19.2(1)
[06]:   19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)     -     19.2(1)
[07]:   19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)     -
---------------------
cacheflush times [2]: 0.0 (0) 19.2 (19281756)
---------------------

This one has huge caches and a relatively slow memory subsystem - so the migration cost is 19 msecs.

Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: <wilder@us.ibm.com> Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
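The arithmetic behind the two boot options in the entry above can be sketched in plain C. This is not the scheduler's code: `demo_parse_costs` and `demo_apply_factor` are hypothetical helpers, and the sketch assumes the migration_cost values are microseconds (which the 1000 to 1 msec example suggests) and that migration_factor is a plain percentage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical parser for a comma-separated list such as
 * "1000,2000,3000". Stores up to max values in out[] and returns how
 * many were read. Index 0 corresponds to domain distance 0, and so on.
 */
static size_t demo_parse_costs(const char *arg, unsigned long *out,
			       size_t max)
{
	size_t n = 0;

	assert(arg != NULL && out != NULL);
	while (*arg && n < max) {
		char *end;
		unsigned long v = strtoul(arg, &end, 10);

		if (end == arg)
			break;		/* no digits at this position */
		out[n++] = v;
		if (*end != ',')
			break;		/* end of list */
		arg = end + 1;
	}
	return n;
}

/*
 * Hypothetical scaling step: a factor of 120 raises a cost by 20%,
 * a factor of 0 removes the cost entirely.
 */
static unsigned long demo_apply_factor(unsigned long cost_usec,
				       unsigned int percent)
{
	return (unsigned long)((unsigned long long)cost_usec * percent / 100);
}
```

With these helpers, parsing "1000,2000,3000" yields three per-distance costs, and applying a factor of 120 to the first gives 1200 microseconds.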
* \|	[PATCH] sched: add cacheflush() asm	Ingo Molnar	2006-01-12	1	-0/+10
\|/ \| \| \| \| \| \| \| \| \|	Add per-arch sched_cacheflush() which is a write-back cacheflush used by the migration-cost calibration code at bootup time. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge	Linus Torvalds	2006-01-11	6	-5/+92
\|\
\| *	powerpc/32: Fix compile error caused by pud_t/pgt_t confusion	Paul Mackerras	2006-01-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PPC32 is still using asm-generic/4level-fixup.h, but asm-powerpc/page.h was defining pud_t and pgd_t. Depending on the order in which files got included, this could result in a compilation error. Tweak the ifdef so that page.h doesn't try to define pud_t on ppc32 (which uses 2-level page tables). Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc/64: per cpu data optimisations	Anton Blanchard	2006-01-11	2	-0/+57
\| \|	The current ppc64 per cpu data implementation is quite slow. eg:

    lhz 11,18(13)             /* smp_processor_id() */
    ld 9,.LC63-.LCTOC1(30)    /* per_cpu__variable_name */
    ld 8,.LC61-.LCTOC1(30)    /* __per_cpu_offset */
    sldi 11,11,3              /* form index into __per_cpu_offset */
    mr 10,9
    ldx 9,11,8                /* __per_cpu_offset[smp_processor_id()] */
    ldx 0,10,9                /* load per cpu data */

5 loads for something that is supposed to be fast, pretty awful. One reason for the large number of loads is that we have to synthesize 2 64bit constants (per_cpu__variable_name and __per_cpu_offset).

By putting __per_cpu_offset into the paca we can avoid the 2 loads associated with it:

    ld 11,56(13)              /* paca->data_offset */
    ld 9,.LC59-.LCTOC1(30)    /* per_cpu__variable_name */
    ldx 0,9,11                /* load per cpu data */

Longer term we should be able to do even better than 3 loads. If per_cpu__variable_name wasn't a 64bit constant and paca->data_offset was in a register we could cut it down to one load. A suggestion from Rusty is to use gcc's __thread extension here. In order to do this we would need to free up r13 (the __thread register and where the paca currently is). So far I've had a few unsuccessful attempts at doing that :)

The patch also allocates per cpu memory node local on NUMA machines. This patch from Rusty has been sitting in my queue _forever_ but stalled when I hit the compiler bug. Sorry about that.

Finally I also only allocate per cpu data for possible cpus, which comes straight out of the x86-64 port. On a pseries kernel (with NR_CPUS == 128) and 4 possible cpus we see some nice gains:

             total       used       free     shared    buffers     cached
Mem:       4012228     212860    3799368          0          0     162424

             total       used       free     shared    buffers     cached
Mem:       4016200     212984    3803216          0          0     162424

A saving of 3.75MB. Quite nice for smaller machines. Note: we now have to be careful of per cpu users that touch data for !possible cpus.

At this stage it might be worth making the NUMA and possible cpu optimisations generic, but per cpu init is done so early we have to be careful that all architectures have their possible map setup correctly.

Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc: parallel port init fix	Michael Neuling	2006-01-11	1	-2/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This stops parport from accessing nonexistent parallel ports. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc: Make early debugging configurable via Kconfig	Michael Ellerman	2006-01-11	2	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds Kconfig entries to control the early debugging options, currently in setup_64.c. Doing this via Kconfig rather than #defines means you can have one source tree, which is buildable for multiple platforms - and you can enable the correct early debug option for each platform via .config. I made udbg_early_init() a static inline because otherwise GCC is too daft to optimise it away when debugging is off. Now that we have udbg_init_rtas() we can make call_rtas_display_status* static. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* \|	asm-powerpc: header included twice	Nicolas Kaiser	2006-01-11	1	-1/+0
\|/ \| \| \| \| \| \|	Header included twice. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge	Linus Torvalds	2006-01-10	5	-10/+41
\|\
\| *	powerpc: Introduce a new config symbol to control 16550 early debug code	Paul Mackerras	2006-01-10	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous change by Kumar Gala in this area led to legacy_serial.c and udbg_16550.c being built as modules when CONFIG_SERIAL_8250=m. Fix this by introducing a new symbol, CONFIG_PPC_UDBG_16550, to control whether these files get built, and arrange for it to be selected for those platforms that need it. Signed-off-by: Paul Mackerras <paulus@samba.org>
\| *	[PATCH] powerpc: Save device BARs much earlier in the boot sequence	Linas Vepstas	2006-01-10	2	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	241-eeh-save-bars-earlier.patch Save the PCI device bars before any PCI probing is done. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 76c902b919098860f3d4e125f847abcc4cb1782a commit)
\| *	[PATCH] powerpc: Don't continue with PCI Error recovery if slot reset failed.	Linas Vepstas	2006-01-10	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	238-eeh-stop-if-reset_failed.patch If the firmware is unable to reset the PCI slot for some reason, then don't attempt any further recovery steps after that point. Instead, mark the device as permanently failed. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from e06b942521eb2cdaf232726f45a820d5837acb12 commit)
\| *	[PATCH] powerpc: Remove duplicate code	Linas Vepstas	2006-01-10	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	234-eeh-find-pe.patch The find_device_pe() routine is duplicated in two files. Remove one of the two copies, declare the other extern. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 48408e708282d4d0269136ff27ea5acbd9410b5a commit)
\| *	[PATCH] powerpc: Add "partitionable endpoint" support	Linas Vepstas	2006-01-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	26-eeh-partition-endpoint.patch New versions of firmware introduce a new method by which the "partitionable endpoint" (the point at which the pci bus is cut) should be located. This code adds the support for this (mandatory) new feature. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 9fcfb5d35b5294659f9299aa9cae6fd16325c07e commit)
\| *	[PATCH] powerpc: Split out PCI address cache to its own file	Linas Vepstas	2006-01-10	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	25-pci-address-cache.patch The core EEH file is rather large. This patch splits out a self-contained chunk of it into its own file. This is the chunk that performs the caching and lookup of pci devices based on the i/o addresses of their resources. This code is almost architecture-independent and could be used by any system that wanted to find a pci device based only on the i/o address used by the device. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from b0b291d59906d4a9a89ed9e34d9fd684c7188924 commit)
\| *	[PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines	Linas Vepstas	2006-01-10	3	-5/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Various PCI bus errors can be signaled by newer PCI controllers. The core error recovery routines are architecture dependent. This patch adds a recovery infrastructure for the PPC64 pSeries systems. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)