op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	KVM: ppc: fix userspace mapping invalidation on context switch	Hollis Blanchard	2008-12-31	3	-22/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We used to defer invalidating userspace TLB entries until jumping out of the kernel. This was causing MMU weirdness most easily triggered by using a pipe in the guest, e.g. "dmesg \| tail". I believe the problem was that after the guest kernel changed the PID (part of context switch), the old process's mappings were still present, and so copy_to_user() on the "return to new process" path ended up using stale mappings. Testing with large pages (64K) exposed the problem, probably because with 4K pages, pressure on the TLB faulted all process A's mappings out before the guest kernel could insert any for process B. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: use prefetchable mappings for guest memory	Hollis Blanchard	2008-12-31	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bare metal Linux on 440 can "overmap" RAM in the kernel linear map, so that it can use large (256MB) mappings even if memory isn't a multiple of 256MB. To prevent the hardware prefetcher from loading from an invalid physical address through that mapping, it's marked Guarded. However, KVM must ensure that all guest mappings are backed by real physical RAM (since a deliberate access through a guarded mapping could still cause a machine check). Accordingly, we don't need to make our mappings guarded, so let's allow prefetching as the designers intended. Curiously this patch didn't affect performance at all on the quick test I tried, but it's clearly the right thing to do anyways and may improve other workloads. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: use MMUCR accessor to obtain TID	Hollis Blanchard	2008-12-31	1	-1/+1
\| \| \| \| \| \| \|	We have an accessor; might as well use it. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: fix Kconfig constraints	Hollis Blanchard	2008-12-31	1	-10/+8
\| \| \| \| \| \| \| \|	Make sure that CONFIG_KVM cannot be selected without processor support (currently, 440 is the only processor implementation available). Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: improve trap emulation	Hollis Blanchard	2008-12-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	set ESR[PTR] when emulating a guest trap. This allows Linux guests to properly handle WARN_ON() (i.e. detect that it's a non-fatal trap). Also remove debugging printk in trap emulation. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: optimize irq delivery path	Hollis Blanchard	2008-12-31	4	-156/+140
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In kvmppc_deliver_interrupt is just one case left in the switch and it is a rare one (less than 8%) when looking at the exit numbers. Therefore we can at least drop the switch/case and if an if. I inserted an unlikely too, but that's open for discussion. In kvmppc_can_deliver_interrupt all frequent cases are in the default case. I know compilers are smart but we can make it easier for them. By writing down all options and removing the default case combined with the fact that ithe values are constants 0..15 should allow the compiler to write an easy jump table. Modifying kvmppc_can_deliver_interrupt pointed me to the fact that gcc seems to be unable to reduce priority_exception[x] to a build time constant. Therefore I changed the usage of the translation arrays in the interrupt delivery path completely. It is now using priority without translation to irq on the full irq delivery path. To be able to do that ivpr regs are stored by their priority now. Additionally the decision made in kvmppc_can_deliver_interrupt is already sufficient to get the value of interrupt_msr_mask[x]. Therefore we can replace the 16x4byte array used here with a single 4byte variable (might still be one miss, but the chance to find this in cache should be better than the right entry of the whole array). Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: optimize find first bit	Hollis Blanchard	2008-12-31	1	-1/+1
\| \| \| \| \| \| \| \|	Since we use a unsigned long here anyway we can use the optimized __ffs. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: optimize kvm stat handling	Hollis Blanchard	2008-12-31	1	-23/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we use an unnecessary if&switch to detect some cases. To be honest we don't need the ligh_exits counter anyway, because we can calculate it out of others. Sum_exits can also be calculated, so we can remove that too. MMIO, DCR and INTR can be counted on other places without these additional control structures (The INTR case was never hit anyway). The handling of BOOKE_INTERRUPT_EXTERNAL/BOOKE_INTERRUPT_DECREMENTER is similar, but we can avoid the additional if when copying 3 lines of code. I thought about a goto there to prevent duplicate lines, but rewriting three lines should be better style than a goto cross switch/case statements (its also not enough code to justify a new inline function). Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: fix set regs to take care of msr change	Hollis Blanchard	2008-12-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When changing some msr bits e.g. problem state we need to take special care of that. We call the function in our mtmsr emulation (not needed for wrtee[i]), but we don't call kvmppc_set_msr if we change msr via set_regs ioctl. It's a corner case we never hit so far, but I assume it should be kvmppc_set_msr in our arch set regs function (I found it because it is also a corner case when using pv support which would miss the update otherwise). Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: adjust vcpu types to support 64-bit cores	Hollis Blanchard	2008-12-31	4	-33/+33
\| \| \| \| \| \| \| \|	However, some of these fields could be split into separate per-core structures in the future. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: create struct kvm_vcpu_44x and introduce container_of() accessor	Hollis Blanchard	2008-12-31	9	-63/+154
\| \| \| \| \| \| \| \| \| \| \|	This patch doesn't yet move all 44x-specific data into the new structure, but is the first step down that path. In the future we may also want to create a struct kvm_vcpu_booke. Based on patch from Liu Yu <yu.liu@freescale.com>. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: Move the last bits of 44x code out of booke.c	Hollis Blanchard	2008-12-31	3	-44/+58
\| \| \| \| \| \| \|	Needed to port to other Book E processors. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: refactor instruction emulation into generic and core-specific pieces	Hollis Blanchard	2008-12-31	8	-276/+415
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cores provide 3 emulation hooks, implemented for example in the new 4xx_emulate.c: kvmppc_core_emulate_op kvmppc_core_emulate_mtspr kvmppc_core_emulate_mfspr Strictly speaking the last two aren't necessary, but provide for more informative error reporting ("unknown SPR"). Long term I'd like to have instruction decoding autogenerated from tables of opcodes, and that way we could aggregate universal, Book E, and core-specific instructions more easily and without redundant switch statements. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	ppc: Create disassemble.h to extract instruction fields	Hollis Blanchard	2008-12-31	2	-56/+81
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is used in a couple places in KVM, but isn't KVM-specific. However, this patch doesn't modify other in-kernel emulation code: - xmon uses a direct copy of ppc_opc.c from binutils - emulate_instruction() doesn't need it because it can use a series of mask tests. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: Refactor powerpc.c to relocate 440-specific code	Hollis Blanchard	2008-12-31	9	-130/+207
\| \| \| \| \| \| \| \| \| \| \| \|	This introduces a set of core-provided hooks. For 440, some of these are implemented by booke.c, with the rest in (the new) 44x.c. Note that these hooks are link-time, not run-time. Since it is not possible to build a single kernel for both e500 and 440 (for example), using function pointers would only add overhead. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: combine booke_guest.c and booke_host.c	Hollis Blanchard	2008-12-31	4	-89/+66
\| \| \| \| \| \| \| \|	The division was somewhat artificial and cumbersome, and had no functional benefit anyways: we can only guests built for the real host processor. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: Rename "struct tlbe" to "struct kvmppc_44x_tlbe"	Hollis Blanchard	2008-12-31	6	-34/+33
\| \| \| \| \| \| \| \| \|	This will ease ports to other cores. Also remove unused "struct kvm_tlb" while we're at it. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: ppc: Move 440-specific TLB code into 44x_tlb.c	Hollis Blanchard	2008-12-31	4	-143/+145
\| \| \| \| \| \| \|	This will make it easier to provide implementations for other cores. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	Merge branch 'core-for-linus' of ↵	Linus Torvalds	2008-12-30	1	-0/+4
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits) stacktrace: provide save_stack_trace_tsk() weak alias rcu: provide RCU options on non-preempt architectures too printk: fix discarding message when recursion_bug futex: clean up futex_(un)lock_pi fault handling "Tree RCU": scalable classic RCU implementation futex: rename field in futex_q to clarify single waiter semantics x86/swiotlb: add default swiotlb_arch_range_needs_mapping x86/swiotlb: add default phys<->bus conversion x86: unify pci iommu setup and allow swiotlb to compile for 32 bit x86: add swiotlb allocation functions swiotlb: consolidate swiotlb info message printing swiotlb: support bouncing of HighMem pages swiotlb: factor out copy to/from device swiotlb: add arch hook to force mapping swiotlb: allow architectures to override phys<->bus<->phys conversions swiotlb: add comment where we handle the overflow of a dma mask on 32 bit rcu: fix rcutorture behavior during reboot resources: skip sanity check of busy resources swiotlb: move some definitions to header swiotlb: allow architectures to override swiotlb pool allocation ... Fix up trivial conflicts in arch/x86/kernel/Makefile arch/x86/mm/init_32.c include/linux/hardirq.h as per Ingo's suggestions.
\| *	"Tree RCU": scalable classic RCU implementation	Paul E. McKenney	2008-12-18	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a long-standing performance bug in classic RCU that results in massive internal-to-RCU lock contention on systems with more than a few hundred CPUs. Although this patch creates a separate flavor of RCU for ease of review and patch maintenance, it is intended to replace classic RCU. This patch still handles stress better than does mainline, so I am still calling it ready for inclusion. This patch is against the -tip tree. Nevertheless, experience on an actual 1000+ CPU machine would still be most welcome. Most of the changes noted below were found while creating an rcutiny (which should permit ejecting the current rcuclassic) and while doing detailed line-by-line documentation. Updates from v9 (http://lkml.org/lkml/2008/12/2/334): o Fixes from remainder of line-by-line code walkthrough, including comment spelling, initialization, undesirable narrowing due to type conversion, removing redundant memory barriers, removing redundant local-variable initialization, and removing redundant local variables. I do not believe that any of these fixes address the CPU-hotplug issues that Andi Kleen was seeing, but please do give it a whirl in case the machine is smarter than I am. A writeup from the walkthrough may be found at the following URL, in case you are suffering from terminal insomnia or masochism: http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf o Made rcutree tracing use seq_file, as suggested some time ago by Lai Jiangshan. o Added a .csv variant of the rcudata debugfs trace file, to allow people having thousands of CPUs to drop the data into a spreadsheet. Tested with oocalc and gnumeric. Updated documentation to suit. Updates from v8 (http://lkml.org/lkml/2008/11/15/139): o Fix a theoretical race between grace-period initialization and force_quiescent_state() that could occur if more than three jiffies were required to carry out the grace-period initialization. Which it might, if you had enough CPUs. o Apply Ingo's printk-standardization patch. o Substitute local variables for repeated accesses to global variables. o Fix comment misspellings and redundant (but harmless) increments of ->n_rcu_pending (this latter after having explicitly added it). o Apply checkpatch fixes. Updates from v7 (http://lkml.org/lkml/2008/10/10/291): o Fixed a number of problems noted by Gautham Shenoy, including the cpu-stall-detection bug that he was having difficulty convincing me was real. ;-) o Changed cpu-stall detection to wait for ten seconds rather than three in order to reduce false positive, as suggested by Ingo Molnar. o Produced a design document (http://lwn.net/Articles/305782/). The act of writing this document uncovered a number of both theoretical and "here and now" bugs as noted below. o Fix dynticks_nesting accounting confusion, simplify WARN_ON() condition, fix kerneldoc comments, and add memory barriers in dynticks interface functions. o Add more data to tracing. o Remove unused "rcu_barrier" field from rcu_data structure. o Count calls to rcu_pending() from scheduling-clock interrupt to use as a surrogate timebase should jiffies stop counting. o Fix a theoretical race between force_quiescent_state() and grace-period initialization. Yes, initialization does have to go on for some jiffies for this race to occur, but given enough CPUs... Updates from v6 (http://lkml.org/lkml/2008/9/23/448): o Fix a number of checkpatch.pl complaints. o Apply review comments from Ingo Molnar and Lai Jiangshan on the stall-detection code. o Fix several bugs in !CONFIG_SMP builds. o Fix a misspelled config-parameter name so that RCU now announces at boot time if stall detection is configured. o Run tests on numerous combinations of configurations parameters, which after the fixes above, now build and run correctly. Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line): o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a changeset some time ago, and finally got around to retesting this option). o Fix some tracing bugs in rcupreempt that caused incorrect totals to be printed. o I now test with a more brutal random-selection online/offline script (attached). Probably more brutal than it needs to be on the people reading it as well, but so it goes. o A number of optimizations and usability improvements: o Make rcu_pending() ignore the grace-period timeout when there is no grace period in progress. o Make force_quiescent_state() avoid going for a global lock in the case where there is no grace period in progress. o Rearrange struct fields to improve struct layout. o Make call_rcu() initiate a grace period if RCU was idle, rather than waiting for the next scheduling clock interrupt. o Invoke rcu_irq_enter() and rcu_irq_exit() only when idle, as suggested by Andi Kleen. I still don't completely trust this change, and might back it out. o Make CONFIG_RCU_TRACE be the single config variable manipulated for all forms of RCU, instead of the prior confusion. o Document tracing files and formats for both rcupreempt and rcutree. Updates from v4 for those missing v5 given its bad subject line: o Separated dynticks interface so that NMIs and irqs call separate functions, greatly simplifying it. In particular, this code no longer requires a proof of correctness. ;-) o Separated dynticks state out into its own per-CPU structure, avoiding the duplicated accounting. o The case where a dynticks-idle CPU runs an irq handler that invokes call_rcu() is now correctly handled, forcing that CPU out of dynticks-idle mode. o Review comments have been applied (thank you all!!!). For but one example, fixed the dynticks-ordering issue that Manfred pointed out, saving me much debugging. ;-) o Adjusted rcuclassic and rcupreempt to handle dynticks changes. Attached is an updated patch to Classic RCU that applies a hierarchy, greatly reducing the contention on the top-level lock for large machines. This passes 10-hour concurrent rcutorture and online-offline testing on 128-CPU ppc64 without dynticks enabled, and exposes some timekeeping bugs in presence of dynticks (exciting working on a system where "sleep 1" hangs until interrupted...), which were fixed in the 2.6.27 kernel. It is getting more reliable than mainline by some measures, so the next version will be against -tip for inclusion. See also Manfred Spraul's recent patches (or his earlier work from 2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2). We will converge onto a common patch in the fullness of time, but are currently exploring different regions of the design space. That said, I have already gratefully stolen quite a few of Manfred's ideas. This patch provides CONFIG_RCU_FANOUT, which controls the bushiness of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on 64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT, there is no hierarchy. By default, the RCU initialization code will adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable this balancing, allowing the hierarchy to be exactly aligned to the underlying hardware. Up to two levels of hierarchy are permitted (in addition to the root node), allowing up to 16,384 CPUs on 32-bit systems and up to 262,144 CPUs on 64-bit systems. I just know that I am going to regret saying this, but this seems more than sufficient for the foreseeable future. (Some architectures might wish to set CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs. If this becomes a real problem, additional levels can be added, but I doubt that it will make a significant difference on real hardware.) In the common case, a given CPU will manipulate its private rcu_data structure and the rcu_node structure that it shares with its immediate neighbors. This can reduce both lock and memory contention by multiple orders of magnitude, which should eliminate the need for the strange manipulations that are reported to be required when running Linux on very large systems. Some shortcomings: o More bugs will probably surface as a result of an ongoing line-by-line code inspection. Patches will be provided as required. o There are probably hangs, rcutorture failures, &c. Seems quite stable on a 128-CPU machine, but that is kind of small compared to 4096 CPUs. However, seems to do better than mainline. Patches will be provided as required. o The memory footprint of this version is several KB larger than rcuclassic. A separate UP-only rcutiny patch will be provided, which will reduce the memory footprint significantly, even compared to the old rcuclassic. One such patch passes light testing, and has a memory footprint smaller even than rcuclassic. Initial reaction from various embedded guys was "it is not worth it", so am putting it aside. Credits: o Manfred Spraul for ideas, review comments, and bugs spotted, as well as some good friendly competition. ;-) o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton for reviews and comments. o Thomas Gleixner for much-needed help with some timer issues (see patches below). o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos, Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines alive despite my heavy abuse^Wtesting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* \|	Merge branch 'next' of ↵	Linus Torvalds	2008-12-28	222	-2489/+6151
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits) powerpc/44x: Support 16K/64K base page sizes on 44x powerpc: Force memory size to be a multiple of PAGE_SIZE powerpc/32: Wire up the trampoline code for kdump powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M powerpc/32: Allow __ioremap on RAM addresses for kdump kernel powerpc/32: Setup OF properties for kdump powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs() powerpc: Prepare xmon_save_regs for use with kdump powerpc: Remove default kexec/crash_kernel ops assignments powerpc: Make default kexec/crash_kernel ops implicit powerpc: Setup OF properties for ppc32 kexec powerpc/pseries: Fix cpu hotplug powerpc: Fix KVM build on ppc440 powerpc/cell: add QPACE as a separate Cell platform powerpc/cell: fix build breakage with CONFIG_SPUFS disabled powerpc/mpc5200: fix error paths in PSC UART probe function powerpc/mpc5200: add rts/cts handling in PSC UART driver powerpc/mpc5200: Make PSC UART driver update serial errors counters powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver ... Fix trivial conflict in drivers/char/Makefile as per Paul's directions
\| * \|	powerpc/44x: Support 16K/64K base page sizes on 44x	Ilya Yanok	2008-12-29	10	-48/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support for 16k and 64k page sizes on PowerPC 44x processors. The PGDIR table is much smaller than a page when using 16k or 64k pages (512 and 32 bytes respectively) so we allocate the PGDIR with kzalloc() instead of __get_free_pages(). One PTE table covers rather a large memory area when using 16k or 64k pages (32MB or 512MB respectively), so we can easily put FIXMAP and PKMAP in the area covered by one PTE table. Signed-off-by: Yuri Tikhonov <yur@emcraft.com> Signed-off-by: Vladimir Panfilov <pvr@emcraft.com> Signed-off-by: Ilya Yanok <yanok@emcraft.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc: Force memory size to be a multiple of PAGE_SIZE	Hollis Blanchard	2008-12-29	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that total memory size is page-aligned, because otherwise mark_bootmem() gets upset. This error case was triggered by using 64 KiB pages in the kernel while arch/powerpc/boot/4xx.c arbitrarily reduced the amount of memory by 4096 (to work around a chip bug that affects the last 256 bytes of physical memory). Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/32: Wire up the trampoline code for kdump	Dale Farnsworth	2008-12-23	3	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Wire up the trampoline code for ppc32 to relay exceptions from the vectors at address 0 to vectors at address 32MB, and modify Kconfig to enable Kdump support for all classic powerpcs. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M	Dale Farnsworth	2008-12-23	5	-14/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the ability for a classic ppc kernel to be loaded at an address of 32MB. This done by fixing a few places that assume we are loaded at address 0, and by changing several uses of KERNELBASE to use PAGE_OFFSET, instead. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/32: Allow __ioremap on RAM addresses for kdump kernel	Anton Vorontsov	2008-12-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While for debugging it is good to catch bogus users of ioremap, though for kdump support it is more convenient to use __ioremap for copy_oldmem_page() (exactly as we do for PPC64 currently). Note that copy_oldmem_page() calls __ioremap with flags set to '0', so it should be safe with the regard to the caches. The other option is to use kmap_atomic_pfn()[1], but it will not work for kernels compiled without HIGHMEM. That is, on a board with 256MB RAM and crashkernel=64M@32M case, the !HIGHMEM capturing kernel maps 0-96M range, which does not include all the memory needed to capture the dump. And, obviously, accessing anything upper than 96M will cause faults. [1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046747.html Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/32: Setup OF properties for kdump	Dale Farnsworth	2008-12-23	2	-52/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactor the setting of kdump OF properties, moving the common code from machine_kexec_64.c to machine_kexec.c where it can be used on both ppc64 and ppc32. This will be needed for kdump to work on ppc32 platforms. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()	Anton Vorontsov	2008-12-23	2	-10/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This replaces the dummy crash_setup_regs function with full-fledged crash_setup_regs implementation. On PPC32 we simply use the new ppc_save_regs function to dump the registers. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc: Prepare xmon_save_regs for use with kdump	Anton Vorontsov	2008-12-23	5	-5/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Today the arch/powerpc/xmon/setjmp.S file contains only the xmon_save_regs function. We want to use it for kdump purposes, so let's move the file into arch/powerpc/kernel/ and give the function a more generic name (ppc_save_regs). Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc: Remove default kexec/crash_kernel ops assignments	Anton Vorontsov	2008-12-23	7	-43/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Default ops are implicit now. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc: Make default kexec/crash_kernel ops implicit	Anton Vorontsov	2008-12-23	1	-12/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This removes the need for each platform to specify default kexec and crash kernel ops, thus effectively adds a working kexec support for most 6xx/7xx/7xxx-based boards. Platforms that can't cope with default ops will explode in some weird way (a hang or reboot is most likely), which means that the board's kexec support should be fixed or blacklisted via dummy _prepare callback returning -ENOSYS. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc: Setup OF properties for ppc32 kexec	Dale Farnsworth	2008-12-23	2	-19/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactor the setting of kexec OF properties, moving the common code from machine_kexec_64.c to machine_kexec.c where it can be used on both ppc64 and ppc32. This is needed for kexec to work on ppc32 platforms. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/pseries: Fix cpu hotplug	Sebastien Dugue	2008-12-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, pseries_cpu_die() calls msleep() while polling RTAS for the status of the dying cpu. However, if the cpu that is going down also happens to be the one doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is only passed later on in tick_shutdown() when _cpu_down() does the CPU_DEAD notification. Therefore jiffies won't be updated anymore. This replaces that msleep() with a cpu_relax() to make sure we're not going to schedule at that point. With this patch my test box survives a 100k iterations hotplug stress test on _all_ cpus, whereas without it, it quickly dies after ~50 iterations. Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc: Fix KVM build on ppc440	Paul Mackerras	2008-12-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2a4aca1144394653269720ffbb5a325a77abd5fa ("powerpc/mm: Split low level tlb invalidate for nohash processors") changed a call to _tlbia to _tlbil_all but didn't include the header that defines _tlbil_all, leading to a build failure on 440 if KVM is enabled. This fixes it. Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/cell: add QPACE as a separate Cell platform	Benjamin Krill	2008-12-22	4	-15/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the QPACE (Chromodynamics Parallel Computing on the Cell Broadband Engine) platform doesn't use a iommu, doesn't have PCI devices and a MPIC much lesser setup and configurations are needed. So far all devices are detected as OF device. A notifier function is used to set the dma_ops for the of_platform bus. Further this patch splits the PPC_CELL_NATIVE into PPC_CELL_COMMON which are parts that are shared with the QPACE platform and the rest. Signed-off-by: Benjamin Krill <ben@codiert.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
\| * \|	powerpc/cell: fix build breakage with CONFIG_SPUFS disabled	Arnd Bergmann	2008-12-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CBE_THERM and OPROFILE_CELL both cannot be built without SPU_FS disabled, so make the dependency explicit. Reported-by: Milton Miller <miltonm@bga.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
\| * \|	powerpc/mpc5200: add rts/cts handling in PSC UART driver	Wolfram Sang	2008-12-21	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add RTS/CTS-support for the PSC of the MPC5200B. Tested with a Phytec MPC5200B-IO. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
\| * \|	powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver	Tim Yamin	2008-12-21	1	-12/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds MDMA/UDMA support using BestComm for DMA on the MPC5200 platform. Based heavily on previous work by Freescale (Bernard Kuhn, John Rigby) and Domen Puncer. With this patch, a SanDisk Extreme IV CF card gets read speeds of approximately 26.70 MB/sec. Signed-off-by: Tim Yamin <plasm@roo.me.uk> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
\| * \|	powerpc/mpc5200: Disable bestcomm prefetching when ATA DMA enabled	Grant Likely	2008-12-21	3	-5/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ATA DMA is enabled, bestcomm prefetching does not work. This patch adds a function to disable bestcomm prefetch when the ATA Bestcomm task is initialized. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
\| * \|	powerpc/mpc5200: Bestcomm fixes to ATA support	Tim Yamin	2008-12-21	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1) ata.h has dst_pa in the wrong place (needs to match what the BestComm task microcode in bcom_ata_task.c expects); fix it. 2) The BestComm ATA task priority was changed to maximum in bestcomm_priv.h; this fixes a deadlock issue experienced with heavy DMA occurring on both the ATA and Ethernet BestComm tasks, e.g. when downloading a large file over a LAN to disk. Signed-off-by: Tim Yamin <plasm@roo.me.uk> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
\| * \|	powerpc/mpc5200: Bugfix on handling variable sized buffer descriptors	Grant Likely	2008-12-21	1	-19/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The buffer descriptors for the ATA BestComm task are larger than the current definition for bcom_bd. This causes problems because the various bcom_... functions dereference the buffer descriptor pointer by using the array operator which doesn't work when the buffer descriptors are a different size. This patch adds the bcom_get_bd() function which uses the value in bcom_task.bd_size to calculate the offset into the BD table. This patch also changes the definition of bcom_bd to specify a data size of 0 instead of 1 so that it will never work if anyone attempts to dereference the bd list as an array (as opposed to something that might work even though it is wrong). Finally, this patch moves the definition of bcom_bd up in the file to eliminate a forward declaration. Based on patch originally written by Tim Yamin. Signed-off-by: Tim Yamin <plasm@roo.me.uk> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
\| * \|	powerpc/mpc5200: Make internal 5200 PIC the default interrupt controller	Grant Likely	2008-12-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MPC5200 internal interrupt controller setup function needs to set the default interrupt controller when it is called. Without this irq_create_of_mapping() cannot be called without first determining the pointer to the irq controller (ie. call with controller = NULL). Reported-by: Steven Cavanagh <scavanagh@secretlab.ca> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
\| * \|	powerpc/mpc5200: Document and tidy irq driver	Grant Likely	2008-12-21	5	-122/+189
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds documentation to the mpc5200 interrupt controller driver and cleans up some minor coding conventions. It also moves the contents of mpc52xx_pic.h into the driver proper (except for a small common bit that is moved to the common mpc52xx.h) because the information encoded there is not required by any other part of kernel code. Finally for code readability sake, the L2_OFFSET shift value is removed because the code using it resolves to a noop. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
\| * \|	powerpc: Fix missing 'blr' in _tlbia()	Benjamin Herrenschmidt	2008-12-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rework to MMU code dropped a much missed 'blr' instruction. Brown-Paper-Bag-Worn-By: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
\| * \|	powerpc/bootwrapper: Use the child-bus #address-cells to decide which range ↵	Scott Wood	2008-12-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	entry to use The correct #address-cells was still used for the actual translation, so the impact is only a possibility of choosing the wrong range entry or failing to find any match. Most common cases were not affected. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc: Const-qualify Device Node Argument to DCR Resource Extent API	Grant Erickson	2008-12-21	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add const qualifier to device_node argument for dcr_resource_{start,len} as of_get_property also const-qualifies this argument. Signed-off-by: Grant Erickson <gerickson@nuovations.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/44x: 44x TLB doesn't need "Guarded" set for all pages	Benjamin Herrenschmidt	2008-12-21	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After discussing with chip designers, it appears that it's not necessary to set G everywhere on 440 cores. The various core errata related to prefetch should be sorted out by firmware by disabling icache prefetching in CCR0. We add the workaround to the kernel however just in case oooold firmwares don't do it. This is valid for -all- 4xx core variants. Later ones hard wire the absence of prefetch but it doesn't harm to clear the bits in CCR0 (they should already be cleared anyway). We still leave G=1 on the linear mapping for now, we need to stop over-mapping RAM to be able to remove it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED	Benjamin Herrenschmidt	2008-12-21	8	-75/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in in the hash code based on some CPU feature bit. We also manipulate _PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places. This changes the logic so that instead, the PTE now contains _PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms that need it. The hash code clears it if the feature bit is not set. It also adds some clean accessors to setup various valid combinations of access flags and change various bits of code to use them instead. This should help having the PTE actually containing the bit combinations that we really want. I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead set it explicitely from the TLB miss. I will ultimately remove it completely as it appears that it might not be needed after all but in the meantime, having it in the TLB miss makes things a lot easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs	Benjamin Herrenschmidt	2008-12-21	3	-54/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This makes the MMU context code used for CPUs with no hash table (except 603) dynamically allocate the various maps used to track the state of contexts. Only the main free map and CPU 0 stale map are allocated at boot time. Other CPU maps are allocated when those CPUs are brought up and freed if they are unplugged. This also moves the initialization of the MMU context management slightly later during the boot process, which should be fine as it's really only needed when userland if first started anyways. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
\| * \|	powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440	Benjamin Herrenschmidt	2008-12-21	1	-9/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The handlers for Critical, Machine Check or Debug interrupts will save and restore MMUCR nowadays, thus we only need to disable normal interrupts when invalidating TLB entries. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>