op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	[PATCH] x86: cpu_khz type fix	Andrew Morton	2005-06-23	7	-9/+11
\| \| \| \| \| \| \| \| \| \|	x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use unsigned long. So make cpu_khz unsigned int on x86 as well. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Remove i386_ksyms.c, almost.	Alexey Dobriyan	2005-06-23	20	-162/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* EXPORT_SYMBOL's moved to other files * #include <linux/config.h>, <linux/module.h> where needed * #include's in i386_ksyms.c cleaned up * After copy-paste, redundant due to Makefiles rules preprocessor directives removed: #ifdef CONFIG_FOO EXPORT_SYMBOL(foo); #endif obj-$(CONFIG_FOO) += foo.o * Tiny reformat to fit in 80 columns Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] x86: #include asm/uaccess.h in asm/checksum.h	Alexey Dobriyan	2005-06-23	1	-0/+2
\| \| \| \| \| \| \| \| \|	csum_and_copy_to_user is static inline and uses VERIFY_WRITE. Patch allows to remove asm/uaccess.h from i386_ksyms.c without dependency surprises. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] VIA 82C586B IRQ routing fix	Aleksey Gorelov	2005-06-23	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \|	According to the VIA 82C586B datasheet (still available from http://gkernel.sourceforge.net/specs/via/586b.pdf.bz2) this chip need a special PIRQ mapping. Signed-off-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Aleksey Gorelov <aleksey_gorelov@phoenix.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] x86: avoid wasting IRQs for PCI devices	Natalie Protasevich	2005-06-23	1	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I have submitted the patch for x86_64, this is submission for i386. The patch changes the way IRQs are handed out to PCI devices. Currently, each I/O APIC pin gets associated with an IRQ, no matter if the pin is used or not. This imposes severe limitation on systems that have designs that employ many I/O APICs, only utilizing couple lines of each, such as P64H2 chipset. It is used in ES7000, and currently, there is no way to boot the system with more that 9 I/O APICs. The simple change below allows to boot a system with say 64 (or more) I/O APICs, each providing 1 slot, which otherwise impossible because of the IRQ gaps created for unused lines on each I/O APIC. It does not resolve the problem with number of devices that exceeds number of possible IRQs, but eases up a tension for IRQs on any large system with potentually large number of devices. Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] ia64: Selectable Timer Interrupt Frequency	Christoph Lameter	2005-06-23	2	-1/+3
\| \| \| \| \| \| \| \| \| \|	It allows a selectable timer interrupt frequency of 100, 250 and 1000 HZ. Reducing the timer frequency may have important performance benefits on large systems. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] i386: Selectable Frequency of the Timer Interrupt	Christoph Lameter	2005-06-23	5	-3/+57
\| \| \| \| \| \| \| \| \| \| \|	Make the timer frequency selectable. The timer interrupt may cause bus and memory contention in large NUMA systems since the interrupt occurs on each processor HZ times per second. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] allow early printk to use more than 25 lines	Jan Beulich	2005-06-23	1	-3/+10
\| \| \| \| \| \| \| \| \| \|	Allow early printk code to take advantage of the full size of the screen, not just the first 25 lines. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] adjust i386 watchdog tick calculation	Jan Beulich	2005-06-23	1	-9/+15
\| \| \| \| \| \| \| \| \| \|	Get the i386 watchdog tick calculation into a state where it can also be used on CPUs with frequencies beyond 4GHz, and it consolidates the calculation into a single place (for potential furture adjustments). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Do not enforce unique IO_APIC_ID check for xAPIC systems (i386)	Natalie Protasevich	2005-06-23	10	-19/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is per Andi's request to remove NO_IOAPIC_CHECK from genapic and use heuristics to prevent unique I/O APIC ID check for systems that don't need it. The patch disables unique I/O APIC ID check for Xeon-based and other platforms that don't use serial APIC bus for interrupt delivery. Andi stated that AMD systems don't need unique IO_APIC_IDs either. Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] i386: never block forced SIGSEGV	Roland McGrath	2005-06-23	1	-11/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This problem was first noticed on PPC and has already been fixed there. But the exact same issue applies to other platforms in the same way. The signal blocking for sa_mask and the handled signal takes place after the handler setup. When the stack is bogus, the handler setup forces a SIGSEGV. But then this will be blocked, and returning to user mode will fault again and iterate. This patch fixes the problem by checking whether signal handler setup failed, and not doing the signal-blocking if so. This copies what was done in the ppc code. I think all architectures' signal handler setup code follows this pattern and needs the change. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] NUMA aware block device control structure allocation	Christoph Lameter	2005-06-23	11	-30/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch to allocate the control structures for for ide devices on the node of the device itself (for NUMA systems). The patch depends on the Slab API change patch by Manfred and me (in mm) and the pcidev_to_node patch that I posted today. Does some realignment too. Signed-off-by: Justin M. Forbes <jmforbes@linuxtx.org> Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Pravin Shelar <pravin@calsoftinc.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] x86/x86_64: pcibus_to_node	Christoph Lameter	2005-06-23	5	-34/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define pcibus_to_node to be able to figure out which NUMA node contains a given PCI device. This defines pcibus_to_node(bus) in include/linux/topology.h and adjusts the macros for i386 and x86_64 that already provided a way to determine the cpumask of a pci device. x86_64 was changed to not build an array of cpumasks anymore. Instead an array of nodes is build which can be used to generate the cpumask via node_to_cpumask. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] ppc64: pcibus_to_node fix	Christoph Lameter	2005-06-23	1	-3/+1
\| \| \| \| \| \| \| \|	asm-generic/topology.h must also be included if CONFIG_NUMA is set in order to provide the fall back pcibus_to_node function. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] m32r: build fix for asm-m32r/topology.h	Hirokazu Takata	2005-06-23	1	-43/+1
\| \| \| \| \| \| \| \|	Use asm-generic/topology.h to fix yet another pcibus_to_node() build error. Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Platform SMIs and their interferance with tsc based delay calibration	Venkatesh Pallipadi	2005-06-23	10	-0/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Issue: Current tsc based delay_calibration can result in significant errors in loops_per_jiffy count when the platform events like SMIs (System Management Interrupts that are non-maskable) are present. This could lead to potential kernel panic(). This issue is becoming more visible with 2.6 kernel (as default HZ is 1000) and on platforms with higher SMI handling latencies. During the boot time, SMIs are mostly used by BIOS (for things like legacy keyboard emulation). Description: The psuedocode for current delay calibration with tsc based delay looks like (0) Estimate a value for loops_per_jiffy (1) While (loops_per_jiffy estimate is accurate enough) (2) wait for jiffy transition (jiffy1) (3) Note down current tsc (tsc1) (4) loop until tsc becomes tsc1 + loops_per_jiffy (5) check whether jiffy changed since jiffy1 or not and refine loops_per_jiffy estimate Consider the following cases Case 1: If SMIs happen between (2) and (3) above, we can end up with a loops_per_jiffy value that is too low. This results in shorted delays and kernel can panic () during boot (Mostly at IOAPIC timer initialization timer_irq_works() as we don't have enough timer interrupts in a specified interval). Case 2: If SMIs happen between (3) and (4) above, then we can end up with a loops_per_jiffy value that is too high. And with current i386 code, too high lpj value (greater than 17M) can result in a overflow in delay.c:__const_udelay() again resulting in shorter delay and panic(). Solution: The patch below makes the calibration routine aware of asynchronous events like SMIs. We increase the delay calibration time and also identify any significant errors (greater than 12.5%) in the calibration and notify it to user. Patch below changes both i386 and x86-64 architectures to use this new and improved calibrate_delay_direct() routine. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] use ${CROSS_COMPILE}installkernel in arch/*/boot/install.sh	Ian Campbell	2005-06-23	6	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The attached patch causes the various arch specific install.sh scripts to look for ${CROSS_COMPILE}installkernel rather than just installkernel (in both /sbin/ and ~/bin/ where the script already did this). This allows you to have e.g. arm-linux-installkernel as a handy way to install on your cross target. It also prevents the script picking up on the host /sbin/installkernel which causes the script to fall through and do the install itself (which is what I actually use myself, with $INSTALL_PATH set). I don't believe it causes back-compatibility problems since calling the host installkernel was never likely to work or be what you wanted when cross compiling anyway. If $CROSS_COMPILE isn't set then nothing changes. I only use ARM and i386 myself but I figured it couldn't hurt to do the whole lot. I've cc'd those who I hope are the arch maintainers for files that I've touched. Signed-off-by: Ian Campbell <icampbell@arcom.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] biarch compiler support for i386	H. Peter Anvin	2005-06-23	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \|	This allows the i386 architecture to be built on a system with a biarch compiler that defaults to x86-64, merely by specifying ARCH=i386. As previously discussed, this uses the equivalent logic to the ppc port. Signed-Off-By: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] add page_state info to show_mem	Martin J. Bligh	2005-06-23	1	-0/+8
\| \| \| \| \| \| \| \|	This helps a lot when debugging out of memory stuff - useful especially to see if all the memory is sucked into slab, etc. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] add x86-64 specific support for sparsemem	Matt Tolentino	2005-06-23	4	-12/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds in the necessary support for sparsemem such that x86-64 kernels may use sparsemem as an alternative to discontigmem for NUMA kernels. Note that this does no preclude one from continuing to build NUMA kernels using discontigmem, but merely allows the option to build NUMA kernels with sparsemem. Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels results in reduced text size for otherwise equivalent kernels as shown in the example builds below: text data bss dec hex filename 2371036 765884 1237108 4374028 42be0c vmlinux.discontig 2366549 776484 1302772 4445805 43d66d vmlinux.sparse Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] reorganize x86-64 NUMA and DISCONTIGMEM config options	Matt Tolentino	2005-06-23	9	-24/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	In order to use the alternative sparsemem implmentation for NUMA kernels, we need to reorganize the config options. This patch effectively abstracts out the CONFIG_DISCONTIGMEM options to CONFIG_NUMA in most cases. Thus, the discontigmem implementation may be employed as always, but the sparsemem implementation may be used alternatively. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] add x86-64 Kconfig options for sparsemem	Matt Tolentino	2005-06-23	1	-0/+19
\| \| \| \| \| \| \| \| \| \|	Add the requisite arch specific Kconfig options to enable the use of the sparsemem implementation for NUMA kernels on x86-64. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] remove direct ref to contig_page_data for x86-64	Matt Tolentino	2005-06-23	2	-5/+1
\| \| \| \| \| \| \| \| \| \|	This patch pulls out all remaining direct references to contig_page_data from arch/x86-64, thus saving an ifdef in one case. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] ppc64: sparsemem memory model	Andy Whitcroft	2005-06-23	7	-21/+68
\| \| \| \| \| \| \| \| \| \| \| \|	Provide the architecture specific implementation for SPARSEMEM for PPC64 systems. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> (in part) Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] ppc64: add memory present	Andy Whitcroft	2005-06-23	2	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Provide hooks for PPC64 to allow memory models to be informed of installed memory areas. This allows SPARSEMEM to instantiate mem_map for the populated areas. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] ppc64: add early_pfn_to_nid	Andy Whitcroft	2005-06-23	2	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Provide an implementation of early_pfn_to_nid for PPC64. This is used by memory models to determine the node from which to take allocations before the memory allocators are fully initialised. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sparsemem hotplug base	Andy Whitcroft	2005-06-23	3	-27/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make sparse's initalization be accessible at runtime. This allows sparse mappings to be created after boot in a hotplug situation. This patch is separated from the previous one just to give an indication how much of the sparse infrastructure is just for hotplug memory. The section_mem_map doesn't really store a pointer. It stores something that is convenient to do some math against to get a pointer. It isn't valid to just do section_mem_map, so I don't think it should be stored as a pointer. There are a couple of things I'd like to store about a section. First of all, the fact that it is !NULL does not mean that it is present. There could be such a combination where section_mem_map is* NULL, but the math gets you properly to a real mem_map. So, I don't think that check is safe. Since we're storing 32-bit-aligned structures, we have a few bits in the bottom of the pointer to play with. Use one bit to encode whether there's really a mem_map there, and the other one to tell whether there's a valid section there. We need to distinguish between the two because sometimes there's a gap between when a section is discovered to be present and when we can get the mem_map for it. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sparsemem swiss cheese numa layouts	Andy Whitcroft	2005-06-23	3	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The part of the sparsemem patch which modifies memmap_init_zone() has recently become a problem. It changes behavior so that there is a call to pfn_to_page() for each individual page inside of a node's range: node_start_pfn through node_end_pfn. It used to simply do this once, at the beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside of a node made it necessary to change. Mike Kravetz recently wrote a patch which made the NUMA code accept some new kinds of layouts. The system's memory was laid out like this, with node 0's memory in two pieces: one before and one after node 1's memory: Node 0: +++++ +++++ Node 1: +++++ Previous behavior before Mike's patch was to assign nodes like this: Node 0: 00000 XXXXX Node 1: 11111 Where the 'X' areas were simply thrown away. The new behavior was to make the pg_data_t span node 0 across all of its areas, including areas that are really node 1's: Node 0: 000000000000000 Node 1: 11111 This wastes a little bit of mem_map space, but ends up being OK, and more fully utilizes the system's memory. memmap_init_zone() initializes all of the "struct page"s for node 0, even for the "hole", but those never get used, because there is no pfn_to_page() that resolves to those pages. However, only calling pfn_to_page() once, memmap_init_zone() always uses the pages that were allocated for node0->node_mem_map because: struct page *start = pfn_to_page(start_pfn); // effectively start = &node->node_mem_map[0] for (page = start; page < (start + size); page++) { init_page_here();... page++; } Slow, and wasteful, but generally harmless. But, modify that to call pfn_to_page() for each loop iteration (like sparsemem does): for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) { page = pfn_to_page(pfn); } And you end up trying to initialize node 1's pages too early, along with bogus data from node 0. This patch checks for those weird layouts and declines to touch the pages, making the more frequent pfn_to_page() calls OK to do. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sparsemem memory model for i386	Andy Whitcroft	2005-06-23	9	-81/+135
\| \| \| \| \| \| \| \| \| \| \| \|	Provide the architecture specific implementation for SPARSEMEM for i386 SMP and NUMA systems. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sparsemem memory model	Andy Whitcroft	2005-06-23	10	-33/+332
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of mem_map[] is needed by discontiguous memory machines (like in the old CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually become a complete replacement. A significant advantage over DISCONTIGMEM is that it's completely separated from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA and DISCONTIG are often confused. Another advantage is that sparse doesn't require each NUMA node's ranges to be contiguous. It can handle overlapping ranges between nodes with no problems, where DISCONTIGMEM currently throws away that memory. Sparsemem uses an array to provide different pfn_to_page() translations for each SECTION_SIZE area of physical memory. This is what allows the mem_map[] to be chopped up. In order to do quick pfn_to_page() operations, the section number of the page is encoded in page->flags. Part of the sparsemem infrastructure enables sharing of these bits more dynamically (at compile-time) between the page_zone() and sparsemem operations. However, on 32-bit architectures, the number of bits is quite limited, and may require growing the size of the page->flags type in certain conditions. Several things might force this to occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of memory), an increase in the physical address space, or an increase in the number of used page->flags. One thing to note is that, once sparsemem is present, the NUMA node information no longer needs to be stored in the page->flags. It might provide speed increases on certain platforms and will be stored there if there is room. But, if out of room, an alternate (theoretically slower) mechanism is used. This patch introduces CONFIG_FLATMEM. It is used in almost all cases where there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM often have to compile out the same areas of code. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] generify memory present	Andy Whitcroft	2005-06-23	2	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Allow architectures to indicate that they will be providing hooks to indice installed memory areas, memory_present(). Provide prototypes for the i386 implementation. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] generify early_pfn_to_nid	Andy Whitcroft	2005-06-23	3	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Provide a default implementation for early_pfn_to_nid returning node 0. Allow architectures to override this with their own implementation out of asm/mmzone.h. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] ppc64: Kconfig memory models	Mike Kravetz	2005-06-23	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \|	This patch changes some of the default behavior in the ppc64 Kconfig file that was recently changed/added to 2.6.12-rc2-mm1 by Dave Hansen in preparation for SPARSEMEM. Patch allows the display of both FLAT and DISCONTIG models on pseries. As before, default is DISCONTIG for SMP and PSERIES and FLAT for others. Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm/Kconfig: give DISCONTIG more help text	Dave Hansen	2005-06-23	1	-0/+10
\| \| \| \| \| \| \| \| \|	This gives DISCONTIGMEM a bit more help text to explain what it does, not just when to choose it. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm/Kconfig: hide "Memory Model" selection menu	Dave Hansen	2005-06-23	1	-4/+17
\| \| \| \| \| \| \| \| \| \| \| \| \|	I got some feedback from users who think that the new "Memory Model" menu is a little invasive. This patch will hide that menu, except when CONFIG_EXPERIMENTAL is enabled or when an individual architecture wants it. An individual arch may want to enable it because they've removed their arch-specific DISCONTIG prompt in favor of the mm/Kconfig one. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm/Kconfig: kill unused ARCH_FLATMEM_DISABLE	Dave Hansen	2005-06-23	3	-12/+0
\| \| \| \| \| \| \| \| \| \|	This used to be used to disable FLATMEM selection, but I decided to change it to be done generically when DISCONTIG is enabled. The option is unused, so this kills it. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sparsemem: fix minor "defaults" issue in mm/Kconfig	Dave Hansen	2005-06-23	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \|	The following patch applies on top of 2.6.12-rc2-mm1. It fixes a minor user interaction issue, and an early reference to SPARSEMEM. This "choice" menu would always default to FLATMEM, as it was listed first. Move it to the end so that the other defaults have a chance first. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Introduce new Kconfig option for NUMA or DISCONTIG	Dave Hansen	2005-06-23	3	-6/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is some confusion that arose when working on SPARSEMEM patch between what is needed for DISCONTIG vs. NUMA. Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently. All of the current NUMA implementations require an implementation of DISCONTIG. Because of this, quite a lot of code which is really needed for NUMA is actually under DISCONTIG #ifdefs. For SPARSEMEM, we changed some of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n case. Introducing this new NEED_MULTIPLE_NODES config option allows code that is needed for both NUMA or DISCONTIG to be separated out from code that is specific to DISCONTIG. One great advantage of this approach is that it doesn't require every architecture to be converted over. All of the current implementations should "just work", only the ones implementing SPARSEMEM will have to be fixed up. The change to free_area_init() makes it work inside, or out of the new config option. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] update all defconfigs for ARCH_DISCONTIGMEM_ENABLE	Dave Hansen	2005-06-23	6	-6/+6
\| \| \| \| \| \| \| \| \| \| \|	This will at least suppress one prompt that users would have received the first time they compile with the new DISCONTIG arch option. They'll still get the "Memory Model" prompt, but 99% of them will have the default work there. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] make each arch use mm/Kconfig	Dave Hansen	2005-06-23	23	-10/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	For all architectures, this just means that you'll see a "Memory Model" choice in your architecture menu. For those that implement DISCONTIGMEM, you may eventually want to make your ARCH_DISCONTIGMEM_ENABLE a "def_bool y" and make your users select DISCONTIGMEM right out of the new choice menu. The only disadvantage might be if you have some specific things that you need in your help option to explain something about DISCONTIGMEM. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] create mm/Kconfig for arch-independent memory options	Dave Hansen	2005-06-23	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With sparsemem being introduced, we need a central place for new memory-related .config options: mm/Kconfig. This allows us to remove many of the duplicated arch-specific options. The new option, CONFIG_FLATMEM, is there to enable us to detangle NUMA and DISCONTIGMEM. This is a requirement for sparsemem because sparsemem uses the NUMA code without the presence of DISCONTIGMEM. The sparsemem patches use CONFIG_FLATMEM in generic code, so this patch is a requirement before applying them. Almost all places that used to do '#ifndef CONFIG_DISCONTIGMEM' should use '#ifdef CONFIG_FLATMEM' instead. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sparsemem base: teach discontig about sparse ranges	Dave Hansen	2005-06-23	3	-1/+17
\| \| \| \| \| \| \| \| \| \| \|	discontig.c has some assumptions that mem_map[]s inside of a node are contiguous. Teach it to make sure that each region that it's bringing online is actually made up of valid ranges of ram. Written-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sparsemem base: reorganize page->flags bit operations	Dave Hansen	2005-06-23	3	-22/+52
\| \| \| \| \| \| \| \| \| \| \| \|	Generify the value fields in the page_flags. The aim is to allow the location and size of these fields to be varied. Additionally we want to move away from fixed allocations per field whilst still enforcing the overall bit utilisation limits. We rely on the compiler to spot and optimise the accessor functions. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sparsemem base: simple NUMA remap space allocator	Dave Hansen	2005-06-23	4	-29/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce a simple allocator for the NUMA remap space. This space is very scarce, used for structures which are best allocated node local. This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep the pgdat->node_mem_map initialized in a consistent place for all architectures. Issues: o alloc_remap takes a node_id where we might expect a pgdat which was intended to allow us to allocate the pgdat's using this mechanism; which we do not yet do. Could have alloc_remap_node() and alloc_remap_nid() for this purpose. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] sparsemem base: early_pfn_to_nid() (works before sparse is initialized)	Dave Hansen	2005-06-23	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following four patches provide the last needed changes before the introduction of sparsemem. For a more complete description of what this will do, please see this patch: http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch or previous posts on the subject: http://marc.theaimsgroup.com/?t=110868540700001&r=1&w=2 http://marc.theaimsgroup.com/?l=linux-mm&m=109897373315016&w=2 Three of these are i386-only, but one of them reorganizes the macros used to manage the space in page->flags, and will affect all platforms. There are analogous patches to the i386 ones for ppc64, ia64, and x86_64, but those will be submitted by the normal arch maintainers. The combination of the four patches has been test-booted on a variety of i386 hardware, and compiled for ppc64, i386, and x86-64 with about 17 different .configs. It's also been runtime-tested on ia64 configs (with more patches on top). This patch: We _know_ which node pages in general belong to, at least at a very gross level in node_{start,end}_pfn[]. Use those to target the allocations of pages. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] remove non-DISCONTIG use of pgdat->node_mem_map	Dave Hansen	2005-06-23	14	-36/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch effectively eliminates direct use of pgdat->node_mem_map outside of the DISCONTIG code. On a flat memory system, these fields aren't currently used, neither are they on a sparsemem system. There was also a node_mem_map(nid) macro on many architectures. Its use along with the use of ->node_mem_map itself was not consistent. It has been removed in favor of two new, more explicit, arch-independent macros: pgdat_page_nr(pgdat, pagenr) nid_page_nr(nid, pagenr) I called them "pgdat" and "nid" because we overload the term "node" to mean "NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways. I believe the newer names are much clearer. These macros can be overridden in the sparsemem case with a theoretically slower operation using node_start_pfn and pfn_to_page(), instead. We could make this the only behavior if people want, but I don't want to change too much at once. One thing at a time. This patch removes more code than it adds. Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386 generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64. Full list here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/ Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin J. Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Merge 'misc-fixes' branch of ↵	Linus Torvalds	2005-06-23	1	-0/+1
\|\ \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
\| *	e1000: fix spinlock bug	Mitch Williams	2005-06-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes an obvious and nasty bug where we could exit the transmit routine while holding tx_lock. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
* \|	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6	Linus Torvalds	2005-06-22	2	-5/+4
\|\ \
\| * \|	[PATCH] driver core: Fix up the device_attach() error handling in ↵	Greg Kroah-Hartman	2005-06-22	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bus_add_device() Don't error out if something "bad" happens when trying to bind a driver to a device. We want the sysfs attributes to be present for later when we try to tear down the device. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>