op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	[IA64] fix show_mem for VIRTUAL_MEM_MAP+FLATMEM	Bob Picco	2006-08-03	3	-67/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	contig.c (FLATMEM) requires the same optimization as in discontig.c for show_mem when VIRTUAL_MEM_MAP is in use. Otherwise FLATMEM has softlockup timeouts. This was boot tested for memory configuration: SPARSEMEM, DISCONTIG+VIRTUAL_MEM_MAP, FLATMEM, FLATMEM+VIRTUAL_MEM_MAP and FLATMEM+VIRTUAL_MEM_MAP with largest memory gap less than LARGE_GAP by using boot parameter "mem=". This was boot tested and "echo m >/proc/sysrq-trigger" output evaluated for : FLATMEM, FLATMEM+VIRTUAL_MEM_MAP, DISCONTIGMEM+VIRTUAL_MEM_MAP and SPARSEMEM. Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[IA64] align high endpoint of VIRTUAL_MEM_MAP	Bob Picco	2006-08-03	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Assure that vmem_map's high endpoint is MAX_ORDER aligned. Not doing so violates the buddy allocator algorithm. Also anyone using mem=XXX on boot line and not aligned to MAX_ORDER requires this patch in order to satisfy buddy allocator. vmem_map always starts at pfn 0. The potentially large MAX_ORDER on ia64 (due to hugetlbfs) requires that the end of vmem_map be aligned to MAX_ORDER_NR_PAGES. This was boot tested for: FLATMEM, FLATMEM+VIRTUAL_MEM_MAP, DISCONTIGMEM+VIRTUAL_MEM_MAP and SPARSEMEM. Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[IA64] sparse cleanups	Keith Owens	2006-08-02	1	-3/+3
\| \| \| \| \| \| \| \| \|	Fix some sparse warnings on ia64. Large constants that should be long instead of int. Use NULL instead of 0. Add some missing __iomem casts. Replace a non-C99 structure assignment. Signed-off-by: Keith Owens <kaos@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[PATCH] Fix copying of pgdat array on each node for ia64 memory hotplug	Yasunori Goto	2006-07-04	1	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I found a bug in memory hot-add code for ia64. IA64's code has copies of pgdat's array on each node to reduce memory access over crossing node. This array is used by NODE_DATA() macro. When new node is hot-added, this pgdat's array should be updated and copied on new node too. However, I used for_each_online_node() in scatter_node_data() to copy it. This meant its array is not copied on new node. Because initialization of structures for new node was halfway, so online_node_map couldn't be set at this time. To copy arrays on new node, I changed it to check value of pgdat_list[] which is source array of copies. I tested this patch with my Memory Hotadd emulation on Tiger4. This patch is for 2.6.17-git20. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Remove obsolete #include <linux/config.h>	Jörn Engel	2006-06-30	6	-6/+0
\| \| \| \| \|	Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
*	[PATCH] pgdat allocation and update for ia64 of memory hotplug: allocate ↵	Yasunori Goto	2006-06-27	1	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \|	pgdat and per node data This is a patch to allocate pgdat and per node data area for ia64. The size for them can be calculated by compute_pernodesize(). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] pgdat allocation and update for ia64 of memory hotplug: update pgdat ↵	Yasunori Goto	2006-06-27	1	-5/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	address array This is to refresh node_data[] array for ia64. As I mentioned previous patches, ia64 has copies of information of pgdat address array on each node as per node data. At v2 of node_add, this function used stop_machine_run() to update them. (I wished that they were copied safety as much as possible.) But, in this patch, this arrays are just copied simply, and set node_online_map bit after completion of pgdat initialization. So, kernel must touch NODE_DATA() macro after checking node_online_map(). (Current code has already done it.) This is more simple way for just hot-add..... Note : It will be problem when hot-remove will occur, because, even if online_map bit is set, kernel may touch NODE_DATA() due to race condition. :-( Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] pgdat allocation and update for ia64 of memory hotplug: hold pgdat ↵	Yasunori Goto	2006-06-27	1	-11/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	address at system running This is a preparatory patch to make common code for updating of NODE_DATA() of ia64 between boottime and hotplug. Current code remembers pgdat address in mem_data which is used at just boot time. But its information can be used at hotplug time by moving to global value. The next patch uses this array. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] pgdat allocation for new node add (specify node id)	Yasunori Goto	2006-06-27	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change the name of old add_memory() to arch_add_memory. And use node id to get pgdat for the node at NODE_DATA(). Note: Powerpc's old add_memory() is defined as __devinit. However, add_memory() is usually called only after bootup. I suppose it may be redundant. But, I'm not well known about powerpc. So, I keep it. (But, __meminit is better at least.) Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] Notify page fault call chain for ia64	Anil S Keshavamurthy	2006-06-26	1	-1/+35
\| \| \| \| \| \| \| \| \| \| \| \|	Overloading of page fault notification with the notify_die() has performance issues(since the only interested components for page fault is kprobes and/or kdb) and hence this patch introduces the new notifier call chain exclusively for page fault notifications their by avoiding notifying unnecessary components in the do_page_fault() code path. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Auto-update from upstream	Tony Luck	2006-06-23	1	-0/+2
\|\
\| *	ACPI add ia64 exports to build acpi_memhotplug as a module	KAMEZAWA Hiroyuki	2006-05-15	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Len Brown <len.brown@intel.com>
* \|	[IA64] rework memory attribute aliasing	Bjorn Helgaas	2006-05-08	1	-5/+22
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This closes a couple holes in our attribute aliasing avoidance scheme: - The current kernel fails mmaps of some /dev/mem MMIO regions because they don't appear in the EFI memory map. This keeps X from working on the Intel Tiger box. - The current kernel allows UC mmap of the 0-1MB region of /sys/.../legacy_mem even when the chipset doesn't support UC access. This causes an MCA when starting X on HP rx7620 and rx8620 boxes in the default configuration. There's more detail in the Documentation/ia64/aliasing.txt file this adds, but the general idea is that if a region might be covered by a granule-sized kernel identity mapping, any access via /dev/mem or mmap must use the same attribute as the identity mapping. Otherwise, we fall back to using an attribute that is supported according to the EFI memory map, or to using UC if the EFI memory map doesn't mention the region. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[IA64] Make show_mem() skip holes in a pgdat	Robin Holt	2006-04-13	1	-1/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch modifies ia64's show_mem() to walk the vmem_map page tables and rapidly skip forward across regions where the page tables are missing. This prevents the pfn_valid() check from causing numerous unnecessary page faults. Without this patch on a 512 node 512 cpu system where every node has four memory holes, the show_mem() call takes 1 hour 18 minutes. With this patch, it takes less than 3 seconds. Signed-off-by: Robin Holt <holt@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[IA64] Prefetch mmap_sem in ia64_do_page_fault()	Christoph Lameter	2006-04-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Take a hint from an x86_64 optimization by Arjan van de Ven and use it for ia64. See a9ba9a3b3897561d01e04cd21433746df46548c0 Prefetch the mmap_sem, which is critical for the performance of the page fault handler. Note: mm may be NULL but I guess that is safe. See 458f935527372499b714bf4f8e646a68bb0f52e3 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	Merge branch 'release' of ↵	Linus Torvalds	2006-03-30	3	-9/+17
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] ioremap() should prefer WB over UC [IA64] Add __mca_table to the DISCARD list in gate.lds [IA64] Move __mca_table out of the __init section [IA64] simplify some condition checks in iosapic_check_gsi_range [IA64] correct some messages and fixes some minor things [IA64-SGI] fix for-loop in sn_hwperf_geoid_to_cnode() [IA64-SGI] sn_hwperf use of num_online_cpus() [IA64] optimize flush_tlb_range on large numa box [IA64] lazy_mmu_prot_update needs to be aware of huge pages
\| *	[IA64] ioremap() should prefer WB over UC	Bjorn Helgaas	2006-03-30	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	efi_memmap_init() collects full granules of WB memory, without regard for whether they also support UC. So in order for ioremap() to work for main memory, it must prefer WB mappings when possible. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
\| *	[IA64] optimize flush_tlb_range on large numa box	Chen, Kenneth W	2006-03-27	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was reported from a field customer that global spin lock ptcg_lock is giving a lot of grief on munmap performance running on a large numa machine. What appears to be a problem coming from flush_tlb_range(), which currently unconditionally calls platform_global_tlb_purge(). For some of the numa machines in existence today, this function is mapped into ia64_global_tlb_purge(), which holds ptcg_lock spin lock while executing ptc.ga instruction. Here is a patch that attempt to avoid global tlb purge whenever possible. It will use local tlb purge as much as possible. Though the conditions to use local tlb purge is pretty restrictive. One of the side effect of having flush tlb range instruction on ia64 is that kernel don't get a chance to clear out cpu_vm_mask. On ia64, this mask is sticky and it will accumulate if process bounces around. Thus diminishing the possible use of ptc.l. Thoughts? Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Jack Steiner <steiner@sgi.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
\| *	[IA64] lazy_mmu_prot_update needs to be aware of huge pages	Zhang, Yanmin	2006-03-27	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Function lazy_mmu_prot_update is also used on huge pages when it is called by set_huge_ptep_writable, but it isn't aware of huge pages. Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Acked-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* \|	[PATCH] for_each_online_pgdat: remove sorting pgdat	KAMEZAWA Hiroyuki	2006-03-27	1	-31/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because pgdat_list was linked to pgdat_list in reverse order, (By default) some of arch has to sort it by themselves. for_each_pgdat has gone..for_each_online_pgdat() uses node_online_map, which doesn't need to be sorted. This patch removes codes for sorting pgdat. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* \|	[PATCH] for_each_online_pgdat: renaming for_each_pgdat	KAMEZAWA Hiroyuki	2006-03-27	2	-3/+3
\|/ \| \| \| \| \| \| \|	Replace for_each_pgdat() with for_each_online_pgdat(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] ia64: ioremap: check EFI for valid memory attributes	Bjorn Helgaas	2006-03-26	2	-1/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check the EFI memory map so we can use the correct memory attributes for ioremap(). Previously, we always used uncacheable access, which blows up on some machines for regular system memory. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Matt Domsch <Matt_Domsch@dell.com> Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com> Cc: "Brown, Len" <len.brown@intel.com> Cc: Andi Kleen <ak@muc.de> Acked-by: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[IA64] add init declaration - nolwsys	Chen, Kenneth W	2006-03-22	1	-1/+1
\| \| \| \| \| \| \|	Add __initdata to nolwsys. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[IA64] add init declaration - gate page functions	Chen, Kenneth W	2006-03-22	1	-1/+1
\| \| \| \| \| \| \| \|	Add init declaration to bunch of patch functions and gate page setup function. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[IA64] add init declaration to memory initialization functions	Chen, Kenneth W	2006-03-22	2	-9/+9
\| \| \| \| \| \| \| \| \|	Add init declaration to variables/functions used for memory initialization. I don't think they would clash with memory hotplug. If they do, please yell. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[IA64] add init declaration to cpu initialization functions	Chen, Kenneth W	2006-03-22	2	-2/+2
\| \| \| \| \| \| \|	Add init declaration to cpu initialization functions. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[IA64] fix ia64 is_hugepage_only_range	Chen, Kenneth W	2006-03-22	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \|	fix is_hugepage_only_range() definition to be "overlaps" instead of "within architectural restricted hugetlb address range". Simplify the ia64 specific code that used to use is_hugepage_only_range() to just check which region the address is in. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[PATCH] hugepage: is_aligned_hugepage_range() cleanup	David Gibson	2006-03-22	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Quite a long time back, prepare_hugepage_range() replaced is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to verify if an address range is suitable for a hugepage mapping. is_aligned_hugepage_range() stuck around, but only to implement prepare_hugepage_range() on archs which didn't implement their own. Most archs (everything except ia64 and powerpc) used the same implementation of is_aligned_hugepage_range(). On powerpc, which implements its own prepare_hugepage_range(), the custom version was never used. In addition, "is_aligned_hugepage_range()" was a bad name, because it suggests it returns true iff the given range is a good hugepage range, whereas in fact it returns 0-or-error (so the sense is reversed). This patch cleans up by abolishing is_aligned_hugepage_range(). Instead prepare_hugepage_range() is defined directly. Most archs use the default version, which simply checks the given region is aligned to the size of a hugepage. ia64 and powerpc define custom versions. The ia64 one simply checks that the range is in the correct address space region in addition to being suitably aligned. The powerpc version (just as previously) checks for suitable addresses, and if necessary performs low-level MMU frobbing to set up new areas for use by hugepages. No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] remove set_page_count() outside mm/	Nick Piggin	2006-03-22	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	set_page_count usage outside mm/ is limited to setting the refcount to 1. Remove set_page_count from outside mm/, and replace those users with init_page_count() and set_page_refcounted(). This allows more debug checking, and tighter control on how code is allowed to play around with page->_count. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Pull bsp-removal into release branch	Tony Luck	2006-03-21	2	-3/+10
\|\
\| *	[IA64] support for cpu0 removal	Ashok Raj	2006-01-05	2	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	here is the BSP removal support for IA64. Its pretty much the same thing that was released a while back, but has your feedback incorporated. - Removed CONFIG_BSP_REMOVE_WORKAROUND and associated cmdline param - Fixed compile issue with sn2/zx1 due to a undefined fix_b0_for_bsp - some formatting nits (whitespace etc) This has been tested on tiger and long back by alex on hp systems as well. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* \|	[IA64] Simple memory hot-add for ia64.	Yasunori Goto	2006-01-16	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	First step to memory hotplug for ia64 (add only, all new memory is added to node 0, does not use ZONE_EASY_RECLAIM yet). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* \|	[IA64] Hole in IA64 TLB flushing from system threads	Jack Steiner	2006-01-13	1	-1/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I originally thought this was an bug only in the SN code, but I think I also see a hole in the generic IA64 tlb code. (Separate patch was sent for the SN problem). It looks like there is a bug in the TLB flushing code. During context switch, kernel threads (kswapd, for example) inherit the mm of the task that was previously running on the cpu. Normally, this is ok because the previous context is still loaded into the RR registers. However, if the owner of the mm migrates to another cpu, changes it's context number, and references a page before kswapd issues a tlb_purge for that same page, the purge will be done with a stale context number (& RR registers). Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[IA64] Limit the maximum NODEDATA_ALIGN() offset	Jack Steiner	2005-12-06	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	The per-node data structures are allocated with strided offsets that are a function of the node number. This prevents excessive cache-aliasing from occurring. On systems with a large number of nodes, the strided offset becomes too large. This patch restricts the maximum offset to 32MB. This is far larger than the size of any current L3 cache. Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	Pull context-bitmap into release branch	Tony Luck	2005-11-10	1	-42/+43
\|\
\| *	[IA64] make mmu_context.h and tlb.c 80-column friendly	Chen, Kenneth W	2005-11-03	1	-15/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	wrap_mmu_context(), delayed_tlb_flush(), get_mmu_context() all have an extra { } block which cause one extra indentation. get_mmu_context() is particularly bad with 5 indentations to the most inner "if". It finally gets on my nerve that I can't keep the code within 80 columns. Remove the extra { } block and while I'm at it, reformat all the comments to 80-column friendly. No functional change at all with this patch. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
\| *	[IA64] Use bitmaps for efficient context allocation/free	Peter Keilty	2005-10-31	1	-29/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Corrects the very inefficent method of finding free context_ids in get_mmu_context(). Instead of walking the task_list of all processes, 2 bitmaps are used to efficently store and lookup state, inuse and needs flushing. The entire rid address space is now used before calling wrap_mmu_context and global tlb flushing. Special thanks to Ken and Rohit for their review and modifications in using a bit flushmap. Signed-off-by: Peter Keilty <peter.keilty@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* \|	[IA64] fix memory less node allocation	Bob Picco	2005-11-08	1	-10/+9
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original memory less node allocation attempted to use NODEDATA_ALIGN for alignment. The bootmem allocator only allows a power of two alignments. This causes a BUG_ON for some nodes. For cpu only nodes just allocate with a PERCPU_PAGE_SIZE alignment. Some older firmware reports SLIT distances of 0xff and results in bestnode not being computed. This is now treated correctly. The failed allocation check was removed because it's redundant. The bootmem allocator already makes this check. This fix has been boot tested on 4 node machine which has 4 cpu only nodes and 1 memory node. Thanks to Pete Keilty for reporting this and helping me test it. Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
*	[PATCH] memory hotplug locking: node_size_lock	Dave Hansen	2005-10-29	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pgdat->node_size_lock is basically only neeeded in one place in the normal code: show_mem(), which is the arch-specific sysrq-m printing function. Strictly speaking, the architectures not doing memory hotplug do no need this locking in show_mem(). However, they are all included for completeness. This should also make any future consolidation of all of the implementations a little more straightforward. This lock is also held in the sparsemem code during a memory removal, as sections are invalidated. This is the place there pfn_valid() is made false for a memory area that's being removed. The lock is only required when doing pfn_valid() operations on memory which the user does not already have a reference on the page, such as in show_mem(). Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm: flush_tlb_range outside ptlock	Hugh Dickins	2005-10-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was one small but very significant change in the previous patch: mprotect's flush_tlb_range fell outside the page_table_lock: as it is in 2.4, but that doesn't prove it safe in 2.6. On some architectures flush_tlb_range comes to the same as flush_tlb_mm, which has always been called from outside page_table_lock in dup_mmap, and is so proved safe. Others required a deeper audit: I could find no reliance on page_table_lock in any; but in ia64 and parisc found some code which looks a bit as if it might want preemption disabled. That won't do any actual harm, so pending a decision from the maintainers, disable preemption there. Remove comments on page_table_lock from flush_tlb_mm, flush_tlb_range and flush_tlb_page entries in cachetlb.txt: they were rather misleading (what generic code does is different from what usually happens), the rules are now changing, and it's not yet clear where we'll end up (will the generic tlb_flush_mmu happen always under lock? never under lock? or sometimes under and sometimes not?). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm: init_mm without ptlock	Hugh Dickins	2005-10-29	1	-8/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	First step in pushing down the page_table_lock. init_mm.page_table_lock has been used throughout the architectures (usually for ioremap): not to serialize kernel address space allocation (that's usually vmlist_lock), but because pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it. Reverse that: don't lock or unlock init_mm.page_table_lock in any of the architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take and drop it when allocating a new one, to check lest a racing task already did. Similarly no page_table_lock in vmalloc's map_vm_area. Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle user mms, which are converted only by a later patch, for now they have to lock differently according to whether or not it's init_mm. If sources get muddled, there's a danger that an arch source taking init_mm.page_table_lock will be mixed with common source also taking it (or neither take it). So break the rules and make another change, which should break the build for such a mismatch: remove the redundant mm arg from pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13). Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64 used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64 map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free took page_table_lock for no good reason. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm: ia64 use expand_upwards	Hugh Dickins	2005-10-29	2	-28/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ia64 has expand_backing_store function for growing its Register Backing Store vma upwards. But more complete code for this purpose is found in the CONFIG_STACK_GROWSUP part of mm/mmap.c. Uglify its #ifdefs further to provide expand_upwards for ia64 as well as expand_stack for parisc. The Register Backing Store vma should be marked VM_ACCOUNT. Implement the intention of growing it only a page at a time, instead of passing an address outside of the vma to handle_mm_fault, with unknown consequences. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[PATCH] mm: vm_stat_account unshackled	Hugh Dickins	2005-10-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The original vm_stat_account has fallen into disuse, with only one user, and only one user of vm_stat_unaccount. It's easier to keep track if we convert them all to __vm_stat_account, then free it from its __shackles. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	Pull fix-slow-tlb-purge into release branch	Tony Luck	2005-10-28	1	-7/+9
\|\
\| *	[IA64] - Avoid slow TLB purges on SGI Altix systems	Dean Roe	2005-10-27	1	-7/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	flush_tlb_all() can be a scaling issue on large SGI Altix systems since it uses the global call_lock and always executes on all cpus. When a process enters flush_tlb_range() to purge TLBs for another process, it is possible to avoid flush_tlb_all() and instead allow sn2_global_tlb_purge() to purge TLBs only where necessary. This patch modifies flush_tlb_range() so that this case can be handled by platform TLB purge functions and updates ia64_global_tlb_purge() accordingly. sn2_global_tlb_purge() now calculates the region register value from the mm argument introduced with this patch. Signed-off-by: Dean Roe <roe@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* \|	Pull for-each-cpu into release branch	Tony Luck	2005-10-28	1	-2/+3
\|\ \
\| * \|	[IA64] wider use of for_each_cpu_mask() in arch/ia64	hawkes@sgi.com	2005-10-25	1	-2/+3
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In arch/ia64 change the explicit use of for-loops and NR_CPUS into the general for_each_cpu() or for_each_online_cpu() constructs, as appropriate. This widens the scope of potential future optimizations of the general constructs, as well as takes advantage of the existing optimizations of first_cpu() and next_cpu(). Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* \|	[PATCH] V5 ia64 SPARSEMEM - SPARSEMEM code changes	Bob Picco	2005-10-04	3	-3/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is the minimal set of changes required by ia64 to use SPARSEMEM. Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* \|	[PATCH] V5 ia64 SPARSEMEM - eliminate contig_page_data	Bob Picco	2005-10-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For FLATMEM contig_page_data has been made transparent to the arch code. This patch conforms to that change. Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* \|	[PATCH] V5 ia64 SPARSEMEM - Kconfig and Makefile	Bob Picco	2005-10-04	1	-3/+2
\|/ \| \| \| \| \| \| \| \|	The patch modifies the Kconfig file to introduce the new memory model options and other related SPARSEMEM changes. There is also a minor change in the Makefile. Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>