op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	swap: add per-partition lock for swapfile	Shaohua Li	2013-02-23	1	-1/+1
\|	swap_lock is heavily contended when I test swap to 3 fast SSDs (it is even slightly slower than swap to 2 such SSDs). The main contention comes from swap_info_get(). This patch tries to close the gap by adding a new per-partition lock. Global data like nr_swapfiles, total_swap_pages, least_priority and swap_list are still protected by swap_lock. nr_swap_pages is now an atomic, so it can be changed without swap_lock. In theory, get_swap_page() can find no swap pages even though free swap pages exist, but that does not sound like a big problem. Accessing partition-specific data (like scan_swap_map and so on) is protected only by swap_info_struct.lock. Changing swap_info_struct.flags requires holding both swap_lock and swap_info_struct.lock, because scan_swap_map() checks it. Reading the flags is fine with either lock held. If both swap_lock and swap_info_struct.lock must be held, we always take the former first to avoid deadlock. swap_entry_free() can change swap_list. To delete that code, we add a new highest_priority_index. Whenever get_swap_page() is called, we check it. If it's valid, we use it. It's a pity get_swap_page() still holds swap_lock. But in practice, swap_lock isn't heavily contended in my test with this patch (or I can say there are other, much heavier bottlenecks like TLB flush). And BTW, it looks like get_swap_page() doesn't really need the lock. We never free swap_info[] and we check the SWAP_WRITEOK flag. The only risk without the lock is that we could swap out to some low-priority swap, but we can quickly recover after several rounds of swap, so it doesn't sound like a big deal to me. But I'd prefer to fix this if it's a real problem. "swap: make each swap partition have one address_space" improved the swapout speed from 1.7G/s to 2G/s. 
This patch further improves the speed to 2.3G/s, so around 15% improvement. It's a multi-process test, so TLB flush isn't the biggest bottleneck before the patches. [arnd@arndb.de: fix it for nommu] [hughd@google.com: add missing unlock] [minchan@kernel.org: get rid of lockdep whinge on sys_swapon] Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
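The locking rules described in this commit message can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct swap_dev`, `global_lock`, `DEV_WRITEOK` and the small spinlock helpers are hypothetical stand-ins for swap_info_struct, swap_lock and the write-OK flag, and slot accounting is reduced to a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal spinlock built on C11 atomic_flag (hypothetical helper). */
struct spin { atomic_flag held; };

void spin_lock(struct spin *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait until the holder releases the lock */
}

void spin_unlock(struct spin *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

#define DEV_WRITEOK 0x1u   /* hypothetical stand-in for the write-OK flag */

/* Hypothetical stand-in for swap_info_struct. */
struct swap_dev {
    struct spin lock;      /* per-partition lock */
    unsigned int flags;
    long free_slots;
};

struct spin global_lock = { ATOMIC_FLAG_INIT };  /* plays the role of swap_lock */
atomic_long total_free;                          /* atomic, needs no lock */

void swap_dev_init(struct swap_dev *d, long slots)
{
    assert(slots >= 0);
    atomic_flag_clear(&d->lock.held);
    d->flags = DEV_WRITEOK;
    d->free_slots = slots;
    atomic_fetch_add(&total_free, slots);
}

/* Partition-specific data: only the per-partition lock is taken. */
bool swap_dev_alloc(struct swap_dev *d)
{
    bool ok = false;

    spin_lock(&d->lock);
    if ((d->flags & DEV_WRITEOK) && d->free_slots > 0) {
        d->free_slots--;
        atomic_fetch_sub(&total_free, 1);
        ok = true;
    }
    spin_unlock(&d->lock);
    return ok;
}

/* Changing flags needs both locks; the global lock is always taken first. */
void swap_dev_set_flags(struct swap_dev *d, unsigned int flags)
{
    spin_lock(&global_lock);
    spin_lock(&d->lock);
    d->flags = flags;
    spin_unlock(&d->lock);
    spin_unlock(&global_lock);
}

long swap_total_free(void)
{
    return atomic_load(&total_free);
}
```

Because every path that needs both locks takes `global_lock` before the per-device lock, two threads can never hold them in opposite order, which is the deadlock the message is guarding against.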
*	memory-hotplug: remove memmap of sparse-vmemmap	Tang Chen	2013-02-23	1	-0/+4
\|	Introduce a new API, vmemmap_free(), to free and remove vmemmap pagetables. Since pagetable implementations differ, each architecture has to provide its own version of vmemmap_free(), just like vmemmap_populate(). Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc. [mhocko@suse.cz: fix implicit declaration of remove_pagetable] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap	Yasuaki Ishimatsu	2013-02-23	1	-0/+1
\|	For removing the memmap region of sparse-vmemmap which is allocated from bootmem, the memmap region of sparse-vmemmap needs to be registered by get_page_bootmem(). So the patch searches the pages of the virtual mapping and registers them by get_page_bootmem(). NOTE: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node() when the platform doesn't support it. It's implemented by adding a new Kconfig option named CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected by archs that fully support the memory-hotplug feature (currently only x86_64). Since we have 2 config options called MEMORY_HOTPLUG and MEMORY_HOTREMOVE, used for memory hot-add and hot-remove separately, and the code in register_page_bootmem_info_node() is only used for collecting information for hot-remove, place it under MEMORY_HOTREMOVE. page_isolation.c, selected by MEMORY_ISOLATION under MEMORY_HOTPLUG, is a similar case, so move it too. 
[mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE] [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()] [mhocko@suse.cz: remove the arch specific functions without any implementation] [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed] [rientjes@google.com: fix defined but not used warning] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wu Jianguo <wujianguo@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge branch 'x86-mm-for-linus' of ↵	Linus Torvalds	2013-02-21	1	-13/+11
\|\	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The really big items are two humongous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." 
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...
\| *	Merge remote-tracking branch 'origin/x86/boot' into x86/mm2	H. Peter Anvin	2013-01-29	2	-98/+34
\| \|\	Coming patches to x86/mm2 require the changes and advanced baseline in x86/boot. Resolved Conflicts: arch/x86/kernel/setup.c mm/nobootmem.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
\| * \|	sparc, mm: Remove calling of free_all_bootmem_node()	Yinghai Lu	2012-11-17	1	-13/+11
\| \| \|	Now the NO_BOOTMEM version of free_all_bootmem_node() does not really do free_bootmem at all; it only calls register_page_bootmem_info_node instead. That is confusing, so try to kill free_all_bootmem_node(). Before that, this patch removes the callers of free_all_bootmem_node(). We add register_page_bootmem_info() to call register_page_bootmem_info_node directly. We could also use free_all_bootmem() for the numa case, and it is just the same as free_low_memory_core_early(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-45-git-send-email-yinghai@kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: sparclinux@vger.kernel.org Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* \| \|	sparc64: Fix tsb_grow() in atomic context.	David S. Miller	2013-02-20	3	-9/+35
\| \| \|	If our first THP installation for an MM is via the set_pmd_at() done during khugepaged's collapsing we'll end up in tsb_grow() trying to do a GFP_KERNEL allocation with several locks held. Simply using GFP_ATOMIC in this situation is not the best option because we really can't have this fail, so we'd really like to keep this an order 0 GFP_KERNEL allocation if possible. Also, doing the TSB allocation from khugepaged is a really bad idea because we'll allocate it potentially from the wrong NUMA node in that context. So what we do is defer the hugepage TSB allocation until the first TLB miss we take on a hugepage. This is slightly tricky because we have to handle two unusual cases: 1) Taking the first hugepage TLB miss in the window trap handler. We'll call the winfix_trampoline when that is detected. 2) An initial TSB allocation via TLB miss races with a hugetlb fault on another cpu running the same MM. We handle this by unconditionally loading the TSB we see into the current cpu even if it's non-NULL at hugetlb_setup time. Reported-by: Meelis Roos <mroos@ut.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	sparc64: Handle hugepage TSB being NULL.	David S. Miller	2013-02-20	1	-16/+22
\| \| \|	Accommodate the possibility that the TSB might be NULL at the point that update_mmu_cache() is invoked. This is necessary because we will sometimes need to defer the TSB allocation to the first fault that happens in the 'mm'. Separate out the hugepage PTE test into a separate function so that the logic is clearer. Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	sparc64: Fix gfp_flags setting in tsb_grow().	David S. Miller	2013-02-20	1	-1/+1
\| \| \|	We should "\|= more_flags" rather than "= more_flags". Reported-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
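The difference between the two operators can be shown in a few lines. This is a sketch with hypothetical flag bits (`FLAG_BASE`, `FLAG_NOWARN`, `FLAG_REPEAT`); the kernel's real GFP_* values and the actual tsb_grow() code are not reproduced here.

```c
#include <assert.h>

/* Hypothetical flag bits; the kernel's real GFP_* values differ. */
#define FLAG_BASE   0x01u
#define FLAG_NOWARN 0x02u
#define FLAG_REPEAT 0x04u

/* The buggy form: plain assignment throws away the base flags. */
unsigned int flags_overwrite(unsigned int base, unsigned int more_flags)
{
    unsigned int gfp = base;

    gfp = more_flags;
    return gfp;
}

/* The fixed form: OR the extra flags into the existing set. */
unsigned int flags_accumulate(unsigned int base, unsigned int more_flags)
{
    unsigned int gfp = base;

    gfp |= more_flags;
    assert((gfp & base) == base);   /* every base bit survives */
    return gfp;
}
```

With `base = FLAG_BASE | FLAG_NOWARN` and `more_flags = FLAG_REPEAT`, the first helper returns only `FLAG_REPEAT`, while the second returns all three bits.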
* \| \|	sparc64: Fix get_user_pages_fast() wrt. THP.	David S. Miller	2013-02-13	1	-2/+57
\| \|/ \|/\|	Mostly mirrors the s390 logic, as unlike x86 we don't need the SetPageReferenced() bits. On sparc64 we also lack a user/privileged bit in the huge PMDs. In order to make this work for THP and non-THP builds, some header file adjustments were necessary. Namely, provide the PMD_HUGE_* bit defines and the pmd_large() inline unconditionally rather than protected by TRANSPARENT_HUGEPAGE. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	SPARC: drivers: remove __dev* attributes.	Greg Kroah-Hartman	2013-01-03	1	-3/+3
\| \|	CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2012-12-12	1	-1/+1
\|\ \	git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull big execve/kernel_thread/fork unification series from Al Viro: "All architectures are converted to new model. Quite a bit of that stuff is actually shared with architecture trees; in such cases it's literally shared branch pulled by both, not a cherry-pick. A lot of ugliness and black magic is gone (-3KLoC total in this one): - kernel_thread()/kernel_execve()/sys_execve() redesign. We don't do syscalls from kernel anymore for either kernel_thread() or kernel_execve(): kernel_thread() is essentially clone(2) with callback run before we return to userland, the callbacks either never return or do successful do_execve() before returning. kernel_execve() is a wrapper for do_execve() - it doesn't need to do transition to user mode anymore. As a result kernel_thread() and kernel_execve() are arch-independent now - they live in kernel/fork.c and fs/exec.c resp. sys_execve() is also in fs/exec.c and it's completely architecture-independent. - daemonize() is gone, along with its parts in fs/.c - struct pt_regs is no longer passed to do_fork/copy_process/ copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump. - sys_fork()/sys_vfork()/sys_clone() unified; some architectures still need wrappers (ones with callee-saved registers not saved in pt_regs on syscall entry), but the main part of those suckers is in kernel/fork.c now." 
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits) do_coredump(): get rid of pt_regs argument print_fatal_signal(): get rid of pt_regs argument ptrace_signal(): get rid of unused arguments get rid of ptrace_signal_deliver() arguments new helper: signal_pt_regs() unify default ptrace_signal_deliver flagday: kill pt_regs argument of do_fork() death to idle_regs() don't pass regs to copy_process() flagday: don't pass regs to copy_thread() bfin: switch to generic vfork, get rid of pointless wrappers xtensa: switch to generic clone() openrisc: switch to use of generic fork and clone unicore32: switch to generic clone(2) score: switch to generic fork/vfork/clone c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone() take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h mn10300: switch to generic fork/vfork/clone h8300: switch to generic fork/vfork/clone tile: switch to generic clone() ... Conflicts: arch/microblaze/include/asm/Kbuild
\| * \	Merge branch 'arch-frv' into no-rebases	Al Viro	2012-11-16	1	-2/+62
\| \|\ \ \| \| \|/
\| * \|	sparc64: clear syscall_noerror on the entry to syscall, not on the exit	Al Viro	2012-10-14	1	-1/+1
\| \| \|	Move that sucker to just before TI_FPDEPTH and replace stb with sth in etrap_save(). Take current_ds to its old place, so that we don't push wsaved into TI_... flags. That lets us drop the clearing of syscall_noerror on return from syscall. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \| \|	mm: use vm_unmapped_area() in hugetlbfs on sparc64 architecture	Michel Lespinasse	2012-12-11	1	-94/+30
\| \|/ \|/\|	Update the sparc64 hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	sparc64: Add global PMU register dumping via sysrq.	David S. Miller	2012-10-16	1	-2/+62
\|/	Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc64: Fix deficiencies in sun4v error reporting.	David S. Miller	2012-10-10	1	-2/+0
\|	Missing error types, attributes, and report fields. Pad out to 64-bytes. Make string reporting cleaner and easier to extend in the future using "const char *" arrays that index by either bit position, or absolute field value. Report the raw 64-byte error report as a sequence of u64s before the annotated version. Only report fields which are valid, given the context and the attribute bits which are set. For shutdown requests, use the local copy of the error report not the one we just freed up back to the queue. Also, use orderly_poweroff() just like the Domain Services shutdown request code does. If the real-address reported is "-1" (unknown) try to disassemble the instruction to report the effective address of the access. Only do this in privileged mode. Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc64: Support transparent huge pages.	David Miller	2012-10-09	5	-84/+306
\|	This is relatively easy since PMDs now cover exactly 4MB of memory. Our PMD entries are 32 bits each, so we use a special encoding. The lowest bit, PMD_ISHUGE, determines the interpretation. This is possible because sparc64's page tables are purely software entities so we can use whatever encoding scheme we want. We just have to make the TLB miss assembler page table walkers aware of the layout. set_pmd_at() works much like set_pte_at() but it has to operate in two regimes. In the first we are going to a huge page from a table of non-huge PTEs, so we have to queue up TLB flushes based upon what mappings are valid in the PTE table. In the second regime we are going from huge-page to non-huge-page, and in that case we need only queue up a single TLB flush to push out the huge page mapping. We still have 5 bits remaining in the huge PMD encoding so we can very likely support any new pieces of THP state tracking that might get added in the future. With lots of help from Johannes Weiner. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
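The idea of a 32-bit PMD whose lowest bit selects the interpretation can be sketched as follows. Only the name PMD_ISHUGE and its position (bit 0) come from the message above; the shift values, the helper names and the layout of the remaining bits are assumptions made for illustration, and the attribute bits of the real encoding are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t pmd_t;

#define PMD_ISHUGE        0x1u  /* lowest bit selects the interpretation */
#define HUGE_PAGE_SHIFT   22    /* 4MB huge pages */
#define TABLE_PADDR_SHIFT 11    /* assumed: tables are at least 4KB aligned */

/* Encode a huge mapping: frame number above bit 0, bit 0 set. */
pmd_t pmd_mk_huge(uint64_t paddr)
{
    uint64_t frame = paddr >> HUGE_PAGE_SHIFT;

    assert((paddr & ((1ULL << HUGE_PAGE_SHIFT) - 1)) == 0);
    assert(frame < (1ULL << 31));
    return (pmd_t)(frame << 1) | PMD_ISHUGE;
}

/* Encode a pointer to a table of non-huge PTEs: bit 0 stays clear. */
pmd_t pmd_mk_table(uint64_t table_paddr)
{
    assert((table_paddr & 0xfffULL) == 0);
    assert((table_paddr >> TABLE_PADDR_SHIFT) <= UINT32_MAX);
    return (pmd_t)(table_paddr >> TABLE_PADDR_SHIFT);
}

bool pmd_is_huge(pmd_t pmd)
{
    return (pmd & PMD_ISHUGE) != 0;
}

uint64_t pmd_huge_paddr(pmd_t pmd)
{
    return (uint64_t)(pmd >> 1) << HUGE_PAGE_SHIFT;
}

uint64_t pmd_table_paddr(pmd_t pmd)
{
    return (uint64_t)pmd << TABLE_PADDR_SHIFT;
}
```

Because a table address is 4KB aligned, shifting it right by 11 always leaves bit 0 clear, so a page table walker can test one bit to decide which decoder applies.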
*	sparc64: Eliminate PTE table memory wastage.	David Miller	2012-10-09	2	-0/+110
\|	We've split up the PTE tables so that they take up half a page instead of a full page. This is in order to facilitate transparent huge page support, which works much better if our PMDs cover 4MB instead of 8MB. What we do is have a one-behind cache for PTE table allocations in the mm struct. This logic triggers only on allocations. For example, we don't try to keep track of free'd up page table blocks in the style that the s390 port does. There were only two slightly annoying aspects to this change: 1) Changing pgtable_t to be a "pte_t *". There's all of this special logic in the TLB free paths that needed adjustments, as did the PMD populate interfaces. 2) init_new_context() needs to zap the pointer, since the mm struct just gets copied from the parent on fork. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
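A one-behind cache of this kind can be sketched in userspace C. The names `struct mm_ctx`, `cached_half` and `pte_table_alloc()` are hypothetical, the 8KB page size is taken from the surrounding log, and freeing is deliberately left out, matching the remark that the logic triggers only on allocations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 8192u            /* 8KB base pages */
#define HALF_SZ (PAGE_SZ / 2)    /* one PTE table takes half a page */

/* Hypothetical stand-in for the per-mm state holding the cache. */
struct mm_ctx {
    void *cached_half;   /* second half of the last page, if unused */
};

/* Mirrors the "zap the pointer" step needed when an mm is copied. */
void mm_ctx_init(struct mm_ctx *mm)
{
    mm->cached_half = NULL;
}

void *pte_table_alloc(struct mm_ctx *mm)
{
    char *page;

    /* Hand out the half left over from the previous allocation. */
    if (mm->cached_half) {
        void *half = mm->cached_half;

        mm->cached_half = NULL;
        return half;
    }

    /* Otherwise take a fresh page, return one half, cache the other. */
    page = aligned_alloc(PAGE_SZ, PAGE_SZ);
    if (!page)
        return NULL;
    memset(page, 0, PAGE_SZ);
    assert(((size_t)page % PAGE_SZ) == 0);
    mm->cached_half = page + HALF_SZ;
    return page;
}
```

Two consecutive allocations therefore share one page, and a third allocation starts a new page.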
*	sparc64: Only support 4MB huge pages and 8KB base pages.	David Miller	2012-10-09	2	-24/+2
\|	Narrowing the scope of the page size configurations will make the transparent hugepage changes much simpler. In the end what we really want to do is have the kernel support multiple huge page sizes and use whatever is appropriate as the context dictates. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	readahead: fault retry breaks mmap file read random detection	Shaohua Li	2012-10-09	2	-0/+2
\|	.fault can now retry. The retry can break the state machine of .fault. In filemap_fault, if the page is missing, ra->mmap_miss is increased. In the second try, since the page is in the page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault has been tried once. In the second try, skip decreasing ra->mmap_miss. The filemap_fault state machine is ok with it. I only tested x86 and didn't test other archs; the change for other archs looks obvious, but who knows :) Signed-off-by: Shaohua Li <shaohua.li@fusionio.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
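The counter problem described here can be reproduced with a toy model. Everything below is a simplified sketch: `FLAG_TRIED`, `sketch_fault()` and `struct ra_state` are hypothetical names, and the real filemap_fault logic is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define FLAG_TRIED 0x1u   /* hypothetical "this fault was already tried" flag */

enum fault_result { FAULT_DONE, FAULT_RETRY };

struct ra_state {
    unsigned int mmap_miss;   /* high values suggest random access */
};

/* One invocation of a simplified fault handler. */
enum fault_result sketch_fault(struct ra_state *ra, bool *page_cached,
                               unsigned int flags)
{
    if (*page_cached) {
        /* A cache hit lowers the miss count, unless this is a retry. */
        if (!(flags & FLAG_TRIED) && ra->mmap_miss > 0)
            ra->mmap_miss--;
        return FAULT_DONE;
    }

    /* Cache miss: count it, pretend I/O filled the page, ask for a retry. */
    ra->mmap_miss++;
    *page_cached = true;
    return FAULT_RETRY;
}

/* Simulate one random access: a miss followed by the retried fault. */
unsigned int misses_after_one_random_access(bool use_tried_flag)
{
    struct ra_state ra = { 0 };
    bool cached = false;
    enum fault_result first = sketch_fault(&ra, &cached, 0);

    assert(first == FAULT_RETRY);
    (void)first;
    sketch_fault(&ra, &cached, use_tried_flag ? FLAG_TRIED : 0u);
    return ra.mmap_miss;
}
```

Without the flag the retry cancels the miss that was just recorded and the counter returns to zero; with the flag the miss stays counted, so repeated random accesses can accumulate.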
*	sparc: fix format string argument for prom_printf()	Akinobu Mita	2012-10-02	2	-5/+6
\|	prom_printf() takes printf style arguments. Specifying GCC's format attribute reveals that there are several wrong usages of prom_printf(). This fixes those wrong format strings and arguments, and also leaves format attributes in order to detect similar mistakes at compile time. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc64: Use cpu_pgsz_mask for linear kernel mapping config.	David S. Miller	2012-09-06	1	-39/+65
\|	This required a little bit of reordering of how we set up the memory management early on. We now only know the final values of kern_linear_pte_xor[] after we take over the trap table and start processing TLB misses ourselves. So once we fill those values in we re-clear the kernel's 4M TSB and flush the TLBs. That way if we find we support larger than 4M pages we won't have any stale smaller page size entries in the TSB. SUN4U Panther support for larger page sizes should now be extremely trivial but I have no hardware on which to test it and I believe that some of the sun4u TLB miss assembler needs to be audited first to make sure it really can handle larger than 4M PTEs properly. Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc64: Probe cpu page size support more portably.	David S. Miller	2012-09-06	1	-0/+31
\|	On sun4v, interrogate the machine description. This code is extremely defensive in nature, and a lot of the checks can probably be removed. On sun4u things are a lot simpler. There are the page sizes all chips support, and then Panther adds 32MB and 256MB pages. Report the probed value in /proc/cpuinfo. Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc64: Support 2GB and 16GB page sizes for kernel linear mappings.	David S. Miller	2012-09-06	2	-29/+112
\|	SPARC-T4 supports 2GB pages. So convert kpte_linear_bitmap into an array of 2-bit values which index into kern_linear_pte_xor. Now kern_linear_pte_xor is used for 4 page size aligned regions, 4MB, 256MB, 2GB, and 16GB respectively. Enabling 2GB pages is currently hardcoded using a check against sun4v_chip_type. In the future this will be done more cleanly by interrogating the machine description which is the correct way to determine this kind of thing. Signed-off-by: David S. Miller <davem@davemloft.net>
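Packing 2-bit values into machine words and using each value as an index into a 4-entry table can be sketched like this. The helper names (`map_set`, `map_get`, `pte_xor_for`) are hypothetical, the table contents are left to the caller, and the real kpte_linear_bitmap layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_ENTRY   2u
#define ENTRIES_PER_WORD (64u / BITS_PER_ENTRY)   /* 32 entries per uint64_t */

/* Store a 2-bit value (0..3) for region idx. */
void map_set(uint64_t *map, size_t idx, unsigned int val)
{
    size_t word = idx / ENTRIES_PER_WORD;
    unsigned int shift = (unsigned int)(idx % ENTRIES_PER_WORD) * BITS_PER_ENTRY;

    assert(val < 4);
    map[word] &= ~((uint64_t)3 << shift);   /* clear the old value */
    map[word] |= (uint64_t)val << shift;    /* install the new one */
}

/* Read back the 2-bit value for region idx. */
unsigned int map_get(const uint64_t *map, size_t idx)
{
    size_t word = idx / ENTRIES_PER_WORD;
    unsigned int shift = (unsigned int)(idx % ENTRIES_PER_WORD) * BITS_PER_ENTRY;

    return (unsigned int)((map[word] >> shift) & 3);
}

/* Use the stored value to pick one of four per-page-size constants. */
uint64_t pte_xor_for(const uint64_t *map, const uint64_t xor_table[4], size_t idx)
{
    return xor_table[map_get(map, idx)];
}
```

Two bits per region are enough here because exactly four page sizes (4MB, 256MB, 2GB and 16GB) have to be distinguished.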
*	sparc64: Be less verbose during vmemmap population.	David S. Miller	2012-08-15	1	-5/+23
\|	On a 2-node machine with 256GB of ram we get 512 lines of console output, which is just too much. This mimics Yinghai Lu's x86 commit c2b91e2eec9678dbda274e906cc32ea8f711da3b (x86_64/mm: check and print vmemmap allocation continuous) except that we aren't ever going to get contiguous block pointers in between calls so just print when the virtual address or node changes. This decreases the output by a factor of 16. Also demote this to KERN_DEBUG. Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: delete dead code in show_mem()	Sam Ravnborg	2012-07-26	1	-7/+0
\|	Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: move kmap_init() to highmem.c	Sam Ravnborg	2012-07-26	2	-13/+17
\|	Try to keep highmem support in a more central place. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: move probe_memory() to srmmu.c	Sam Ravnborg	2012-07-26	2	-13/+11
\|	Only one user so move it to the file using it. It had nothing to do in fault_32. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: centralize all mmu context handling in srmmu.c	Sam Ravnborg	2012-07-26	3	-33/+60
\|	Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: drop quicklist	Sam Ravnborg	2012-07-26	1	-2/+0
\|	The quicklist stuff is not used anymore - drop it. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: drop sparc model check in paging_init	Sam Ravnborg	2012-07-26	1	-13/+1
\|	We already check the model in head_32.S so no need to repeat the check here. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: drop sparc_unmapped_base	Sam Ravnborg	2012-07-26	1	-2/+0
\|	The base is always the same so no need to use a variable for this. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32,leon: drop leon_init()	Sam Ravnborg	2012-07-26	1	-2/+0
\|	This function was only used to set of_pdt_build_more to leon_node_init(). But leon_node_init() was a nop as prom_amba_init was never assigned. Cc: Daniel Hellstrom <daniel@gaisler.com> Cc: Konrad Eisele <konrad@gaisler.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: drop fixmap.h	Sam Ravnborg	2012-07-26	1	-3/+4
\|	sparc32 does not support fixmaps - so do not pretend so by having the fixmap.h file. Move relevant parts to vaddrs.h. I looked at simplifying this even more but failed to understand the reasoning behind the extra guard page involved and due to missing testing possibilities only the trivial conversion was done. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: drop unused kmap_atomic_to_page	Sam Ravnborg	2012-07-26	1	-18/+0
\|	No users left of this function - drop it. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: beautify srmmu_inherit_prom_mappings()	Sam Ravnborg	2012-07-26	1	-10/+16
\|	Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: use void * in nocache get/free	Sam Ravnborg	2012-07-26	1	-27/+34
\|	This allowed us to kill a lot of casts, with no loss of readability anywhere. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: fix coding-style in srmmu.c	Sam Ravnborg	2012-07-26	1	-68/+64
\|	Fix the most annoying issues that distract me: - whitespace - missing space after "if" and "while" - spaces around operators and similar simple things. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: sort includes in srmmu.c	Sam Ravnborg	2012-07-26	1	-21/+21
\|	Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: define a few srmmu functions __init	Sam Ravnborg	2012-07-26	1	-2/+2
\|	They are only used during early init, so let's get rid of them after init to save some RAM. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sparc32: srmmu_probe now knows about leon too	Sam Ravnborg	2012-05-27	2	-5/+22
\|	Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Hellstrom <daniel@gaisler.com> Cc: Konrad Eisele <konrad@gaisler.com>
*	sparc32: introduce run-time patching of srmmu access functions	Sam Ravnborg	2012-05-27	2	-0/+83
\|	LEON uses a different ASI than SUN for MMUREGS. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Hellstrom <daniel@gaisler.com> Cc: Konrad Eisele <konrad@gaisler.com>
*	sparc32,leon: always include leon_smp + leon_mm in build	Sam Ravnborg	2012-05-27	1	-1/+1
\|	Fix up leon-specific assembler to use ASI_LEON_MMUREGS. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Hellstrom <daniel@gaisler.com> Cc: Konrad Eisele <konrad@gaisler.com>
*	sparc32: use the common implementation of alloc_thread_info_node()	Sam Ravnborg	2012-05-22	1	-27/+0
\|	With sun4c removed we can fall back to the common implementation. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next	Linus Torvalds	2012-05-21	14	-3917/+523
\|\	Pull sparc updates from David Miller:
	1) Kill off support for sun4c and Cypress sun4m chips. And as a result we were able to also kill off that ugly btfixup thing that required multi-stage links of the final vmlinux image in the Kbuild system. This should make the kbuild maintainers really happy. Thanks a lot to Sam Ravnborg for his tireless efforts to get this going.
	2) Convert sparc64 to nobootmem. I suspect now with sparc32 being a lot cleaner, it should be able to fall in line and modernize in this area too.
	3) Make sparc32 use generic clockevents, from Tkhai Kirill.
	[ I fixed up the BPF rules, and tried to clean up the build rules too. But I don't have - or want - a sparc cross-build environment, so the BPF rule bug and the related build cleanup was all done with just a bare "make -n" pseudo-test. - Linus ]
	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (110 commits)
	  sparc32: use flushi when run-time patching in per_cpu_patch
	  sparc32: fix cpuid_patch run-time patching
	  sparc32: drop unused inline functions in srmmu.c
	  sparc32: drop unused functions in pgtsrmmu.h
	  sparc32,leon: move leon mmu functions to leon_mm.c
	  sparc32,leon: remove duplicate definitions in leon.h
	  sparc32,leon: remove duplicate UART register definitions
	  sparc32,leon: move leon ASI definitions to asi.h
	  sparc32: move trap table to a separate file
	  sparc64: renamed ttable.S to ttable_64.S
	  sparc32: Remove asm/sysen.h header.
	  sparc32: Delete asm/smpprim.h
	  sparc32: Remove unused empty_bad_page{,_table} declarations.
	  sparc32: Kill boot_cpu_id4
	  sparc32: Move GET_PROCESSOR*_ID() out of asm/asmmacro.h
	  sparc32: Remove completely unused code from asm/cache.h
	  sparc32: Add ucmpdi2.o to obj-y instead of lib-y.
	  sparc32: add ucmpdi2
	  sparc: introduce arch/sparc/Kbuild
	  sparc: remove obsolete documentation
	  ...
\| *	sparc32: drop unused inline functions in srmmu.c	Sam Ravnborg	2012-05-19	1	-26/+0
\| \|	When declared inline, the compiler does not warn about unused functions. But they are not used, so drop them. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sparc32: drop unused functions in pgtsrmmu.h	Sam Ravnborg	2012-05-19	1	-2/+14
\| \|	One function was only used by leon - move it to a leon specific file. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sparc32,leon: move leon mmu functions to leon_mm.c	Sam Ravnborg	2012-05-19	3	-81/+85
\| \|	We already have a leon specific file - so keep all the leon stuff in one place. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Konrad Eisele <konrad@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sparc32: cleanup mm/fault_32.c	Sam Ravnborg	2012-05-15	1	-50/+44
\| \|	- remove unused variables - fix coding style issues that hurt my eyes Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>