summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* replace nested max/min macros with {max,min}3 macroHagen Paul Pfeifer2010-10-268-12/+9
| | | | | | | | | | | | | | | | | | | Use the new {max,min}3 macros to save some cycles and bytes on the stack. This patch substitutes trivial nested macros with their counterpart. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kernel.h: add {min,max}3 macrosHagen Paul Pfeifer2010-10-261-0/+18
| | | | | | | | | | | | | | | | | | | | | Introduce two additional min/max macros to compare three operands. This will save some cycles as well as some bytes on the stack and last but not least more pleasing as macro nesting. [akpm@linux-foundation.org: fix warnings] Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hostfs: code cleanupsRichard Weinberger2010-10-261-3/+2
| | | | | | | | | Some code cleanups for hostfs. Signed-off-by: Richard Weinberger <richard@nod.at> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* um: migrate from __do_IRQ() to generic_handle_irq()Richard Weinberger2010-10-262-11/+7
| | | | | | | | | | | This patch removes __do_IRQ() from user mode linux. __do_IRQ is deprecated. Signed-off-by: Richard Weinberger <richard@nod.at> Cc: Jeff Dike <jdike@addtoit.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* uml: fix CONFIG_STATIC_LINK=y build failure with newer glibcRoland McGrath2010-10-262-2/+29
| | | | | | | | | | | | | | With glibc 2.11 or later that was built with --enable-multi-arch, the UML link fails with undefined references to __rel_iplt_start and similar symbols. In recent binutils, the default linker script defines these symbols (see ld --verbose). Fix the UML linker scripts to match the new defaults for these sections. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* uml: define CONFIG_NO_DMAFUJITA Tomonori2010-10-263-113/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I think that it's better to detect DMA misuse at build time rather than calling BUG_ON. Architectures that can't do DMA need to define CONFIG_NO_DMA. Thanks to Sam Ravnborg for explaining how CONFIG_NO_DMA and CONFIG_HAS_DMA work: http://marc.info/?l=linux-kernel&m=128359913825550&w=2 HAS_DMA is defined like this: config HAS_DMA boolean depends on !NO_DMA default y So to set HAS_DMA to true an arch should do: 1) Do not define NO_DMA 2) Define NO_DMA abd set it to 'n' Must archs - including um - used principle 1). In the um case we want to say that we do NOT have any DMA. This can be done in two ways. a) define NO_DMA and set it to 'y' b) redefine HAS_DMA and set it to 'n'. The patch you provided used principle b) where other archs use principle a). So I suggest you should use principle a) for um too. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Jeff Dike <jdike@addtoit.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* alpha: use single HAE window on T2 core logic (gamma, sable)Ivan Kokshaysky2010-10-264-40/+30
| | | | | | | | | | | | | | | | | | | T2 are the only alpha SMP systems that do HAE switching at runtime, which is fundamentally racy on SMP. This patch limits MMIO space on T2 to HAE0 only, like we did on MCPCIA (rawhide) long ago. This leaves us with only 112 Mb of PCI MMIO (128 Mb HAE aperture minus 16 Mb reserved for EISA), but since linux PCI allocations are reasonably tight, it should be enough for sane hardware configurations. Also, fix a typo in MCPCIA_FROB_MMIO macro which shouldn't call set_hae() if MCPCIA_ONE_HAE_WINDOW is defined. It's more for correctness, as set_hae() is a no-op anyway in that case. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* alpha: enable ARCH_DMA_ADDR_T_64BITFUJITA Tomonori2010-10-261-0/+3
| | | | | | | | | Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* drivers/char/hpet.c: fix information leak to userlandVasiliy Kulikov2010-10-261-2/+1
| | | | | | | | | | | Structure info is copied to userland with some padding fields unitialized. It leads to leaking of stack memory. [akpm@linux-foundation.org: remove now-unneeded zeroing of info->hi_ireqfreq] Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Cc: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Documentation/timers/hpet_example.c: add supporting info for hpet_exampleJaswinder Singh Rajput2010-10-261-0/+27
| | | | | | | | | | | | | | $./hpet_example info /dev/hpet -hpet: executing info hpet_info: hi_irqfreq 0x0 hi_flags 0x0 hi_hpet 0 hi_timer 2 Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: "Venkatesh Pallipadi (Venki)" <venki@google.com> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hpet: fix style problemsJaswinder Singh Rajput2010-10-261-10/+10
| | | | | | | | | | | | | | Fix the following style problems: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> WARNING: Use #include <linux/io.h> instead of <asm/io.h> ERROR: code indent should use tabs where possible ERROR: do not initialise statics to 0 or NULL Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hpet: fix unwanted interrupt due to stale irq status bitClemens Ladisch2010-10-261-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jaswinder Singh Rajput wrote: > By executing Documentation/timers/hpet_example.c > > for polling, I requested for 3 iterations but it seems iteration work > for only 2 as first expired time is always very small. > > # ./hpet_example poll /dev/hpet 10 3 > -hpet: executing poll > hpet_poll: info.hi_flags 0x0 > hpet_poll: expired time = 0x13 > hpet_poll: revents = 0x1 > hpet_poll: data 0x1 > hpet_poll: expired time = 0x1868c > hpet_poll: revents = 0x1 > hpet_poll: data 0x1 > hpet_poll: expired time = 0x18645 > hpet_poll: revents = 0x1 > hpet_poll: data 0x1 Clearing the HPET interrupt enable bit disables interrupt generation but does not disable the timer, so the interrupt status bit will still be set when the timer elapses. If another interrupt arrives before the timer has been correctly programmed (due to some other device on the same interrupt line, or CONFIG_DEBUG_SHIRQ), this results in an extra unwanted interrupt event because the status bit is likely to be set from comparator matches that happened before the device was opened. Therefore, we have to ensure that the interrupt status bit is and stays cleared until we actually program the timer. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Bob Picco <bpicco@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hpet: unmap unused I/O spaceJiri Slaby2010-10-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the initialization code in hpet finds a memory resource and does not find an IRQ, it does not unmap the memory resource previously mapped. There are buggy BIOSes which report resources exactly like this and what is worse the memory region bases point to normal RAM. This normally would not matter since the space is not touched. But when PAT is turned on, ioremap causes the page to be uncached and sets this bit in page->flags. Then when the page is about to be used by the allocator, it is reported as: BUG: Bad page state in process md5sum pfn:3ed00 page:ffffea0000dbd800 count:0 mapcount:0 mapping:(null) index:0x0 page flags: 0x20000001000000(uncached) Pid: 7956, comm: md5sum Not tainted 2.6.34-12-desktop #1 Call Trace: [<ffffffff810df851>] bad_page+0xb1/0x100 [<ffffffff810dfa45>] prep_new_page+0x1a5/0x1c0 [<ffffffff810dfe01>] get_page_from_freelist+0x3a1/0x640 [<ffffffff810e01af>] __alloc_pages_nodemask+0x10f/0x6b0 ... In this particular case: 1) HPET returns 3ed00000 as memory region base, but it is not in reserved ranges reported by the BIOS (excerpt): BIOS-e820: 0000000000100000 - 00000000af6cf000 (usable) BIOS-e820: 00000000af6cf000 - 00000000afdcf000 (reserved) 2) there is no IRQ resource reported by HPET method. On the other hand, the Intel HPET specs (1.0a) says (3.2.5.1): _CRS ( // Report 1K of memory consumed by this Timer Block memory range consumed // Optional: only used if BIOS allocates Interrupts [1] IRQs consumed ) [1] For case where Timer Block is configured to consume IRQ0/IRQ8 AND Legacy 8254/Legacy RTC hardware still exists, the device objects associated with 8254 & RTC devices should not report IRQ0/IRQ8 as "consumed resources". So in theory we should check whether if it is the case and use those interrupts instead. Anyway the address reported by the BIOS here is bogus, so non-presence of IRQ doesn't mean the "optional" part in point 2). Since I got no reply previously, fix this by simply unmapping the space when IRQ is not found and memory region was mapped previously. It would be probably more safe to walk the resources again and unmap appropriately depending on type. But as we now use only ioremap for both 2 memory resource types, it is not necessarily needed right now. Addresses https://bugzilla.novell.com/show_bug.cgi?id=629908 Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Clemens Ladisch <clemens@ladisch.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: do_migrate_range: reduce list_empty() checkBob Liu2010-10-261-12/+9
| | | | | | | | | | | | Simple code for reducing list_empty(&source) check. Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: do_migrate_range: exit loop if not_managed is trueBob Liu2010-10-261-4/+6
| | | | | | | | | | | | | If not_managed is true all pages will be putback to lru, so break the loop earlier to skip other pages isolate. Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: page_isolation: codeclean fix comment and rm unneeded val initBob Liu2010-10-261-2/+1
| | | | | | | | | | | | | | __test_page_isolated_in_pageblock() returns 1 if all pages in the range are isolated, so fix the comment. Variable `pfn' will be initialised in the following loop so remove it. Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: fix is_mem_section_removable() page_order BUG_ON checkKAMEZAWA Hiroyuki2010-10-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | page_order() is called by memory hotplug's user interface to check the section is removable or not. (is_mem_section_removable()) It calls page_order() withoug holding zone->lock. So, even if the caller does if (PageBuddy(page)) ret = page_order(page) ... The caller may hit BUG_ON(). For fixing this, there are 2 choices. 1. add zone->lock. 2. remove BUG_ON(). is_mem_section_removable() is used for some "advice" and doesn't need to be 100% accurate. This is_removable() can be called via user program.. We don't want to take this important lock for long by user's request. So, this patch removes BUG_ON(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/hugetlb.c: add missing spin_lock() to hugetlb_cow()Dean Nelson2010-10-261-1/+4
| | | | | | | | | | | Add missing spin_lock() of the page_table_lock before an error return in hugetlb_cow(). Callers of hugtelb_cow() expect it to be held upon return. Signed-off-by: Dean Nelson <dnelson@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: fix error reporting in move_pages() syscallGleb Natapov2010-10-261-2/+2
| | | | | | | | | | | | | | The vma returned by find_vma does not necessarily include the target address. If this happens the code tries to follow a page outside of any vma and returns ENOENT instead of EFAULT. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* /proc/swaps: support pollingKay Sievers2010-10-261-1/+48
| | | | | | | | | | | | | System management wants to subscribe to changes in swap configuration. Make /proc/swaps pollable like /proc/mounts. [akpm@linux-foundation.org: document proc_poll_event] Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Greg KH <greg@kroah.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add vzalloc() and vzalloc_node() helpersDave Young2010-10-263-3/+94
| | | | | | | | | | | | | | Add vzalloc() and vzalloc_node() to encapsulate the vmalloc-then-memset-zero operation. Use __GFP_ZERO to zero fill the allocated memory. Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memory_hotplug.c: make scan_lru_pages() staticAndrew Morton2010-10-261-1/+1
| | | | | | | Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/fs-writeback.c: restore lost commentAndrew Morton2010-10-261-0/+4
| | | | | | | | | I had to go back to a 2.6.20 tree to work out why we're adding a number-of-inodes into a number-of-pages count. Restore the lost comment. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: fix sparse warnings on GFP_ZONE_TABLE/BADNamhyung Kim2010-10-261-42/+63
| | | | | | | | | | | | Introduce ___GFP_* masks in order for gfp_t to not be mixed with plain integers which causes a lot of warnings like the following: warning: restricted gfp_t degrades to integer Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vmstat: include compaction.h when CONFIG_COMPACTIONNamhyung Kim2010-10-261-0/+2
| | | | | | | | | | | This removes following warning from sparse: mm/vmstat.c:466:5: warning: symbol 'fragmentation_index' was not declared. Should it be static? [akpm@linux-foundation.org: move the include to top-of-file] Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: declare some external symbolsNamhyung Kim2010-10-262-0/+3
| | | | | | | | | | | | Declare 'bdi_pending_list' and 'tag_pages_for_writeback()' to remove following sparse warnings: mm/backing-dev.c:46:1: warning: symbol 'bdi_pending_list' was not declared. Should it be static? mm/page-writeback.c:825:6: warning: symbol 'tag_pages_for_writeback' was not declared. Should it be static? Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vmalloc: annotate lock context change on s_start/stop()Namhyung Kim2010-10-261-0/+2
| | | | | | | | | s_start() and s_stop() grab/release vmlist_lock but were missing proper annotations. Add them. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vmalloc: rename temporary variable in __insert_vmap_area()Namhyung Kim2010-10-261-4/+4
| | | | | | | | | | | Rename redundant 'tmp' to fix following sparse warnings: mm/vmalloc.c:296:34: warning: symbol 'tmp' shadows an earlier one mm/vmalloc.c:293:24: originally declared here Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* rmap: make anon_vma_chain_free() staticNamhyung Kim2010-10-261-1/+1
| | | | | | | | | | Make anon_vma_chain_free() static. It is called only in rmap.c and the corresponding alloc function is already static. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* rmap: wrap page_check_address() using __cond_lock()Namhyung Kim2010-10-262-2/+13
| | | | | | | | | | | | | | | The page_check_address() conditionally grabs *@ptlp in case of returning non-NULL. Rename and wrap it using __cond_lock() removes following warnings from sparse: mm/rmap.c:472:9: warning: context imbalance in 'page_mapped_in_vma' - unexpected unlock mm/rmap.c:524:9: warning: context imbalance in 'page_referenced_one' - unexpected unlock mm/rmap.c:706:9: warning: context imbalance in 'page_mkclean_one' - unexpected unlock mm/rmap.c:1066:9: warning: context imbalance in 'try_to_unmap_one' - unexpected unlock Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* rmap: annotate lock context change on page_[un]lock_anon_vma()Namhyung Kim2010-10-262-2/+17
| | | | | | | | | | The page_lock_anon_vma() conditionally grabs RCU and anon_vma lock but page_unlock_anon_vma() releases them unconditionally. This leads sparse to complain about context imbalance. Annotate them. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: wrap follow_pte() using __cond_lock()Namhyung Kim2010-10-261-1/+12
| | | | | | | | | | | | The follow_pte() conditionally grabs *@ptlp in case of returning 0. Rename and wrap it using __cond_lock() removes following warnings: mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add lock release annotation on do_wp_page()Namhyung Kim2010-10-261-0/+1
| | | | | | | | | | | | The do_wp_page() releases @ptl but was missing proper annotation. Add it. This removes following warnings from sparse: mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: wrap get_locked_pte() using __cond_lock()Namhyung Kim2010-10-262-2/+10
| | | | | | | | | | The get_locked_pte() conditionally grabs 'ptl' in case of returning non-NULL. This leads sparse to complain about context imbalance. Rename and wrap it using __cond_lock() to make sparse happy. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add casts to/from gfp_t in gfp_to_alloc_flags()Namhyung Kim2010-10-261-2/+2
| | | | | | | | | | This removes following warning from sparse: mm/page_alloc.c:1934:9: warning: restricted gfp_t degrades to integer Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove temporary variable on generic_file_direct_write()Namhyung Kim2010-10-261-4/+4
| | | | | | | | | | | | 'end' shadows earlier one and is not necessary at all. Remove it and use 'pos' instead. This removes following sparse warnings: mm/filemap.c:2180:24: warning: symbol 'end' shadows an earlier one mm/filemap.c:2132:25: originally declared here Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86: access_error API cleanupMichel Lespinasse2010-10-261-3/+3
| | | | | | | | | | | | | | | | | access_error() already takes error_code as an argument, so there is no need for an additional write flag. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: retry page fault when blocking on disk transferMichel Lespinasse2010-10-265-15/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change reduces mmap_sem hold times that are caused by waiting for disk transfers when accessing file mapped VMAs. It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call site wants mmap_sem to be released if blocking on a pending disk transfer. In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and do_page_fault() will then re-acquire mmap_sem and retry the page fault. It is expected that the retry will hit the same page which will now be cached, and thus it will complete with a low mmap_sem hold time. Tests: - microbenchmark: thread A mmaps a large file and does random read accesses to the mmaped area - achieves about 55 iterations/s. Thread B does mmap/munmap in a loop at a separate location - achieves 55 iterations/s before, 15000 iterations/s after. - We are seeing related effects in some applications in house, which show significant performance regressions when running without this change. [akpm@linux-foundation.org: fix warning & crash] Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: filemap_fault: unique path for locking pageMichel Lespinasse2010-10-261-9/+11
| | | | | | | | | | | | | | | | | | | | Introduce a single location where filemap_fault() locks the desired page. There used to be two such places, depending if the initial find_get_page() was successful or not. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove alignment padding from anon_vma on (some) 64 bit buildsRichard Kennedy2010-10-261-1/+1
| | | | | | | | | | | | | | | | Reorder structure anon_vma to remove alignment padding on 64 builds when (CONFIG_KSM || CONFIG_MIGRATION). This will shrink the size of the anon_vma structure from 40 to 32 bytes & allow more objects per slab in its kmem_cache. Under slub the objects in the anon_vma kmem_cache will then be 40 bytes with 102 objects per slab. (On v2.6.36 without this patch,the size is 48 bytes and 85 objects/slab.) Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add a might_sleep_if() to dma_pool_alloc()Dima Zavin2010-10-261-0/+2
| | | | | | | | | | | | | Buggy drivers (e.g. fsl_udc) could call dma_pool_alloc from atomic context with GFP_KERNEL. In most instances, the first pool_alloc_page call would succeed and the sleeping functions would never be called. This allowed the buggy drivers to slip through the cracks. Add a might_sleep_if() checking for __GFP_WAIT in flags. Signed-off-by: Dima Zavin <dima@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: highmem documentationPeter Zijlstra2010-10-261-0/+162
| | | | | | | | | | | | | | | | | | | | | | Document outlining some of the highmem issues, started by me, edited by David. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* perf, x86: Fix up kmap_atomic() typePeter Zijlstra2010-10-261-3/+2
| | | | | | | | | | | | | | | | | | | | | Now that the KM_type stuff is history, clean up the compiler warning. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove pte_*map_nested()Peter Zijlstra2010-10-2631-99/+22
| | | | | | | | | | | | | | | | | | | | | | Since we no longer need to provide KM_type, the whole pte_*map_nested() API is now redundant, remove it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: stack based kmap_atomic()Peter Zijlstra2010-10-2628-372/+367
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep the current interface but ignore the KM_type and use a stack based approach. The advantage is that we get rid of crappy code like: #define __KM_PTE \ (in_nmi() ? KM_NMI_PTE : \ in_irq() ? KM_IRQ_PTE : \ KM_PTE0) and in general can stop worrying about what context we're in and what kmap slots might be appropriate for that. The downside is that FRV kmap_atomic() gets more expensive. For now we use a CPP trick suggested by Andrew: #define kmap_atomic(page, args...) __kmap_atomic(page) to avoid having to touch all kmap_atomic() users in a single patch. [ not compiled on: - mn10300: the arch doesn't actually build with highmem to begin with ] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: strictly nested kmap_atomic()Peter Zijlstra2010-10-265-8/+8
| | | | | | | | | | | | | | | | | | | | | Ensure kmap_atomic() usage is strictly nested Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vmscan,tmpfs: treat used once pages on tmpfs as used onceKOSAKI Motohiro2010-10-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | When a page has PG_referenced, shrink_page_list() discards it only if it is not dirty. This rule works fine if the backing filesystem is a regular one. PG_dirty is a good signal that the page was used recently because the flusher threads clean pages periodically. In addition, page writeback is costlier than simple page discard. However, when a page is on tmpfs this heuristic doesn't work because flusher threads don't write back tmpfs pages. Consequently tmpfs pages always rotate around the lru twice at least and adds unnecessary lru churn. Simple tmpfs streaming io shouldn't cause large anonymous page swap-out. Remove this unncessary reclaim bonus of tmpfs pages. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* writeback: remove the internal 5% low bound on dirty_ratioWu Fengguang2010-10-262-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | The dirty_ratio was silently limited in global_dirty_limits() to >= 5%. This is not a user expected behavior. And it's inconsistent with calc_period_shift(), which uses the plain vm_dirty_ratio value. Let's remove the internal bound. At the same time, fix balance_dirty_pages() to work with the dirty_thresh=0 case. This allows applications to proceed when dirty+writeback pages are all cleaned. And ">" fits with the name "exceeded" better than ">=" does. Neil thinks it is an aesthetic improvement as well as a functional one :) Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Jan Kara <jack@suse.cz> Proposed-by: Con Kolivas <kernel@kolivas.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michael Rubin <mrubin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* writeback: do not sleep on the congestion queue if there are no congested ↵Mel Gorman2010-10-266-12/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | BDIs or if significant congestion is not being encountered in the current zone If congestion_wait() is called with no BDI congested, the caller will sleep for the full timeout and this may be an unnecessary sleep. This patch adds a wait_iff_congested() that checks congestion and only sleeps if a BDI is congested else, it calls cond_resched() to ensure the caller is not hogging the CPU longer than its quota but otherwise will not sleep. This is aimed at reducing some of the major desktop stalls reported during IO. For example, while kswapd is operating, it calls congestion_wait() but it could just have been reclaiming clean page cache pages with no congestion. Without this patch, it would sleep for a full timeout but after this patch, it'll just call schedule() if it has been on the CPU too long. Similar logic applies to direct reclaimers that are not making enough progress. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vmscan: isolate_lru_pages(): stop neighbour search if neighbour cannot be ↵KOSAKI Motohiro2010-10-261-6/+11
| | | | | | | | | | | | | | | | | | | | | isolated isolate_lru_pages() does not just isolate LRU tail pages, but also isolates neighbour pages of the eviction page. The neighbour search does not stop even if neighbours cannot be isolated which is excessive as the lumpy reclaim will no longer result in a successful higher order allocation. This patch stops the PFN neighbour pages if an isolation fails and moves on to the next block. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud