Commit message (Author, Age, Files, Lines)
* mm: vmap area cache (Nick Piggin, 2011-03-22, 1 file, -52/+104)

    Provide a free area cache for the vmalloc virtual address allocator,
    based on the algorithm used by the user virtual memory allocator.

    This reduces the number of rbtree operations and linear traversals over
    the vmap extents in order to find a free area, by starting off at the
    last point that a free area was found.

    The free area cache is reset if areas are freed behind it, or if we are
    searching for a smaller area or alignment than last time. So allocation
    patterns are not changed (verified by corner-case and random test cases
    in userspace testing).

    This solves a regression caused by lazy vunmap TLB purging introduced in
    db64fe02 (mm: rewrite vmap layer). That patch will leave extents in the
    vmap allocator after they are vunmapped, and until a significant number
    accumulate that can be flushed in a single batch. So in a workload that
    vmalloc/vfree frequently, a chain of extents will build up from
    VMALLOC_START address, which have to be iterated over each time (giving
    an O(n) type of behaviour).

    After this patch, the search will start from where it left off, giving
    closer to an amortized O(1).

    This is verified to solve regressions reported by Steven in GFS2, and
    Avi in KVM.

    Hugh's update:

    : I tried out the recent mmotm, and on one machine was fortunate to hit
    : the BUG_ON(first->va_start < addr) which seems to have been stalling
    : your vmap area cache patch ever since May.
    :
    : I can get you addresses etc, I did dump a few out; but once I stared
    : at them, it was easier just to look at the code: and I cannot see how
    : you would be so sure that first->va_start < addr, once you've done
    : that addr = ALIGN(max(...), align) above, if align is over 0x1000
    : (align was 0x8000 or 0x4000 in the cases I hit: ioremaps like Steve).
    :
    : I originally got around it by just changing the
    :     if (first->va_start < addr) {
    : to
    :     while (first->va_start < addr) {
    : without thinking about it any further; but that seemed unsatisfactory,
    : why would we want to loop here when we've got another very similar
    : loop just below it?
    :
    : I am never going to admit how long I've spent trying to grasp your
    : "while (n)" rbtree loop just above this, the one with the peculiar
    :     if (!first && tmp->va_start < addr + size)
    : in. That's unfamiliar to me, I'm guessing it's designed to save a
    : subsequent rb_next() in a few circumstances (at risk of then setting
    : a wrong cached_hole_size?); but they did appear few to me, and I didn't
    : feel I could sign off something with that in when I don't grasp it,
    : and it seems responsible for extra code and mistaken BUG_ON below it.
    :
    : I've reverted to the familiar rbtree loop that find_vma() does (but
    : with va_end >= addr as you had, to respect the additional guard page):
    : and then (given that cached_hole_size starts out 0) I don't see the
    : need for any complications below it. If you do want to keep that loop
    : as you had it, please add a comment to explain what it's trying to do,
    : and where addr is relative to first when you emerge from it.
    :
    : Aren't your tests "size <= cached_hole_size" and
    : "addr + size > first->va_start" forgetting the guard page we want
    : before the next area? I've changed those.
    :
    : I have not changed your many "addr + size - 1 < addr" overflow tests,
    : but have since come to wonder, shouldn't they be "addr + size < addr"
    : tests - won't the vend checks go wrong if addr + size is 0?
    :
    : I have added a few comments - Wolfgang Wander's 2.6.13 description of
    : 1363c3cd8603a913a27e2995dccbd70d5312d8e6 Avoiding mmap fragmentation
    : helped me a lot, perhaps a pointer to that would be good too. And I found
    : it easier to understand when I renamed cached_start slightly and moved the
    : overflow label down.
    :
    : This patch would go after your mm-vmap-area-cache.patch in mmotm.
    : Trivially, nobody is going to get that BUG_ON with this patch, and it
    : appears to work fine on my machines; but I have not given it anything like
    : the testing you did on your original, and may have broken all the
    : performance you were aiming for. Please take a look and test it out
    : integrate with yours if you're satisfied - thanks.

    [akpm@linux-foundation.org: add locking comment]
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
    Reported-and-tested-by: Steven Whitehouse <swhiteho@redhat.com>
    Reported-and-tested-by: Avi Kivity <avi@redhat.com>
    Tested-by: "Barry J. Marson" <bmarson@redhat.com>
    Cc: Prarit Bhargava <prarit@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
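Hugh's question about the two overflow tests in the vmap area cache commit can be checked in isolation. The following is a userspace sketch, not code from the patch: the helper names `wraps_minus_one`, `wraps_strict` and `last_page_start` are made up for illustration, and `size` is assumed to be nonzero. Under those assumptions the two tests disagree only when `addr + size` wraps to exactly 0.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the form of the test quoted from the patch.
 * It looks at the last byte of the range, addr + size - 1. */
static bool wraps_minus_one(unsigned long addr, unsigned long size)
{
    return addr + size - 1 < addr;
}

/* Hypothetical helper: the form Hugh asks about.
 * It looks at the exclusive end of the range, addr + size. */
static bool wraps_strict(unsigned long addr, unsigned long size)
{
    return addr + size < addr;
}

/* Start of the last 0x1000-byte page of the address space. */
static unsigned long last_page_start(void)
{
    return ULONG_MAX - 0xfffUL;
}

/* A range that ends exactly at the top of the address space:
 * addr + size wraps to 0, and only the strict test reports it. */
static void check_top_of_address_space(void)
{
    unsigned long addr = last_page_start();

    assert(addr + 0x1000UL == 0);
    assert(!wraps_minus_one(addr, 0x1000UL));
    assert(wraps_strict(addr, 0x1000UL));
}
```

Which form is appropriate depends on whether the code that follows treats the end address as inclusive or exclusive; the sketch only shows where the two differ.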
* pwm_backlight: add check_fb() hook (Robert Morell, 2011-03-22, 2 files, -0/+14)

    In systems with multiple framebuffer devices, one of the devices might
    be blanked while another is unblanked. In order for the backlight
    blanking logic to know whether to turn off the backlight for a
    particular framebuffer's blanking notification, it needs to be able to
    check if a given framebuffer device corresponds to the backlight.

    This plumbs the check_fb hook from core backlight through the
    pwm_backlight helper to allow platform code to plug in a check_fb hook.

    Signed-off-by: Robert Morell <rmorell@nvidia.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Cc: Arun Murthy <arun.murthy@stericsson.com>
    Cc: Linus Walleij <linus.walleij@stericsson.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* drivers/video/backlight/jornada720_*.c: make needlessly global symbols static (Axel Lin, 2011-03-22, 2 files, -4/+4)

    The following symbols are needlessly defined global: jornada_bl_init,
    jornada_bl_exit, jornada_lcd_init, jornada_lcd_exit.

    Make them static.

    Signed-off-by: Axel Lin <axel.lin@gmail.com>
    Acked-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* backlight: apple_bl depends on ACPI (Randy Dunlap, 2011-03-22, 1 file, -2/+2)

    apple_bl uses ACPI interfaces (data & code), so it should depend on ACPI.

    drivers/video/backlight/apple_bl.c:142: warning: 'struct acpi_device' declared inside parameter list
    drivers/video/backlight/apple_bl.c:142: warning: its scope is only this definition or declaration, which is probably not what you want
    drivers/video/backlight/apple_bl.c:201: warning: 'struct acpi_device' declared inside parameter list
    drivers/video/backlight/apple_bl.c:215: error: variable 'apple_bl_driver' has initializer but incomplete type
    drivers/video/backlight/apple_bl.c:216: error: unknown field 'name' specified in initializer
    ...

    Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
    Acked-by: Matthew Garrett <mjg@redhat.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mbp_nvidia_bl: rename to apple_bl (Matthew Garrett, 2011-03-22, 3 files, -34/+35)

    It works on hardware other than Macbook Pros, and it works on GPUs other
    than Nvidia. It should even work on iMacs, so change the name to match
    reality more precisely and include an alias so existing users don't get
    confused.

    Signed-off-by: Matthew Garrett <mjg@redhat.com>
    Acked-by: Richard Purdie <richard.purdie@linuxfoundation.org>
    Cc: Mourad De Clerck <mourad@aquazul.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mbp_nvidia_bl: check that the backlight control functions (Matthew Garrett, 2011-03-22, 1 file, -0/+13)

    The SMI-based backlight control functionality may fail to work if the
    system is running under EFI rather than BIOS. Check that the hardware
    responds as expected, and exit if it doesn't.

    Signed-off-by: Matthew Garrett <mjg@redhat.com>
    Acked-by: Richard Purdie <richard.purdie@linuxfoundation.org>
    Cc: Mourad De Clerck <mourad@aquazul.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mbp_nvidia_bl: remove DMI dependency (Matthew Garrett, 2011-03-22, 3 files, -275/+101)

    This driver only has to deal with two different classes of hardware, but
    right now it needs new DMI entries for every new machine. It turns out
    that there's an ACPI device that uniquely identifies Apples with
    backlights, so this patch reworks the driver into an ACPI one,
    identifies the hardware by checking the PCI vendor of the root bridge
    and strips out all the DMI code. It also changes the config text to
    clarify that it works on devices other than Macbook Pros and GPUs other
    than nvidia.

    Signed-off-by: Matthew Garrett <mjg@redhat.com>
    Acked-by: Richard Purdie <richard.purdie@linuxfoundation.org>
    Cc: Mourad De Clerck <mourad@aquazul.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* acpi: tie ACPI backlight devices to PCI devices if possible (Matthew Garrett, 2011-03-22, 1 file, -1/+14)

    Dual-GPU machines may provide more than one ACPI backlight interface.
    Tie the backlight device to the GPU in order to allow userspace to
    identify the correct interface.

    Signed-off-by: Matthew Garrett <mjg@redhat.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Alex Deucher <alexdeucher@gmail.com>
    Cc: Ben Skeggs <bskeggs@redhat.com>
    Cc: Zhang Rui <rui.zhang@intel.com>
    Cc: Len Brown <lenb@kernel.org>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nouveau: change the backlight parent device to the connector, not the PCI dev (Matthew Garrett, 2011-03-22, 4 files, -20/+27)

    We may eventually end up with per-connector backlights, especially with
    ddcci devices. Make sure that the parent node for the backlight device
    is the connector rather than the PCI device.

    Signed-off-by: Matthew Garrett <mjg@redhat.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Alex Deucher <alexdeucher@gmail.com>
    Acked-by: Ben Skeggs <bskeggs@redhat.com>
    Cc: Zhang Rui <rui.zhang@intel.com>
    Cc: Len Brown <lenb@kernel.org>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* radeon: expose backlight class device for legacy LVDS encoder (Michel Dänzer, 2011-03-22, 4 files, -6/+273)

    Allows e.g. power management daemons to control the backlight level.
    Inspired by the corresponding code in radeonfb.

    [mjg@redhat.com: updated to add backlight type and make the connector the parent device]
    Signed-off-by: Michel Dänzer <michel@daenzer.net>
    Signed-off-by: Matthew Garrett <mjg@redhat.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: David Airlie <airlied@linux.ie>
    Acked-by: Alex Deucher <alexdeucher@gmail.com>
    Cc: Ben Skeggs <bskeggs@redhat.com>
    Cc: Zhang Rui <rui.zhang@intel.com>
    Cc: Len Brown <lenb@kernel.org>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* backlight: add backlight type (Matthew Garrett, 2011-03-22, 59 files, -2/+111)

    There may be multiple ways of controlling the backlight on a given
    machine. Allow drivers to expose the type of interface they are
    providing, making it possible for userspace to make appropriate policy
    decisions.

    Signed-off-by: Matthew Garrett <mjg@redhat.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Alex Deucher <alexdeucher@gmail.com>
    Cc: Ben Skeggs <bskeggs@redhat.com>
    Cc: Zhang Rui <rui.zhang@intel.com>
    Cc: Len Brown <lenb@kernel.org>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* drivers/leds/leds-lp5523.c: world-writable engine* sysfs files (Vasiliy Kulikov, 2011-03-22, 1 file, -10/+10)

    Don't allow everybody to change LED settings.

    Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* drivers/leds/leds-lp5521.c: world-writable sysfs engine* files (Vasiliy Kulikov, 2011-03-22, 1 file, -7/+7)

    Don't allow everybody to change LED settings.

    Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
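The two leds-lp5521/leds-lp5523 fixes concern the permission bits on sysfs attributes. The commit messages do not state the modes involved, so the following userspace sketch assumes, purely for illustration, an attribute created with mode 0666 that should end up as 0644. `is_world_writable` and `drop_group_other_write` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* True if "other" users may write, i.e. anybody can change the setting. */
static bool is_world_writable(mode_t mode)
{
    return (mode & S_IWOTH) != 0;
}

/* Keep the owner's bits and all read bits, drop group and other write. */
static mode_t drop_group_other_write(mode_t mode)
{
    return mode & ~(mode_t)(S_IWGRP | S_IWOTH);
}

/* Assumed example: 0666 is world-writable, the narrowed 0644 is not. */
static void check_example_modes(void)
{
    assert(is_world_writable(0666));
    assert(!is_world_writable(drop_group_other_write(0666)));
}
```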
* drivers/vidfeo/backlight: ld9040 amoled driver support (Donghwa Lee, 2011-03-22, 4 files, -0/+1028)

    Add a ld9040 amoled panel driver.

    Signed-off-by: Donghwa Lee <dh09.lee@samsung.com>
    Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
    Signed-off-by: Inki Dae <inki.dae@samsung.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* leds: make *struct gpio_led_platform_data.leds const (Uwe Kleine-König, 2011-03-22, 2 files, -3/+3)

    And fix a typo.

    Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
    Cc: Lars-Peter Clausen <lars@metafoo.de>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* leds: add driver for LM3530 ALS (Shreshtha Kumar Sahu, 2011-03-22, 4 files, -0/+496)

    Simple backlight driver for National Semiconductor LM3530. Presently
    only manual mode is supported, PWM and ALS support to be added.

    Signed-off-by: Shreshtha Kumar Sahu <shreshthakumar.sahu@stericsson.com>
    Cc: Linus Walleij <linus.walleij@stericsson.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* leds: convert bd2802 driver to dev_pm_ops (Mark Brown, 2011-03-22, 1 file, -19/+28)

    There is a move to deprecate bus-specific PM operations and move to
    using dev_pm_ops instead in order to reduce the amount of boilerplate
    code in buses and facilitate updates to the PM core. Do this move for
    the bd2802 driver.

    [akpm@linux-foundation.org: fix warnings]
    Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
    Cc: Kim Kyuwon <q1.kim@samsung.com>
    Cc: Kim Kyuwon <chammoru@gmail.com>
    Cc: Richard Purdie <rpurdie@rpsys.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroups: if you list_empty() a head then don't list_del() it (Phil Carmody, 2011-03-22, 1 file, -8/+6)

    list_del() leaves poison in the prev and next pointers. The next
    list_empty() will compare those poisons, and say the list isn't empty.
    Any list operations that assume the node is on a list because of such a
    check will be fooled into dereferencing poison. One needs to INIT the
    node after the del, and fortunately there's already a wrapper for that -
    list_del_init().

    Some of the dels are followed by deallocations, so can be ignored, and
    one can be merged with an add to make a move. Apart from that, I erred
    on the side of caution in making nodes list_empty()-queriable.

    Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
    Reviewed-by: Paul Menage <menage@google.com>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
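The list_del()/list_empty() interaction described in the cgroups commit can be reproduced with a small userspace model of a circular doubly linked list. This is a simplified sketch, not the kernel's list.h: the poison values are illustrative stand-ins for LIST_POISON1/LIST_POISON2, and `unlink_entry` and `check_del_versus_del_init` are helper names chosen here.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Illustrative stand-ins for the kernel's poison pointers. */
#define POISON_NEXT ((struct list_head *)0x100)
#define POISON_PREV ((struct list_head *)0x200)

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* A node counts as "empty" when it points back at itself. */
static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void unlink_entry(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Unlink and poison: the node no longer looks empty afterwards. */
static void list_del(struct list_head *e)
{
    unlink_entry(e);
    e->next = POISON_NEXT;
    e->prev = POISON_PREV;
}

/* Unlink and re-initialise: list_empty() on the node is true again. */
static void list_del_init(struct list_head *e)
{
    unlink_entry(e);
    init_list_head(e);
}

static void check_del_versus_del_init(void)
{
    struct list_head head, a, b;

    init_list_head(&head);
    list_add(&a, &head);
    list_add(&b, &head);

    list_del(&a);
    assert(!list_empty(&a));   /* poisoned, so the check is misleading */

    list_del_init(&b);
    assert(list_empty(&b));    /* safe to query after removal */
    assert(list_empty(&head));
}
```

In this model, code that tests `list_empty(&a)` after `list_del(&a)` would conclude the node is still on a list and go on to follow the poison pointers, which is the failure the commit message describes.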
* oom: avoid deferring oom killer if exiting task is being traced (David Rientjes, 2011-03-22, 1 file, -15/+25)

    The oom killer naturally defers killing anything if it finds an eligible
    task that is already exiting and has yet to detach its ->mm. This avoids
    unnecessarily killing tasks when one is already in the exit path and may
    free enough memory that the oom killer is no longer needed. This is
    detected by PF_EXITING since threads that have already detached their
    ->mm are no longer considered at all.

    The problem with always deferring when a thread is PF_EXITING, however,
    is that it may never actually exit when being traced, specifically if
    another task is tracing it with PTRACE_O_TRACEEXIT. The oom killer does
    not want to defer in this case since there is no guarantee that thread
    will ever exit without intervention.

    This patch will now only defer the oom killer when a thread is
    PF_EXITING and no ptracer has stopped its progress in the exit path. It
    also ensures that a child is sacrificed for the chosen parent only if it
    has a different ->mm as the comment implies: this ensures that the
    thread group leader is always targeted appropriately.

    Signed-off-by: David Rientjes <rientjes@google.com>
    Reported-by: Oleg Nesterov <oleg@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Andrey Vagin <avagin@openvz.org>
    Cc: <stable@kernel.org> [2.6.38.x]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: skip zombies when iterating tasklist (Andrey Vagin, 2011-03-22, 1 file, -1/+3)

    We shouldn't defer oom killing if a thread has already detached its ->mm
    and still has TIF_MEMDIE set. Memory needs to be freed, so kill other
    threads that pin the same ->mm or find another task to kill.

    Signed-off-by: Andrey Vagin <avagin@openvz.org>
    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: <stable@kernel.org> [2.6.38.x]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: prevent unnecessary oom kills or kernel panics (David Rientjes, 2011-03-22, 1 file, -4/+4)

    This patch prevents unnecessary oom kills or kernel panics by reverting
    two commits:

        495789a5 (oom: make oom_score to per-process value)
        cef1d352 (oom: multi threaded process coredump don't make deadlock)

    First, 495789a5 (oom: make oom_score to per-process value) ignores the
    fact that all threads in a thread group do not necessarily exit at the
    same time.

    It is imperative that select_bad_process() detect threads that are in
    the exit path, specifically those with PF_EXITING set, to prevent
    needlessly killing additional tasks. If a process is oom killed and the
    thread group leader exits, select_bad_process() cannot detect the other
    threads that are PF_EXITING by iterating over only processes. Thus, it
    currently chooses another task unnecessarily for oom kill or panics the
    machine when nothing else is eligible.

    By iterating over threads instead, it is possible to detect threads that
    are exiting and nominate them for oom kill so they get access to memory
    reserves.

    Second, cef1d352 (oom: multi threaded process coredump don't make
    deadlock) erroneously avoids making the oom killer a no-op when an
    eligible thread other than current is found to be exiting. We want to
    detect this situation so that we may allow that exiting thread time to
    exit and free its memory; if it is able to exit on its own, that should
    free memory so current is no longer oom. If it is not able to exit on
    its own, the oom killer will nominate it for oom kill which, in this
    case, only means it will get access to memory reserves.

    Without this change, it is easy for the oom killer to unnecessarily
    target tasks when all threads of a victim don't exit before the thread
    group leader or, in the worst case, panic the machine.

    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Andrey Vagin <avagin@openvz.org>
    Cc: <stable@kernel.org> [2.6.38.x]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: swap: unlock swapfile inode mutex before closing file on bad swapfiles (Mel Gorman, 2011-03-22, 1 file, -1/+6)

    If an administrator tries to swapon a file backed by NFS, the inode
    mutex is taken (as it is for any swapfile) but later identified to be a
    bad swapfile due to the lack of bmap and tries to cleanup. During
    cleanup, an attempt is made to close the file but with inode->i_mutex
    still held. Closing an NFS file syncs it which tries to acquire the
    inode mutex leading to deadlock. If lockdep is enabled the following
    appears on the console;

        =============================================
        [ INFO: possible recursive locking detected ]
        2.6.38-rc8-autobuild #1
        ---------------------------------------------
        swapon/2192 is trying to acquire lock:
         (&sb->s_type->i_mutex_key#13){+.+.+.}, at: vfs_fsync_range+0x47/0x7c

        but task is already holding lock:
         (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7

        other info that might help us debug this:
        1 lock held by swapon/2192:
         #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7

        stack backtrace:
        Pid: 2192, comm: swapon Not tainted 2.6.38-rc8-autobuild #1
        Call Trace:
            __lock_acquire+0x2eb/0x1623
            find_get_pages_tag+0x14a/0x174
            pagevec_lookup_tag+0x25/0x2e
            vfs_fsync_range+0x47/0x7c
            lock_acquire+0xd3/0x100
            vfs_fsync_range+0x47/0x7c
            nfs_flush_one+0x0/0xdf [nfs]
            mutex_lock_nested+0x40/0x2b1
            vfs_fsync_range+0x47/0x7c
            vfs_fsync_range+0x47/0x7c
            vfs_fsync+0x1c/0x1e
            nfs_file_flush+0x64/0x69 [nfs]
            filp_close+0x43/0x72
            sys_swapon+0xa39/0xae7
            sysret_check+0x2e/0x69
            system_call_fastpath+0x16/0x1b

    This patch releases the mutex if it is held before calling filp_close()
    so swapon fails as expected without deadlock when the swapfile is backed
    by NFS. If accepted for 2.6.39, it should also be considered a -stable
    candidate for 2.6.38 and 2.6.37.

    Signed-off-by: Mel Gorman <mgorman@suse.de>
    Acked-by: Hugh Dickins <hughd@google.com>
    Cc: <stable@kernel.org> [2.6.37+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* include/asm-generic/unistd.h: fix syncfs syscall number (Andrew Morton, 2011-03-22, 1 file, -1/+1)

    syncfs() is duplicating name_to_handle_at() due to a merging mistake.

    Cc: Sage Weil <sage@newdream.net>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 (Linus Torvalds, 2011-03-22, 2 files, -1/+6)
|\
      * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
        slub: Add statistics for this_cmpxchg_double failures
        slub: Add missing irq restore for the OOM path
| * slub: Add statistics for this_cmpxchg_double failures (Christoph Lameter, 2011-03-22, 2 files, -1/+3)

      Add some statistics for debugging.

      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slub: Add missing irq restore for the OOM path (Christoph Lameter, 2011-03-22, 1 file, -0/+3)

      OOM path is missing the irq restore in the CONFIG_CMPXCHG_LOCAL case.

      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs (Linus Torvalds, 2011-03-22, 14 files, -74/+129)
|\ \
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
        [net/9p]: Introduce basic flow-control for VirtIO transport.
        9p: use the updated offset given by generic_write_checks
        [net/9p] Don't re-pin pages on retrying virtqueue_add_buf().
        [net/9p] Set the condition just before waking up.
        [net/9p] unconditional wake_up to proc waiting for space on VirtIO ring
        fs/9p: Add v9fs_dentry2v9ses
        fs/9p: Attach writeback_fid on first open with WR flag
        fs/9p: Open writeback fid in O_SYNC mode
        fs/9p: Use truncate_setsize instead of vmtruncate
        net/9p: Fix compile warning
        net/9p: Convert the in the 9p rpc call path to GFP_NOFS
        fs/9p: Fix race in initializing writeback fid
| * | [net/9p]: Introduce basic flow-control for VirtIO transport. (Venkateswararao Jujjuri (JV), 2011-03-22, 1 file, -1/+22)

      Recent zerocopy work in the 9P VirtIO transport maps and pins user
      buffers into kernel memory for the server to work on them. Since the
      user process can initiate this kind of pinning with a simple
      read/write call, thousands of IO threads initiated by the user process
      can hog the system resources and could result in denial of service.

      This patch introduces flow control to avoid that extreme scenario.

      The ceiling limit to avoid denial of service attacks is set relatively
      high (nr_free_pagecache_pages()/4) so that it won't interfere with
      regular usage, but can step in in extreme cases to limit the total
      system hang. Since we don't have a global structure to accommodate
      this variable, I chose the virtio_chan as the home for this.

      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
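The ceiling described in the flow-control commit can be pictured as a per-channel counter of pinned pages with a fixed limit. The following single-threaded sketch is an assumption-laden model, not the patch itself: `struct chan_model` and its functions are hypothetical, the free page-cache page count is passed in as a plain number rather than read from nr_free_pagecache_pages(), and a refused request stands in for whatever the transport does when the ceiling is reached (a real implementation would also need atomic accounting).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-channel accounting; invariant: pinned_pages <= max_pages. */
struct chan_model {
    unsigned long pinned_pages;
    unsigned long max_pages;
};

/* The ceiling is a quarter of the free page-cache pages, as in the commit text. */
static void chan_model_init(struct chan_model *c, unsigned long free_pagecache_pages)
{
    c->pinned_pages = 0;
    c->max_pages = free_pagecache_pages / 4;
}

/* Account for nr more pinned pages, or refuse if that would exceed the ceiling. */
static bool chan_model_try_pin(struct chan_model *c, unsigned long nr)
{
    if (nr > c->max_pages - c->pinned_pages)
        return false;
    c->pinned_pages += nr;
    return true;
}

/* Release nr pages that were previously accounted for. */
static void chan_model_unpin(struct chan_model *c, unsigned long nr)
{
    assert(nr <= c->pinned_pages);
    c->pinned_pages -= nr;
}
```

With 400 free pages the ceiling is 100: a request for 60 pages succeeds, a following request for 50 is refused until enough pages have been unpinned.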
| * | 9p: use the updated offset given by generic_write_checks (M. Mohan Kumar, 2011-03-22, 1 file, -2/+5)

      Without this fix, even if a file is opened in O_APPEND mode, data will
      be written at current file position instead of end of file.

      Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
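The O_APPEND problem in the 9p write path can be modelled in a few lines. In this sketch `model_write_checks` is a simplified stand-in that reproduces only the O_APPEND adjustment of the offset; it does not have the signature of the kernel's generic_write_checks(), and the two `write_start_*` functions are hypothetical illustrations of a caller that ignores or honours the adjusted offset.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Simplified stand-in: for O_APPEND, move the position to end of file. */
static void model_write_checks(int open_flags, off_t file_size, off_t *pos)
{
    if (open_flags & O_APPEND)
        *pos = file_size;
}

/* Models the bug: the check runs on a copy, but the write still
 * starts at the caller's original position. */
static off_t write_start_buggy(int open_flags, off_t file_size, off_t pos)
{
    off_t checked = pos;

    model_write_checks(open_flags, file_size, &checked);
    return pos;
}

/* Models the fix: the write starts at the offset the check produced. */
static off_t write_start_fixed(int open_flags, off_t file_size, off_t pos)
{
    off_t checked = pos;

    model_write_checks(open_flags, file_size, &checked);
    return checked;
}
```

For a 100-byte file opened with O_APPEND and a current position of 10, the first variant writes at offset 10 and the second at offset 100.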
| * | [net/9p] Don't re-pin pages on retrying virtqueue_add_buf(). (Venkateswararao Jujjuri (JV), 2011-03-22, 1 file, -2/+2)

      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | [net/9p] Set the condition just before waking up. (Venkateswararao Jujjuri (JV), 2011-03-22, 1 file, -22/+22)

      Given that spurious wake-ups are common, we need to move the condition
      setting right next to the wake_up(). After setting the condition to
      req->status = REQ_STATUS_RCVD, spurious wakeups may cause the
      virtqueue to be put back on the free list for someone else to use.
      This may result in kernel panic while releasing the pinned pages in
      p9_release_req_pages().

      Also rearranged the while loop in req_done() for better readability.

      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | [net/9p] unconditional wake_up to proc waiting for space on VirtIO ring (Venkateswararao Jujjuri (JV), 2011-03-22, 1 file, -4/+3)

      A process may wait to get space on the VirtIO ring to send a
      transaction to the VirtFS server. Current code just does a conditional
      wake_up() which means only one process will be woken up even if
      multiple processes are waiting.

      This fix makes the wake_up unconditional. Hence we won't have any
      processes waiting forever.

      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | fs/9p: Add v9fs_dentry2v9ses (Aneesh Kumar K.V, 2011-03-22, 6 files, -11/+16)

      Add the new static inline and use the same

      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | fs/9p: Attach writeback_fid on first open with WR flag (Aneesh Kumar K.V, 2011-03-22, 3 files, -3/+6)

      We don't need writeback fid if we are only doing O_RDONLY open

      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | fs/9p: Open writeback fid in O_SYNC mode (Aneesh Kumar K.V, 2011-03-22, 1 file, -2/+13)

      Older versions of the protocol don't support the tsyncfs operation, so
      for them force an O_SYNC flag on the server.

      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | fs/9p: Use truncate_setsize instead of vmtruncate (Aneesh Kumar K.V, 2011-03-22, 2 files, -13/+15)

      Convert vmtruncate usage to truncate_setsize. We also write back all
      dirty pages before doing 9p operations and on success call
      truncate_setsize. This ensures that we continue sanely on a failed
      truncate on the server. The disadvantage is that we are now going to
      write back content that gets thrown away later as a part of truncate.

      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | net/9p: Fix compile warning (Aneesh Kumar K.V, 2011-03-22, 1 file, -5/+5)

      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | net/9p: Convert the in the 9p rpc call path to GFP_NOFS (Aneesh Kumar K.V, 2011-03-22, 5 files, -13/+13)

      Without this we can cause reclaim allocation in writepage.

      [ 3433.448430] =================================
      [ 3433.449117] [ INFO: inconsistent lock state ]
      [ 3433.449117] 2.6.38-rc5+ #84
      [ 3433.449117] ---------------------------------
      [ 3433.449117] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
      [ 3433.449117] kswapd0/505 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 3433.449117]  (iprune_sem){+++++-}, at: [<ffffffff810ebbab>] shrink_icache_memory+0x45/0x2b1
      [ 3433.449117] {RECLAIM_FS-ON-W} state was registered at:
      [ 3433.449117]   [<ffffffff8107fe5f>] mark_held_locks+0x52/0x70
      [ 3433.449117]   [<ffffffff8107ff02>] lockdep_trace_alloc+0x85/0x9f
      [ 3433.449117]   [<ffffffff810d353d>] slab_pre_alloc_hook+0x18/0x3c
      [ 3433.449117]   [<ffffffff810d3fd5>] kmem_cache_alloc+0x23/0xa2
      [ 3433.449117]   [<ffffffff8127be77>] idr_pre_get+0x2d/0x6f
      [ 3433.449117]   [<ffffffff815434eb>] p9_idpool_get+0x30/0xae
      [ 3433.449117]   [<ffffffff81540123>] p9_client_rpc+0xd7/0x9b0
      [ 3433.449117]   [<ffffffff815427b0>] p9_client_clunk+0x88/0xdb
      [ 3433.449117]   [<ffffffff811d56e5>] v9fs_evict_inode+0x3c/0x48
      [ 3433.449117]   [<ffffffff810eb511>] evict+0x1f/0x87
      [ 3433.449117]   [<ffffffff810eb5c0>] dispose_list+0x47/0xe3
      [ 3433.449117]   [<ffffffff810eb8da>] evict_inodes+0x138/0x14f
      [ 3433.449117]   [<ffffffff810d90e2>] generic_shutdown_super+0x57/0xe8
      [ 3433.449117]   [<ffffffff810d91e8>] kill_anon_super+0x11/0x50
      [ 3433.449117]   [<ffffffff811d4951>] v9fs_kill_super+0x49/0xab
      [ 3433.449117]   [<ffffffff810d926e>] deactivate_locked_super+0x21/0x46
      [ 3433.449117]   [<ffffffff810d9e84>] deactivate_super+0x40/0x44
      [ 3433.449117]   [<ffffffff810ef848>] mntput_no_expire+0x100/0x109
      [ 3433.449117]   [<ffffffff810f0aeb>] sys_umount+0x2f1/0x31c
      [ 3433.449117]   [<ffffffff8102c87b>] system_call_fastpath+0x16/0x1b
      [ 3433.449117] irq event stamp: 192941
      [ 3433.449117] hardirqs last enabled at (192941): [<ffffffff81568dcf>] _raw_spin_unlock_irq+0x2b/0x30
      [ 3433.449117] hardirqs last disabled at (192940): [<ffffffff810b5f97>] shrink_inactive_list+0x290/0x2f5
      [ 3433.449117] softirqs last enabled at (188470): [<ffffffff8105fd65>] __do_softirq+0x133/0x152
      [ 3433.449117] softirqs last disabled at (188455): [<ffffffff8102d7cc>] call_softirq+0x1c/0x28
      [ 3433.449117]
      [ 3433.449117] other info that might help us debug this:
      [ 3433.449117] 1 lock held by kswapd0/505:
      [ 3433.449117]  #0: (shrinker_rwsem){++++..}, at: [<ffffffff810b52e2>] shrink_slab+0x38/0x15f
      [ 3433.449117]
      [ 3433.449117] stack backtrace:
      [ 3433.449117] Pid: 505, comm: kswapd0 Not tainted 2.6.38-rc5+ #84
      [ 3433.449117] Call Trace:
      [ 3433.449117]  [<ffffffff8107fbce>] ? valid_state+0x17e/0x191
      [ 3433.449117]  [<ffffffff81036896>] ? save_stack_trace+0x28/0x45
      [ 3433.449117]  [<ffffffff81080426>] ? check_usage_forwards+0x0/0x87
      [ 3433.449117]  [<ffffffff8107fcf4>] ? mark_lock+0x113/0x22c
      [ 3433.449117]  [<ffffffff8108105f>] ? __lock_acquire+0x37a/0xcf7
      [ 3433.449117]  [<ffffffff8107fc0e>] ? mark_lock+0x2d/0x22c
      [ 3433.449117]  [<ffffffff81081077>] ? __lock_acquire+0x392/0xcf7
      [ 3433.449117]  [<ffffffff810b14d2>] ? determine_dirtyable_memory+0x15/0x28
      [ 3433.449117]  [<ffffffff81081a33>] ? lock_acquire+0x57/0x6d
      [ 3433.449117]  [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1
      [ 3433.449117]  [<ffffffff81567d85>] ? down_read+0x47/0x5c
      [ 3433.449117]  [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1
      [ 3433.449117]  [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1
      [ 3433.449117]  [<ffffffff810b5385>] ? shrink_slab+0xdb/0x15f
      [ 3433.449117]  [<ffffffff810b69bc>] ? kswapd+0x574/0x96a
      [ 3433.449117]  [<ffffffff810b6448>] ? kswapd+0x0/0x96a
      [ 3433.449117]  [<ffffffff810714e2>] ? kthread+0x7d/0x85
      [ 3433.449117]  [<ffffffff8102d6d4>] ? kernel_thread_helper+0x4/0x10
      [ 3433.449117]  [<ffffffff81569200>] ? restore_args+0x0/0x30
      [ 3433.449117]  [<ffffffff81071465>] ? kthread+0x0/0x85
      [ 3433.449117]  [<ffffffff8102d6d0>] ? kernel_thread_helper+0x0/0x10

      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | fs/9p: Fix race in initializing writeback fidAneesh Kumar K.V2011-03-224-0/+11
| |/ | | | | | | | | | | | | | | | | | | When two processes open the same file, both of them can end up allocating the writeback_fid. Add a new mutex which can be used for synchronizing v9fs_inode member values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
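The race described above is a check-then-allocate pattern on a shared field. The following is a minimal userspace sketch of serializing that pattern with a mutex, not the code from this patch: it uses pthreads instead of the kernel mutex API, and `struct demo_inode`, `demo_alloc_fid()` and `demo_get_writeback_fid()` are hypothetical names introduced only for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-inode state; not the real v9fs_inode. */
struct demo_inode {
	pthread_mutex_t lock;   /* serializes updates to the members below */
	void *writeback_fid;    /* NULL until the first opener allocates it */
	int alloc_count;        /* counts allocations, for demonstration only */
};

static int demo_fid_storage;    /* dummy object standing in for a fid */

/* Hypothetical allocator: records that an allocation happened. */
static void *demo_alloc_fid(struct demo_inode *inode)
{
	inode->alloc_count++;
	return &demo_fid_storage;
}

/*
 * Check and allocate under the same lock, so two concurrent openers
 * cannot both observe NULL and both allocate.
 */
static void *demo_get_writeback_fid(struct demo_inode *inode)
{
	void *fid;

	pthread_mutex_lock(&inode->lock);
	if (inode->writeback_fid == NULL)
		inode->writeback_fid = demo_alloc_fid(inode);
	fid = inode->writeback_fid;
	pthread_mutex_unlock(&inode->lock);
	return fid;
}
```

Holding the lock across both the NULL check and the assignment is what closes the window; taking it only around the assignment would still let two openers allocate.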
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds2011-03-2215-230/+1018
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: use watch/notify for changes in rbd header libceph: add lingering request and watch/notify event framework rbd: update email address in Documentation ceph: rename dentry_release -> d_release, fix comment ceph: add request to the tail of unsafe write list ceph: remove request from unsafe list if it is canceled/timed out ceph: move readahead default to fs/ceph from libceph ceph: add ino32 mount option ceph: update common header files ceph: remove debugfs debug cruft libceph: fix osd request queuing on osdmap updates ceph: preserve I_COMPLETE across rename libceph: Fix base64-decoding when input ends in newline.
| * | rbd: use watch/notify for changes in rbd headerYehuda Sadeh2011-03-221-26/+335
| | | | | | | | | | | | | | | | | | | | | | | | | | | Send notifications when we change the rbd header (e.g. create a snapshot) and wait for such notifications. This allows synchronizing the snapshot creation between different rbd clients/tools. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * | libceph: add lingering request and watch/notify event frameworkYehuda Sadeh2011-03-223-12/+426
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lingering requests are requests that are sent to the OSD normally but are still tracked after the request completes successfully. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets a notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * | rbd: update email address in DocumentationSage Weil2011-03-211-1/+1
| | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: rename dentry_release -> d_release, fix commentSage Weil2011-03-211-7/+6
| | | | | | | | | | | | | | | | | | Just for consistency's sake. Fix obsolete comment too. Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: add request to the tail of unsafe write listHenry C Chang2011-03-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | In sync_write_wait(), we assume that the newest request is at the tail of the unsafe write list. We should maintain that invariant when adding requests. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
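The ordering assumption above can be illustrated with a small circular doubly linked list: if every request is appended at the tail, the entry just before the head is always the newest one. This is a userspace sketch under that assumption, not the ceph code; `demo_node`, `demo_req`, `demo_list_add_tail()` and `demo_newest()` are hypothetical names that only mimic the shape of the kernel's list helpers.

```c
#include <stddef.h>

/* Hypothetical circular doubly linked list node. */
struct demo_node {
	struct demo_node *prev, *next;
};

/* Hypothetical request; the link is the first member so a node pointer
 * can be converted back to the containing request. */
struct demo_req {
	struct demo_node link;
	int seq;                /* submission order, for demonstration */
};

static void demo_list_init(struct demo_node *head)
{
	head->prev = head;
	head->next = head;
}

/* Insert n just before head, i.e. at the tail of the list. */
static void demo_list_add_tail(struct demo_node *n, struct demo_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* With tail insertion, the newest request is the last entry. */
static struct demo_req *demo_newest(struct demo_node *head)
{
	if (head->prev == head)
		return NULL;    /* empty list */
	return (struct demo_req *)head->prev;
}
```

Inserting at the head instead would leave the oldest request at the tail, which breaks a waiter that only looks at the last entry.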
| * | ceph: remove request from unsafe list if it is canceled/timed outHenry C Chang2011-03-211-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the list corruption warning like this: ------------[ cut here ]------------ WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81() Hardware name: X8DTU list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130). Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan] Pid: 10977, comm: smbd Tainted: G W 2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1 Call Trace: [<ffffffff8105753c>] warn_slowpath_common+0x7c/0x94 [<ffffffff810575ab>] warn_slowpath_fmt+0x41/0x43 [<ffffffff812351a3>] __list_add+0x68/0x81 [<ffffffffa014799d>] ceph_aio_write+0x614/0x8a2 [ceph] [<ffffffff8111d2a0>] do_sync_write+0xe8/0x125 [<ffffffff81075a1f>] ? autoremove_wake_function+0x0/0x39 [<ffffffff811f21ec>] ? selinux_file_permission+0x5c/0xb3 [<ffffffff811e8521>] ? security_file_permission+0x16/0x18 [<ffffffff8111d864>] vfs_write+0xae/0x10b [<ffffffff8111d91b>] sys_pwrite64+0x5a/0x76 [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b ---[ end trace 08573eb9f07ff6f4 ]--- Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: move readahead default to fs/ceph from libcephSage Weil2011-03-213-3/+3
| | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: add ino32 mount optionYehuda Sadeh2011-03-214-25/+65
| | | | | | | | | | | | | | | | | | | | | The ino32 mount option forces the ceph fs to report 32 bit ino values. This is useful for 64 bit kernels with 32 bit userspace. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
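Reporting 32-bit inode numbers from a 64-bit inode space requires some folding step. One plausible approach, shown here as a sketch and not necessarily what this patch does, is to XOR the upper and lower halves and avoid returning zero. The function name `demo_fold_ino64()` and the choice of 2 as the nonzero fallback are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fold a 64-bit inode number into 32 bits by
 * XORing the high half into the low half. Distinct 64-bit values can
 * collide after folding; that is inherent to any 64 -> 32 bit mapping.
 */
static uint32_t demo_fold_ino64(uint64_t ino64)
{
	uint32_t ino32 = (uint32_t)(ino64 & 0xffffffffu);

	ino32 ^= (uint32_t)(ino64 >> 32);
	if (ino32 == 0)
		ino32 = 2;      /* assumed fallback: never report inode 0 */
	return ino32;
}
```

Values that already fit in 32 bits pass through unchanged, so 32-bit userspace sees the same numbers it would have seen without the option whenever that is possible.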
| * | ceph: update common header filesYehuda Sadeh2011-03-212-13/+45
| | | | | | | | | | | | | | | | | | | | | | | | This updates the common header files used by the different ceph related modules. Specifically it adds definitions required by the rbd watch/notify feature. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
| * | ceph: remove debugfs debug cruftSage Weil2011-03-211-6/+0
| | | | | | | | | | | | | | | | | | Whoops! Signed-off-by: Sage Weil <sage@newdream.net>