summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* fix: "smp_call_function: get rid of the unused nonatomic/retry argument"Ingo Molnar2008-06-271-1/+1
| | | | | | | | | fix: kernel/smp.c: In function 'smp_call_function_mask': kernel/smp.c:303: error: too many arguments to function 'smp_call_function_single' Signed-off-by: Ingo Molnar <mingo@elte.hu>
* fix "smp_call_function: get rid of the unused nonatomic/retry argument"Ingo Molnar2008-06-271-1/+1
| | | | | | | | | fix: arch/x86/kernel/process.c: In function 'cpu_idle_wait': arch/x86/kernel/process.c:64: error: too many arguments to function 'smp_call_function' Signed-off-by: Ingo Molnar <mingo@elte.hu>
* on_each_cpu(): kill unused 'retry' parameterJens Axboe2008-06-2648-84/+84
| | | | | | | | | It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* smp_call_function: get rid of the unused nonatomic/retry argumentJens Axboe2008-06-2649-108/+95
| | | | | | | | It's never used and the comments refer to nonatomic and retry interchangably. So get rid of it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* sh: convert to generic helpers for IPI function callsJens Axboe2008-06-263-50/+13
| | | | | | | | | This converts sh to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Not tested, but it compiles. Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* parisc: convert to generic helpers for IPI function callsJens Axboe2008-06-263-113/+25
| | | | | | | | | | | This converts parisc to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Tested by Kyle, seems to work. Cc: Matthew Wilcox <matthew@wil.cx> Cc: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* mips: convert to generic helpers for IPI function callsJens Axboe2008-06-264-141/+15
| | | | | | | | | | | | | This converts mips to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Not tested, but it compiles. mips shares the same IPI for smp_call_function() and smp_call_function_single(), since not all mips platforms have enough available IPIs to support seperate setups. Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* m32r: convert to generic helpers for IPI function callsJens Axboe2008-06-265-119/+20
| | | | | | | | | This converts m32r to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Not tested, not even compiled. Cc: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* arm: convert to generic helpers for IPI function callsJens Axboe2008-06-263-142/+19
| | | | | | | | | | This converts arm to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Fixups and testing done by Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* alpha: convert to generic helpers for IPI function callsJens Axboe2008-06-264-164/+16
| | | | | | | This converts alpha to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* ia64: convert to generic helpers for IPI function callsJens Axboe2008-06-264-244/+19
| | | | | | | | This converts ia64 to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* powerpc: convert to generic helpers for IPI function callsJens Axboe2008-06-267-226/+33
| | | | | | | | | | | This converts ppc to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). ppc loses the timeout functionality of smp_call_function_mask() with this change, as the generic code does not provide that. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* x86: convert to generic helpers for IPI function callsJens Axboe2008-06-2620-380/+125
| | | | | | | | | | This converts x86, x86-64, and xen to use the new helpers for smp_call_function() and friends, and adds support for smp_call_function_single(). Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Add generic helpers for arch IPI function callsJens Axboe2008-06-266-7/+428
| | | | | | | | | | | | | | | | | This adds kernel/smp.c which contains helpers for IPI function calls. In addition to supporting the existing smp_call_function() in a more efficient manner, it also adds a more scalable variant called smp_call_function_single() for calling a given function on a single CPU only. The core of this is based on the x86-64 patch from Nick Piggin, lots of changes since then. "Alan D. Brunelle" <Alan.Brunelle@hp.com> has contributed lots of fixes and suggestions as well. Also thanks to Paul E. McKenney <paulmck@linux.vnet.ibm.com> for reviewing RCU usage and getting rid of the data allocation fallback deadlock. Acked-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Linux 2.6.26-rc8v2.6.26-rc8Linus Torvalds2008-06-241-1/+1
|
* Merge branch 'release' of ↵Linus Torvalds2008-06-243-5/+2
|\ | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte() [IA64] Handle count==0 in sn2_ptc_proc_write() [IA64] Fix boot failure on ia64/sn2
| * [IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte()Julia Lawall2008-06-241-2/+0
| | | | | | | | | | | | | | | | | | As noted by Akinobu Mita alloc_bootmem and related functions never return NULL and always return a zeroed region of memory. Thus a NULL test or memset after calls to these functions is unnecessary. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * [IA64] Handle count==0 in sn2_ptc_proc_write()Cliff Wickman2008-06-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | The fix applied in e0c6d97c65e0784aade7e97b9411f245a6c543e7 "security hole in sn2_ptc_proc_write" didn't take into account the case where count==0 (which results in a buffer underrun when adding the trailing '\0'). Thanks to Andi Kleen for pointing this out. Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * [IA64] Fix boot failure on ia64/sn2Jes Sorensen2008-06-241-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Call check_sal_cache_flush() after platform_setup() as check_sal_cache_flush() now relies on being able to call platform vector code. Problem was introduced by: 3463a93def55c309f3c0d0a8aaf216be3be42d64 "Update check_sal_cache_flush to use platform_send_ipi()" Signed-off-by: Jes Sorensen <jes@sgi.com> Tested-by: Alex Chiang: <achiang@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds2008-06-242-15/+10
|\ \ | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: [GFS2] fix gfs2 block allocation (cleaned up) [GFS2] BUG: unable to handle kernel paging request at ffff81002690e000
| * | [GFS2] fix gfs2 block allocation (cleaned up)Benjamin Marzinski2008-06-241-14/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes bz 450641. This patch changes the computation for zero_metapath_length(), which it renames to metapath_branch_start(). When you are extending the metadata tree, The indirect blocks that point to the new data block must either diverge from the existing tree either at the inode, or at the first indirect block. They can diverge at the first indirect block because the inode has room for 483 pointers while the indirect blocks have room for 509 pointers, so when the tree is grown, there is some free space in the first indirect block. What metapath_branch_start() now computes is the height where the first indirect block for the new data block is located. It can either be 1 (if the indirect block diverges from the inode) or 2 (if it diverges from the first indirect block). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | [GFS2] BUG: unable to handle kernel paging request at ffff81002690e000Bob Peterson2008-06-241-1/+1
| |/ | | | | | | | | | | | | | | This patch fixes bugzilla bug bz448866: gfs2: BUG: unable to handle kernel paging request at ffff81002690e000. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | Merge branch 'kvm-updates-2.6.26' of ↵Linus Torvalds2008-06-2418-266/+358
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: KVM: Remove now unused structs from kvm_para.h x86: KVM guest: Use the paravirt clocksource structs and functions KVM: Make kvm host use the paravirt clocksource structs x86: Make xen use the paravirt clocksource structs and functions x86: Add structs and functions for paravirt clocksource KVM: VMX: Fix host msr corruption with preemption enabled KVM: ioapic: fix lost interrupt when changing a device's irq KVM: MMU: Fix oops on guest userspace access to guest pagetable KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend) KVM: MMU: Fix rmap_write_protect() hugepage iteration bug KVM: close timer injection race window in __vcpu_run KVM: Fix race between timer migration and vcpu migration
| * | KVM: Remove now unused structs from kvm_para.hGerd Hoffmann2008-06-241-18/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | The kvm_* structs are obsoleted by the pvclock_* ones. Now all users have been switched over and the old structs can be dropped. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | x86: KVM guest: Use the paravirt clocksource structs and functionsGerd Hoffmann2008-06-242-56/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates the kvm host code to use the pvclock structs and functions, thereby making it compatible with Xen. The patch also fixes an initialization bug: on SMP systems the per-cpu has two different locations early at boot and after CPU bringup. kvmclock must take that in account when registering the physical address within the host. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: Make kvm host use the paravirt clocksource structsGerd Hoffmann2008-06-242-14/+65
| | | | | | | | | | | | | | | | | | | | | | | | This patch updates the kvm host code to use the pvclock structs. It also makes the paravirt clock compatible with Xen. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | x86: Make xen use the paravirt clocksource structs and functionsGerd Hoffmann2008-06-243-124/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates the xen guest to use the pvclock structs and helper functions. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | x86: Add structs and functions for paravirt clocksourceGerd Hoffmann2008-06-245-0/+201
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds structs for the paravirt clocksource ABI used by both xen and kvm (pvclock-abi.h). It also adds some helper functions to read system time and wall clock time from a paravirtual clocksource (pvclock.[ch]). They are based on the xen code. They are enabled using CONFIG_PARAVIRT_CLOCK. Subsequent patches of this series will put the code in use. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: VMX: Fix host msr corruption with preemption enabledAvi Kivity2008-06-241-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switching msrs can occur either synchronously as a result of calls to the msr management functions (usually in response to the guest touching virtualized msrs), or asynchronously when preempting a kvm thread that has guest state loaded. If we're unlucky enough to have the two at the same time, host msrs are corrupted and the machine goes kaput on the next syscall. Most easily triggered by Windows Server 2008, as it does a lot of msr switching during bootup. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: ioapic: fix lost interrupt when changing a device's irqAvi Kivity2008-06-241-20/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ioapic acknowledge path translates interrupt vectors to irqs. It currently uses a first match algorithm, stopping when it finds the first redirection table entry containing the vector. That fails however if the guest changes the irq to a different line, leaving the old redirection table entry in place (though masked). Result is interrupts not making it to the guest. Fix by always scanning the entire redirection table. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: MMU: Fix oops on guest userspace access to guest pagetableAvi Kivity2008-06-241-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)Marcelo Tosatti2008-06-241-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: MMU: Fix rmap_write_protect() hugepage iteration bugMarcelo Tosatti2008-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: close timer injection race window in __vcpu_runMarcelo Tosatti2008-06-244-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable() the code will enter guest mode and only inject such timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQ's disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: Fix race between timer migration and vcpu migrationMarcelo Tosatti2008-06-241-12/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | A guest vcpu instance can be scheduled to a different physical CPU between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable(). If that happens, the timer will only be migrated to the current pCPU on the next exit, meaning that guest LAPIC timer event can be delayed until a host interrupt is triggered. Fix it by cancelling guest entry if any vcpu request is pending. This has the side effect of nicely consolidating vcpu->requests checks. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdogLinus Torvalds2008-06-241-1/+0
|\ \ | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog: Revert "[WATCHDOG] hpwdt: Add CFLAGS to get driver working"
| * | Revert "[WATCHDOG] hpwdt: Add CFLAGS to get driver working"Wim Van Sebroeck2008-06-241-1/+0
| |/ | | | | | | | | | | | | | | After Linus fixed the inline assembly, the CFLAGS option is not needed anymore. Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
* | Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2008-06-246-77/+27
|\ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: remove support for non-PAE 32-bit
| * | xen: remove support for non-PAE 32-bitJeremy Fitzhardinge2008-06-246-77/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'for_linus' of ↵Linus Torvalds2008-06-242-15/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: sparse fix kgdb: documentation update - remove kgdboe
| * | | kgdb: sparse fixJason Wessel2008-06-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Fix warning reported by sparse kernel/kgdb.c:1502:6: warning: symbol 'kgdb_console_write' was not declared. Should it be static? Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
| * | | kgdb: documentation update - remove kgdboeJason Wessel2008-06-241-14/+6
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | kgdboe is not presently included kgdb, and there should be no references to it. Also fix the tcp port terminal connection example. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
* | | enable bus mastering on i915 at resume timeJie Luo2008-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On 9xx chips, bus mastering needs to be enabled at resume time for much of the chip to function. With this patch, vblank interrupts will work as expected on resume, along with other chip functions. Fixes kernel bugzilla #10844. Signed-off-by: Jie Luo <clotho67@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | alpha: fix compile error in arch/alpha/mm/init.cThorsten Kranzkowski2008-06-231-0/+2
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9267b4b3880d00dc2dab90f1d817c856939114f7 ("alpha: fix module load failures on smp (bug #10926)") causes a regression for my ev4 uniprocessor build: CC arch/alpha/mm/init.o /export/data/repositories/linux-2.6/arch/alpha/mm/init.c:34: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘typeof’ make[2]: *** [arch/alpha/mm/init.o] Error 1 make[1]: *** [arch/alpha/mm] Error 2 make: *** [sub-make] Error 2 This fixes it for me (compile and boot tested): Signed-off-by: Thorsten Kranzkowski <dl8bcu@dl8bcu.de> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2008-06-233-37/+51
|\ \ | | | | | | | | | | | | | | | | | | * 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: nfs_updatepage(): don't mark page as dirty if an error occurred NFS: Fix filehandle size comparisons in the mount code NFS: Reduce the NFS mount code stack usage.
| * | NFS: nfs_updatepage(): don't mark page as dirty if an error occurredTrond Myklebust2008-06-231-3/+4
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Fix filehandle size comparisons in the mount codeTrond Myklebust2008-06-232-6/+7
| | | | | | | | | | | | | | | | | | | | | Fix a sign issue in xdr_decode_fhstatus3() Fix incorrect comparison in nfs_validate_mount_data() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Reduce the NFS mount code stack usage.Trond Myklebust2008-06-231-28/+40
| | | | | | | | | | | | | | | | | | | | | This appears to fix the Oops reported in http://bugzilla.kernel.org/show_bug.cgi?id=10826 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2008-06-231-20/+73
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futexes: fix fault handling in futex_lock_pi
| * | | futexes: fix fault handling in futex_lock_piThomas Gleixner2008-06-231-20/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch addresses a very sporadic pi-futex related failure in highly threaded java apps on large SMP systems. David Holmes reported that the pi_state consistency check in lookup_pi_state triggered with his test application. This means that the kernel internal pi_state and the user space futex variable are out of sync. First we assumed that this is a user space data corruption, but deeper investigation revieled that the problem happend because the pi-futex code is not handling a fault in the futex_lock_pi path when the user space variable needs to be fixed up. The fault happens when a fork mapped the anon memory which contains the futex readonly for COW or the page got swapped out exactly between the unlock of the futex and the return of either the new futex owner or the task which was the expected owner but failed to acquire the kernel internal rtmutex. The current futex_lock_pi() code drops out with an inconsistent in case it faults and returns -EFAULT to user space. User space has no way to fixup that state. When we wrote this code we thought that we could not drop the hash bucket lock at this point to handle the fault. After analysing the code again it turned out to be wrong because there are only two tasks involved which might modify the pi_state and the user space variable: - the task which acquired the rtmutex - the pending owner of the pi_state which did not get the rtmutex Both tasks drop into the fixup_pi_state() function before returning to user space. The first task which acquired the hash bucket lock faults in the fixup of the user space variable, drops the spinlock and calls futex_handle_fault() to fault in the page. Now the second task could acquire the hash bucket lock and tries to fixup the user space variable as well. It either faults as well or it succeeds because the first task already faulted the page in. One caveat is to avoid a double fixup. After returning from the fault handling we reacquire the hash bucket lock and check whether the pi_state owner has been modified already. Reported-by: David Holmes <david.holmes@sun.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Holmes <david.holmes@sun.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> kernel/futex.c | 93 ++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 73 insertions(+), 20 deletions(-)
OpenPOWER on IntegriCloud