op-kernel-dev - Development kernel branch for OpenPOWER systems

diff options

author	Thomas Gleixner <tglx@linutronix.de>	2008-06-23 11:21:58 +0200
committer	Ingo Molnar <mingo@elte.hu>	2008-06-23 13:31:15 +0200
commit	1b7558e457ed0de61023cfc913d2c342c7c3d9f2 (patch)
tree	fcd456b1633bfd3e68ba2f6631831fbc5336108d /arch
parent	481c5346d0981940ee63037eb53e4e37b0735c10 (diff)
download	op-kernel-dev-1b7558e457ed0de61023cfc913d2c342c7c3d9f2.zip op-kernel-dev-1b7558e457ed0de61023cfc913d2c342c7c3d9f2.tar.gz

futexes: fix fault handling in futex_lock_pi

This patch addresses a very sporadic pi-futex related failure in highly threaded java apps on large SMP systems. David Holmes reported that the pi_state consistency check in lookup_pi_state triggered with his test application. This means that the kernel internal pi_state and the user space futex variable are out of sync. First we assumed that this is a user space data corruption, but deeper investigation revieled that the problem happend because the pi-futex code is not handling a fault in the futex_lock_pi path when the user space variable needs to be fixed up. The fault happens when a fork mapped the anon memory which contains the futex readonly for COW or the page got swapped out exactly between the unlock of the futex and the return of either the new futex owner or the task which was the expected owner but failed to acquire the kernel internal rtmutex. The current futex_lock_pi() code drops out with an inconsistent in case it faults and returns -EFAULT to user space. User space has no way to fixup that state. When we wrote this code we thought that we could not drop the hash bucket lock at this point to handle the fault. After analysing the code again it turned out to be wrong because there are only two tasks involved which might modify the pi_state and the user space variable: - the task which acquired the rtmutex - the pending owner of the pi_state which did not get the rtmutex Both tasks drop into the fixup_pi_state() function before returning to user space. The first task which acquired the hash bucket lock faults in the fixup of the user space variable, drops the spinlock and calls futex_handle_fault() to fault in the page. Now the second task could acquire the hash bucket lock and tries to fixup the user space variable as well. It either faults as well or it succeeds because the first task already faulted the page in. One caveat is to avoid a double fixup. After returning from the fault handling we reacquire the hash bucket lock and check whether the pi_state owner has been modified already. Reported-by: David Holmes <david.holmes@sun.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Holmes <david.holmes@sun.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> kernel/futex.c | 93 ++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 73 insertions(+), 20 deletions(-)

Diffstat (limited to 'arch')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: