summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'locking-core-for-linus' of ↵Linus Torvalds2017-05-0129-610/+1231
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - a big round of FUTEX_UNLOCK_PI improvements, fixes, cleanups and general restructuring - lockdep updates such as new checks for lock_downgrade() - introduce the new atomic_try_cmpxchg() locking API and use it to optimize refcount code generation - ... plus misc fixes, updates and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) MAINTAINERS: Add FUTEX SUBSYSTEM futex: Clarify mark_wake_futex memory barrier usage futex: Fix small (and harmless looking) inconsistencies futex: Avoid freeing an active timer rtmutex: Plug preempt count leak in rt_mutex_futex_unlock() rtmutex: Fix more prio comparisons rtmutex: Fix PI chain order integrity sched,tracing: Update trace_sched_pi_setprio() sched/rtmutex: Refactor rt_mutex_setprio() rtmutex: Clean up sched/deadline/rtmutex: Dont miss the dl_runtime/dl_period update sched/rtmutex/deadline: Fix a PI crash for deadline tasks rtmutex: Deboost before waking up the top waiter locking/ww-mutex: Limit stress test to 2 seconds locking/atomic: Fix atomic_try_cmpxchg() semantics lockdep: Fix per-cpu static objects futex: Drop hb->lock before enqueueing on the rtmutex futex: Futex_unlock_pi() determinism futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() ...
| * MAINTAINERS: Add FUTEX SUBSYSTEMDarren Hart (VMware)2017-04-151-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a MAINTAINERS block for the FUTEX SUBSYSTEM which includes the core kernel code, include headers, testing code, and Documentation. Excludes arch files, and higher level test code. I added tglx and mingo as M as they have made the tip commits and peterz and myself as R. Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/20170414224608.GA5180@fury Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * futex: Clarify mark_wake_futex memory barrier usageDarren Hart (VMware)2017-04-151-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Clarify the scenario described in mark_wake_futex requiring the smp_store_release(). Update the comment to explicitly refer to the plist_del now under __unqueue_futex() (previously plist_del was in the same function as the comment). Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170414223138.GA4222@fury Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * futex: Fix small (and harmless looking) inconsistenciesPeter Zijlstra2017-04-141-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During (post-commit) review Darren spotted a few minor things. One (harmless AFAICT) type inconsistency and a comment that wasn't as clear as hoped. Reported-by: Darren Hart (VMWare) <dvhart@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * futex: Avoid freeing an active timerThomas Gleixner2017-04-141-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexander reported a hrtimer debug_object splat: ODEBUG: free active (active state 0) object type: hrtimer hint: hrtimer_wakeup (kernel/time/hrtimer.c:1423) debug_object_free (lib/debugobjects.c:603) destroy_hrtimer_on_stack (kernel/time/hrtimer.c:427) futex_lock_pi (kernel/futex.c:2740) do_futex (kernel/futex.c:3399) SyS_futex (kernel/futex.c:3447 kernel/futex.c:3415) do_syscall_64 (arch/x86/entry/common.c:284) entry_SYSCALL64_slow_path (arch/x86/entry/entry_64.S:249) Which was caused by commit: cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()") ... losing the hrtimer_cancel() in the shuffle. Where previously the hrtimer_cancel() was done by rt_mutex_slowlock() we now need to do it manually. Reported-by: Alexander Levin <alexander.levin@verizon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: cfafcd117da0 ("futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704101802370.2906@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar2017-04-141002-7280/+13754
| |\ | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | rtmutex: Plug preempt count leak in rt_mutex_futex_unlock()Mike Galbraith2017-04-051-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mark_wakeup_next_waiter() already disables preemption, doing so again leaves us with an unpaired preempt_disable(). Fixes: 2a1c60299406 ("rtmutex: Deboost before waking up the top waiter") Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/1491379707.6538.2.camel@gmx.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | rtmutex: Fix more prio comparisonsPeter Zijlstra2017-04-041-3/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a pure ->prio comparison left in try_to_wake_rt_mutex(), convert it to use rt_mutex_waiter_less(), noting that greater-or-equal is not-less (both in kernel priority view). This necessitated the introduction of cmp_task() which creates a pointer to an unnamed stack variable of struct rt_mutex_waiter type to compare against tasks. With this, we can now also create and employ rt_mutex_waiter_equal(). Reviewed-and-tested-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.455584638@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | rtmutex: Fix PI chain order integrityPeter Zijlstra2017-04-042-2/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rt_mutex_waiter::prio is a copy of task_struct::prio which is updated during the PI chain walk, such that the PI chain order isn't messed up by (asynchronous) task state updates. Currently rt_mutex_waiter_less() uses task state for deadline tasks; this is broken, since the task state can, as said above, change asynchronously, causing the RB tree order to change without actual tree update -> FAIL. Fix this by also copying the deadline into the rt_mutex_waiter state and updating it along with its prio field. Ideally we would also force PI chain updates whenever DL tasks update their deadline parameter, but for first approximation this is less broken than it was. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.403992539@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | sched,tracing: Update trace_sched_pi_setprio()Peter Zijlstra2017-04-042-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the PI donor task, instead of a numerical priority. Numerical priorities are not sufficient to describe state ever since SCHED_DEADLINE. Annotate all sched tracepoints that are currently broken; fixing them will bork userspace. *hate*. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.353599881@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | sched/rtmutex: Refactor rt_mutex_setprio()Peter Zijlstra2017-04-043-111/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the introduction of SCHED_DEADLINE the whole notion that priority is a single number is gone, therefore the @prio argument to rt_mutex_setprio() doesn't make sense anymore. So rework the code to pass a pi_task instead. Note this also fixes a problem with pi_top_task caching; previously we would not set the pointer (call rt_mutex_update_top_task) if the priority didn't change, this could lead to a stale pointer. As for the XXX, I think its fine to use pi_task->prio, because if it differs from waiter->prio, a PI chain update is immenent. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.303827095@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | rtmutex: Clean upPeter Zijlstra2017-04-043-19/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous patches changed the meaning of the return value of rt_mutex_slowunlock(); update comments and code to reflect this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.255058238@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | sched/deadline/rtmutex: Dont miss the dl_runtime/dl_period updateXunlei Pang2017-04-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently dl tasks will actually return at the very beginning of rt_mutex_adjust_prio_chain() in !detect_deadlock cases: if (waiter->prio == task->prio) { if (!detect_deadlock) goto out_unlock_pi; // out here else requeue = false; } As the deadline value of blocked deadline tasks(waiters) without changing their sched_class(thus prio doesn't change) never changes, this seems reasonable, but it actually misses the chance of updating rt_mutex_waiter's "dl_runtime(period)_copy" if a waiter updates its deadline parameters(dl_runtime, dl_period) or boosted waiter changes to !deadline class. Thus, force deadline task not out by adding the !dl_prio() condition. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/1460633827-345-7-git-send-email-xlpang@redhat.com Link: http://lkml.kernel.org/r/20170323150216.206577901@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | sched/rtmutex/deadline: Fix a PI crash for deadline tasksXunlei Pang2017-04-046-8/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A crash happened while I was playing with deadline PI rtmutex. BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff810eeb8f>] rt_mutex_get_top_task+0x1f/0x30 PGD 232a75067 PUD 230947067 PMD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 10994 Comm: a.out Not tainted Call Trace: [<ffffffff810b658c>] enqueue_task+0x2c/0x80 [<ffffffff810ba763>] activate_task+0x23/0x30 [<ffffffff810d0ab5>] pull_dl_task+0x1d5/0x260 [<ffffffff810d0be6>] pre_schedule_dl+0x16/0x20 [<ffffffff8164e783>] __schedule+0xd3/0x900 [<ffffffff8164efd9>] schedule+0x29/0x70 [<ffffffff8165035b>] __rt_mutex_slowlock+0x4b/0xc0 [<ffffffff81650501>] rt_mutex_slowlock+0xd1/0x190 [<ffffffff810eeb33>] rt_mutex_timed_lock+0x53/0x60 [<ffffffff810ecbfc>] futex_lock_pi.isra.18+0x28c/0x390 [<ffffffff810ed8b0>] do_futex+0x190/0x5b0 [<ffffffff810edd50>] SyS_futex+0x80/0x180 This is because rt_mutex_enqueue_pi() and rt_mutex_dequeue_pi() are only protected by pi_lock when operating pi waiters, while rt_mutex_get_top_task(), will access them with rq lock held but not holding pi_lock. In order to tackle it, we introduce new "pi_top_task" pointer cached in task_struct, and add new rt_mutex_update_top_task() to update its value, it can be called by rt_mutex_setprio() which held both owner's pi_lock and rq lock. Thus "pi_top_task" can be safely accessed by enqueue_task_dl() under rq lock. Originally-From: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.157682758@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | rtmutex: Deboost before waking up the top waiterXunlei Pang2017-04-043-32/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should deboost before waking the high-priority task, such that we don't run two tasks with the same "state" (priority, deadline, sched_class, etc). In order to make sure the boosting task doesn't start running between unlock and deboost (due to 'spurious' wakeup), we move the deboost under the wait_lock, that way its serialized against the wait loop in __rt_mutex_slowlock(). Doing the deboost early can however lead to priority-inversion if current would get preempted after the deboost but before waking our high-prio task, hence we disable preemption before doing deboost, and enabling it after the wake up is over. This gets us the right semantic order, but most importantly however; this change ensures pointer stability for the next patch, where we have rt_mutex_setprio() cache a pointer to the top-most waiter task. If we, as before this change, do the wakeup first and then deboost, this pointer might point into thin air. [peterz: Changelog + patch munging] Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170323150216.110065320@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | Merge branch 'sched/core' into locking/coreThomas Gleixner2017-04-04226-1830/+3614
| |\ \ | | | | | | | | | | | | | | | | Required for the rtmutex/sched_deadline patches which depend on both branches
| * | | locking/ww-mutex: Limit stress test to 2 secondsChris Wilson2017-03-301-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a timeout rather than a fixed number of loops to avoid running for very long periods, such as under the kbuilder VMs. Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170310105733.6444-1-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/atomic: Fix atomic_try_cmpxchg() semanticsPeter Zijlstra2017-03-302-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dmitry noted that the new atomic_try_cmpxchg() primitive is broken when the old pointer doesn't point to the local stack. He writes: "Consider a classical lock-free stack push: node->next = atomic_read(&head); do { } while (!atomic_try_cmpxchg(&head, &node->next, node)); This code is broken with the current implementation, the problem is with unconditional update of *__po. In case of success it writes the same value back into *__po, but in case of cmpxchg success we might have lose ownership of some memory locations and potentially over what __po has pointed to. The same holds for the re-read of *__po. " He also points out that this makes it surprisingly different from the similar C/C++ atomic operation. After investigating the code-gen differences caused by this patch; and a number of alternatives (Linus dislikes this interface lots), we arrived at these results (size x86_64-defconfig/vmlinux): GCC-6.3.0: 10735757 cmpxchg 10726413 try_cmpxchg 10730509 try_cmpxchg + patch 10730445 try_cmpxchg-linus GCC-7 (20170327): 10709514 cmpxchg 10704266 try_cmpxchg 10704266 try_cmpxchg + patch 10704394 try_cmpxchg-linus From this we see that the patch has the advantage of better code-gen on GCC-7 and keeps the interface roughly consistent with the C language variant. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: a9ebf306f52c ("locking/atomic: Introduce atomic_try_cmpxchg()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | lockdep: Fix per-cpu static objectsPeter Zijlstra2017-03-264-2/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly") we try to collapse per-cpu locks into a single class by giving them all the same key. For this key we choose the canonical address of the per-cpu object, which would be the offset into the per-cpu area. This has two problems: - there is a case where we run !0 lock->key through static_obj() and expect this to pass; it doesn't for canonical pointers. - 0 is a valid canonical address. Cure both issues by redefining the canonical address as the address of the per-cpu variable on the boot CPU. Since I didn't want to rely on CPU0 being the boot-cpu, or even existing at all, track the boot CPU in a variable. Fixes: 383776fa7527 ("locking/lockdep: Handle statically initialized PER_CPU locks properly") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Borislav Petkov <bp@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-mm@kvack.org Cc: wfg@linux.intel.com Cc: kernel test robot <fengguang.wu@intel.com> Cc: LKP <lkp@01.org> Link: http://lkml.kernel.org/r/20170320114108.kbvcsuepem45j5cr@hirez.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Drop hb->lock before enqueueing on the rtmutexPeter Zijlstra2017-03-233-30/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When PREEMPT_RT_FULL does the spinlock -> rt_mutex substitution the PI chain code will (falsely) report a deadlock and BUG. The problem is that it hold hb->lock (now an rt_mutex) while doing task_blocks_on_rt_mutex on the futex's pi_state::rtmutex. This, when interleaved just right with futex_unlock_pi() leads it to believe to see an AB-BA deadlock. Task1 (holds rt_mutex, Task2 (does FUTEX_LOCK_PI) does FUTEX_UNLOCK_PI) lock hb->lock lock rt_mutex (as per start_proxy) lock hb->lock Which is a trivial AB-BA. It is not an actual deadlock, because it won't be holding hb->lock by the time it actually blocks on the rt_mutex, but the chainwalk code doesn't know that and it would be a nightmare to handle this gracefully. To avoid this problem, do the same as in futex_unlock_pi() and drop hb->lock after acquiring wait_lock. This still fully serializes against futex_unlock_pi(), since adding to the wait_list does the very same lock dance, and removing it holds both locks. Aside of solving the RT problem this makes the lock and unlock mechanism symetric and reduces the hb->lock held time. Reported-and-tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104152.161341537@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Futex_unlock_pi() determinismPeter Zijlstra2017-03-231-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem with returning -EAGAIN when the waiter state mismatches is that it becomes very hard to proof a bounded execution time on the operation. And seeing that this is a RT operation, this is somewhat important. While in practise; given the previous patch; it will be very unlikely to ever really take more than one or two rounds, proving so becomes rather hard. However, now that modifying wait_list is done while holding both hb->lock and wait_lock, the scenario can be avoided entirely by acquiring wait_lock while still holding hb-lock. Doing a hand-over, without leaving a hole. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104152.112378812@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock()Peter Zijlstra2017-03-233-42/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By changing futex_lock_pi() to use rt_mutex_*_proxy_lock() all wait_list modifications are done under both hb->lock and wait_lock. This closes the obvious interleave pattern between futex_lock_pi() and futex_unlock_pi(), but not entirely so. See below: Before: futex_lock_pi() futex_unlock_pi() unlock hb->lock lock hb->lock unlock hb->lock lock rt_mutex->wait_lock unlock rt_mutex_wait_lock -EAGAIN lock rt_mutex->wait_lock list_add unlock rt_mutex->wait_lock schedule() lock rt_mutex->wait_lock list_del unlock rt_mutex->wait_lock <idem> -EAGAIN lock hb->lock After: futex_lock_pi() futex_unlock_pi() lock hb->lock lock rt_mutex->wait_lock list_add unlock rt_mutex->wait_lock unlock hb->lock schedule() lock hb->lock unlock hb->lock lock hb->lock lock rt_mutex->wait_lock list_del unlock rt_mutex->wait_lock lock rt_mutex->wait_lock unlock rt_mutex_wait_lock -EAGAIN unlock hb->lock It does however solve the earlier starvation/live-lock scenario which got introduced with the -EAGAIN since unlike the before scenario; where the -EAGAIN happens while futex_unlock_pi() doesn't hold any locks; in the after scenario it happens while futex_unlock_pi() actually holds a lock, and then it is serialized on that lock. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104152.062785528@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()Peter Zijlstra2017-03-233-12/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the ultimate goal of keeping rt_mutex wait_list and futex_q waiters consistent it's necessary to split 'rt_mutex_futex_lock()' into finer parts, such that only the actual blocking can be done without hb->lock held. Split split_mutex_finish_proxy_lock() into two parts, one that does the blocking and one that does remove_waiter() when the lock acquire failed. When the rtmutex was acquired successfully the waiter can be removed in the acquisiton path safely, since there is no concurrency on the lock owner. This means that, except for futex_lock_pi(), all wait_list modifications are done with both hb->lock and wait_lock held. [bigeasy@linutronix.de: fix for futex_requeue_pi_signal_restart] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104152.001659630@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex,rt_mutex: Introduce rt_mutex_init_waiter()Peter Zijlstra2017-03-233-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since there's already two copies of this code, introduce a helper now before adding a third one. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.950039479@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Pull rt_mutex_futex_unlock() out from under hb->lockPeter Zijlstra2017-03-231-54/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a number of 'interesting' problems, all caused by holding hb->lock while doing the rt_mutex_unlock() equivalient. Notably: - a PI inversion on hb->lock; and, - a SCHED_DEADLINE crash because of pointer instability. The previous changes: - changed the locking rules to cover {uval,pi_state} with wait_lock. - allow to do rt_mutex_futex_unlock() without dropping wait_lock; which in turn allows to rely on wait_lock atomicity completely. - simplified the waiter conundrum. It's now sufficient to hold rtmutex::wait_lock and a reference on the pi_state to protect the state consistency, so hb->lock can be dropped before calling rt_mutex_futex_unlock(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.900002056@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Rework inconsistent rt_mutex/futex_q statePeter Zijlstra2017-03-231-36/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a weird state in the futex_unlock_pi() path when it interleaves with a concurrent futex_lock_pi() at the point where it drops hb->lock. In this case, it can happen that the rt_mutex wait_list and the futex_q disagree on pending waiters, in particular rt_mutex will find no pending waiters where futex_q thinks there are. In this case the rt_mutex unlock code cannot assign an owner. The futex side fixup code has to cleanup the inconsistencies with quite a bunch of interesting corner cases. Simplify all this by changing wake_futex_pi() to return -EAGAIN when this situation occurs. This then gives the futex_lock_pi() code the opportunity to continue and the retried futex_unlock_pi() will now observe a coherent state. The only problem is that this breaks RT timeliness guarantees. That is, consider the following scenario: T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1) CPU0 T1 lock_pi() queue_me() <- Waiter is visible preemption T2 unlock_pi() loops with -EAGAIN forever Which is undesirable for PI primitives. Future patches will rectify this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.850383690@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Cleanup refcountingPeter Zijlstra2017-03-231-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a put_pit_state() as counterpart for get_pi_state() so the refcounting becomes consistent. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.801778516@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Change locking rulesPeter Zijlstra2017-03-231-33/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently futex-pi relies on hb->lock to serialize everything. But hb->lock creates another set of problems, especially priority inversions on RT where hb->lock becomes a rt_mutex itself. The rt_mutex::wait_lock is the most obvious protection for keeping the futex user space value and the kernel internal pi_state in sync. Rework and document the locking so rt_mutex::wait_lock is held accross all operations which modify the user space value and the pi state. This allows to invoke rt_mutex_unlock() (including deboost) without holding hb->lock as a next step. Nothing yet relies on the new locking rules. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.751993333@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex,rt_mutex: Provide futex specific rt_mutex APIPeter Zijlstra2017-03-233-32/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part of what makes futex_unlock_pi() intricate is that rt_mutex_futex_unlock() -> rt_mutex_slowunlock() can drop rt_mutex::wait_lock. This means it cannot rely on the atomicy of wait_lock, which would be preferred in order to not rely on hb->lock so much. The reason rt_mutex_slowunlock() needs to drop wait_lock is because it can race with the rt_mutex fastpath, however futexes have their own fast path. Since futexes already have a bunch of separate rt_mutex accessors, complete that set and implement a rt_mutex variant without fastpath for them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.702962446@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Remove rt_mutex_deadlock_account_*()Peter Zijlstra2017-03-234-43/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are unused and clutter up the code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.652692478@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Use smp_store_release() in mark_wake_futex()Peter Zijlstra2017-03-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the futex_q can dissapear the instruction after assigning NULL, this really should be a RELEASE barrier. That stops loads from hitting dead memory too. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.604296452@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | futex: Cleanup variable names for futex_top_waiter()Peter Zijlstra2017-03-231-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | futex_top_waiter() returns the top-waiter on the pi_mutex. Assinging this to a variable 'match' totally obscures the code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.554710645@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | locking/atomic/x86: Use atomic_try_cmpxchg()Peter Zijlstra2017-03-232-45/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Better code generation: text data bss name 10665111 4530096 843776 defconfig-build/vmlinux.3 10655703 4530096 843776 defconfig-build/vmlinux.4 Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/refcounts: Use atomic_try_cmpxchg()Peter Zijlstra2017-03-231-32/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generates better code (GCC-6.2.1): text filename 1576 defconfig-build/lib/refcount.o.pre 1488 defconfig-build/lib/refcount.o.post Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/atomic: Introduce atomic_try_cmpxchg()Peter Zijlstra2017-03-234-0/+123
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new cmpxchg interface: bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new); Where the boolean returns the result of the compare; and thus if the exchange happened; and in case of failure, the new value of *ptr is returned in *val. This allows simplification/improvement of loops like: for (;;) { new = val $op $imm; old = cmpxchg(ptr, val, new); if (old == val) break; val = old; } into: do { } while (!try_cmpxchg(ptr, &val, val $op $imm)); while also generating better code (GCC6 and onwards). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/ww_mutex: Improve test to cover acquire context changesPeter Zijlstra2017-03-161-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently each thread starts an acquire context only once, and performs all its loop iterations under it. This means that the Wound/Wait relations between threads are fixed. To make things a little more realistic and cover more of the functionality with the test, open a new acquire context for each loop. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/lockdep: Handle statically initialized PER_CPU locks properlyThomas Gleixner2017-03-165-36/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a PER_CPU struct which contains a spin_lock is statically initialized via: DEFINE_PER_CPU(struct foo, bla) = { .lock = __SPIN_LOCK_UNLOCKED(bla.lock) }; then lockdep assigns a seperate key to each lock because the logic for assigning a key to statically initialized locks is to use the address as the key. With per CPU locks the address is obvioulsy different on each CPU. That's wrong, because all locks should have the same key. To solve this the following modifications are required: 1) Extend the is_kernel/module_percpu_addr() functions to hand back the canonical address of the per CPU address, i.e. the per CPU address minus the per CPU offset. 2) Check the lock address with these functions and if the per CPU check matches use the returned canonical address as the lock key, so all per CPU locks have the same key. 3) Move the static_obj(key) check into look_up_lock_class() so this check can be avoided for statically initialized per CPU locks. That's required because the canonical address fails the static_obj(key) check for obvious reasons. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Merged Dan's fixups for !MODULES and !SMP into this patch. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Murphy <dmurphy@ti.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170227143736.pectaimkjkan5kow@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/lockdep: Add new check to lock_downgrade()J. R. Okajima2017-03-163-4/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: f8319483f57f ("locking/lockdep: Provide a type check for lock_is_held") didn't fully cover rwsems as downgrade_write() was left out. Introduce lock_downgrade() and use it to add new checks. See-also: http://marc.info/?l=linux-kernel&m=148581164003149&w=2 Originally-written-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486053497-9948-3-git-send-email-hooanon05g@gmail.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/lockdep: Factor out the validate_held_lock() helper functionJ. R. Okajima2017-03-161-18/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Behaviour should not change. Signed-off-by: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486053497-9948-2-git-send-email-hooanon05g@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | locking/lockdep: Factor out the find_held_lock() helper functionJ. R. Okajima2017-03-161-60/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A simple consolidataion to factor out repeated patterns. The behaviour should not change. Signed-off-by: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486053497-9948-1-git-send-email-hooanon05g@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Merge branch 'locking/urgent' into locking/core, to pick up fixesIngo Molnar2017-03-162-14/+24
| |\ \ \ | | | | | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/refcount: Add refcount_t API kernel-doc commentsDavid Windsor2017-03-132-12/+129
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Windsor <dwindsor@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: elena.reshetova@intel.com Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/1489160052-20293-1-git-send-email-dwindsor@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2017-05-0129-544/+847
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - another round of rq-clock handling debugging, robustization and fixes - PELT accounting improvements - CPU hotplug related ->cpus_allowed affinity handling fixes all around the tree - ... plus misc fixes, cleanups and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) sched/x86: Update reschedule warning text crypto: N2 - Replace racy task affinity logic cpufreq/sparc-us2e: Replace racy task affinity logic cpufreq/sparc-us3: Replace racy task affinity logic cpufreq/sh: Replace racy task affinity logic cpufreq/ia64: Replace racy task affinity logic ACPI/processor: Replace racy task affinity logic ACPI/processor: Fix error handling in __acpi_processor_start() sparc/sysfs: Replace racy task affinity logic powerpc/smp: Replace open coded task affinity logic ia64/sn/hwperf: Replace racy task affinity logic ia64/salinfo: Replace racy task affinity logic workqueue: Provide work_on_cpu_safe() ia64/topology: Remove cpus_allowed manipulation sched/fair: Move the PELT constants into a generated header sched/fair: Increase PELT accuracy for small tasks sched/fair: Fix comments sched/Documentation: Add 'sched-pelt' tool sched/fair: Fix corner case in __accumulate_sum() sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() ...
| * | | | | sched/x86: Update reschedule warning textPrarit Bhargava2017-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the reschedule warning to output the offline CPU number and use a better debug message. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1492518305-3808-1-git-send-email-prarit@redhat.com [ Tweaked the warning message. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | crypto: N2 - Replace racy task affinity logicThomas Gleixner2017-04-151-15/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | spu_queue_register() needs to invoke setup functions on a particular CPU. This is achieved by temporarily setting the affinity of the calling user space thread to the requested CPU and reset it to the original affinity afterwards. That's racy vs. CPU hotplug and concurrent affinity settings for that thread resulting in code executing on the wrong CPU and overwriting the new affinity setting. Replace it by using work_on_cpu_safe() which guarantees to run the code on the requested CPU or to fail in case the CPU is offline. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: linux-crypto@vger.kernel.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Tejun Heo <tj@kernel.org> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704131019420.2408@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | cpufreq/sparc-us2e: Replace racy task affinity logicThomas Gleixner2017-04-151-24/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The access to the HBIRD_ESTAR_MODE register in the cpu frequency control functions must happen on the target CPU. This is achieved by temporarily setting the affinity of the calling user space thread to the requested CPU and reset it to the original affinity afterwards. That's racy vs. CPU hotplug and concurrent affinity settings for that thread resulting in code executing on the wrong CPU and overwriting the new affinity setting. Replace it by a straight forward smp function call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: linux-pm@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704131020280.2408@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | cpufreq/sparc-us3: Replace racy task affinity logicThomas Gleixner2017-04-151-30/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The access to the safari config register in the CPU frequency functions must be executed on the target CPU. This is achieved by temporarily setting the affinity of the calling user space thread to the requested CPU and reset it to the original affinity afterwards. That's racy vs. CPU hotplug and concurrent affinity settings for that thread resulting in code executing on the wrong CPU and overwriting the new affinity setting. Replace it by a straight forward smp function call. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: linux-pm@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/20170412201043.047558840@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | cpufreq/sh: Replace racy task affinity logicThomas Gleixner2017-04-151-18/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The target() callback must run on the affected cpu. This is achieved by temporarily setting the affinity of the calling thread to the requested CPU and reset it to the original affinity afterwards. That's racy vs. concurrent affinity settings for that thread resulting in code executing on the wrong CPU. Replace it by work_on_cpu(). All call pathes which invoke the callbacks are already protected against CPU hotplug. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: linux-pm@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/20170412201042.958216363@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | cpufreq/ia64: Replace racy task affinity logicThomas Gleixner2017-04-151-53/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The get() and target() callbacks must run on the affected cpu. This is achieved by temporarily setting the affinity of the calling thread to the requested CPU and reset it to the original affinity afterwards. That's racy vs. concurrent affinity settings for that thread resulting in code executing on the wrong CPU and overwriting the new affinity setting. Replace it by work_on_cpu(). All call pathes which invoke the callbacks are already protected against CPU hotplug. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: linux-pm@vger.kernel.org Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1704122231100.2548@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | ACPI/processor: Replace racy task affinity logicThomas Gleixner2017-04-152-27/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | acpi_processor_get_throttling() requires to invoke the getter function on the target CPU. This is achieved by temporarily setting the affinity of the calling user space thread to the requested CPU and reset it to the original affinity afterwards. That's racy vs. CPU hotplug and concurrent affinity settings for that thread resulting in code executing on the wrong CPU and overwriting the new affinity setting. acpi_processor_get_throttling() is invoked in two ways: 1) The CPU online callback, which is already running on the target CPU and obviously protected against hotplug and not affected by affinity settings. 2) The ACPI driver probe function, which is not protected against hotplug during modprobe. Switch it over to work_on_cpu() and protect the probe function against CPU hotplug. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: linux-acpi@vger.kernel.org Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/20170412201042.785920903@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
OpenPOWER on IntegriCloud