summaryrefslogtreecommitdiffstats
path: root/kernel/cpu.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2015-11-031-5/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle were: - sched/fair load tracking fixes and cleanups (Byungchul Park) - Make load tracking frequency scale invariant (Dietmar Eggemann) - sched/deadline updates (Juri Lelli) - stop machine fixes, cleanups and enhancements for bugs triggered by CPU hotplug stress testing (Oleg Nesterov) - scheduler preemption code rework: remove PREEMPT_ACTIVE and related cleanups (Peter Zijlstra) - Rework the sched_info::run_delay code to fix races (Peter Zijlstra) - Optimize per entity utilization tracking (Peter Zijlstra) - ... misc other fixes, cleanups and smaller updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop() sched: Start stopper early stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark() stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() stop_machine: Ensure that a queued callback will be called before cpu_stop_park() sched/x86: Fix typo in __switch_to() comments sched/core: Remove a parameter in the migrate_task_rq() function sched/core: Drop unlikely behind BUG_ON() sched/core: Fix task and run queue sched_info::run_delay inconsistencies sched/numa: Fix task_tick_fair() from disabling numa_balancing sched/core: Add preempt_count invariant check sched/core: More notrace annotations sched/core: Kill PREEMPT_ACTIVE sched/core, sched/x86: Kill thread_info::saved_preempt_count sched/core: Simplify preempt_count tests sched/core: Robustify preemption leak checks sched/core: Stop setting PREEMPT_ACTIVE ...
| * sched: Start stopper earlyPeter Zijlstra2015-10-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure the stopper thread is active 'early', because the load balancer pretty much assumes that its available. And when 'online && active' the load-balancer is fully available. Not only the numa balancing stop_two_cpus() caller relies on it, but also the self migration stuff does, and at CPU_ONLINE time the cpu really is 'free' to run anything. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151009160054.GA10176@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce ↵Oleg Nesterov2015-10-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stop_machine_unpark() 1. Change smpboot_unpark_thread() to check ->selfparking, just like smpboot_park_thread() does. 2. Introduce stop_machine_unpark() which sets ->enabled and calls kthread_unpark(). 3. Change smpboot_thread_call() and cpu_stop_init() to call stop_machine_unpark() by hand. This way: - IMO the ->selfparking logic becomes more consistent. - We can kill the smp_hotplug_thread->pre_unpark() method. - We can easily unpark the stopper thread earlier. Say, we can move stop_machine_unpark() from smpboot_thread_call() to sched_cpu_active() as Peter suggests. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151009160049.GA10166@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * stop_machine: Ensure that a queued callback will be called before ↵Oleg Nesterov2015-10-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_stop_park() cpu_stop_queue_work() checks stopper->enabled before it queues the work, but ->enabled == T can only guarantee cpu_stop_signal_done() if we race with cpu_down(). This is not enough for stop_two_cpus() or stop_machine(), they will deadlock if multi_cpu_stop() won't be called by one of the target CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex. But stop_two_cpus() has to check cpu_active() to avoid the same race with hotplug, and this check is very unobvious and probably not even correct if we race with cpu_up(). Change cpu_down() pass to clear ->enabled before cpu_stopper_thread() flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set. Note also that smpboot_thread_call() calls cpu_stop_unpark() which sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until cpu_stopper_thread() is called at least once. This all means that if cpu_stop_queue_work() succeeds, we know that work->fn() will be called. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151008145131.GA18139@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * cpu/hotplug: Read_lock(tasklist_lock) doesn't need to disable irqsOleg Nesterov2015-09-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | check_for_tasks() doesn't need to disable irqs, recursive read_lock() from interrupt is fine. While at it, s/do_each_thread/for_each_process_thread/. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kirill Tkhai <ktkhai@odin.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150910130750.GA20055@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | cpu: Remove try_get_online_cpus()Paul E. McKenney2015-10-071-13/+0
|/ | | | | | | | | | Now that synchronize_sched_expedited() no longer uses it, there are no users of try_get_online_cpus() in mainline. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2015-08-311-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest change in this cycle is the rewrite of the main SMP load balancing metric: the CPU load/utilization. The main goal was to make the metric more precise and more representative - see the changelog of this commit for the gory details: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") It is done in a way that significantly reduces complexity of the code: 5 files changed, 249 insertions(+), 494 deletions(-) and the performance testing results are encouraging. Nevertheless we need to keep an eye on potential regressions, since this potentially affects every SMP workload in existence. This work comes from Yuyang Du. Other changes: - SCHED_DL updates. (Andrea Parri) - Simplify architecture callbacks by removing finish_arch_switch(). (Peter Zijlstra et al) - cputime accounting: guarantee stime + utime == rtime. (Peter Zijlstra) - optimize idle CPU wakeups some more - inspired by Facebook server loads. (Mike Galbraith) - stop_machine fixes and updates. (Oleg Nesterov) - Introduce the 'trace_sched_waking' tracepoint. (Peter Zijlstra) - sched/numa tweaks. (Srikar Dronamraju) - misc fixes and small cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) sched/deadline: Fix comment in enqueue_task_dl() sched/deadline: Fix comment in push_dl_tasks() sched: Change the sched_class::set_cpus_allowed() calling context sched: Make sched_class::set_cpus_allowed() unconditional sched: Fix a race between __kthread_bind() and sched_setaffinity() sched: Ensure a task has a non-normalized vruntime when returning back to CFS sched/numa: Fix NUMA_DIRECT topology identification tile: Reorganize _switch_to() sched, sparc32: Update scheduler comments in copy_thread() sched: Remove finish_arch_switch() sched, tile: Remove finish_arch_switch sched, sh: Fold finish_arch_switch() into switch_to() sched, score: Remove finish_arch_switch() sched, avr32: Remove finish_arch_switch() sched, MIPS: Get rid of finish_arch_switch() sched, arm: Remove finish_arch_switch() sched/fair: Clean up load average references sched/fair: Provide runnable_load_avg back to cfs_rq sched/fair: Remove task and group entity load when they are dead sched/fair: Init cfs_rq's sched_entity load average ...
| * stop_machine: Unexport __stop_machine()Oleg Nesterov2015-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only caller outside of stop_machine.c is _cpu_down(), it can use stop_machine(). get_online_cpus() is fine under cpu_hotplug_begin(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: der.herr@hofr.at Cc: paulmck@linux.vnet.ibm.com Cc: riel@redhat.com Cc: viro@ZenIV.linux.org.uk Link: http://lkml.kernel.org/r/20150630012951.GA23934@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2015-08-311-5/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU changes in this cycle are: - the combination of tree geometry-initialization simplifications and OS-jitter-reduction changes to expedited grace periods. These two are stacked due to the large number of conflicts that would otherwise result. - privatize smp_mb__after_unlock_lock(). This commit moves the definition of smp_mb__after_unlock_lock() to kernel/rcu/tree.h, in recognition of the fact that RCU is the only thing using this, that nothing else is likely to use it, and that it is likely to go away completely. - documentation updates. - torture-test updates. - misc fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) rcu,locking: Privatize smp_mb__after_unlock_lock() rcu: Silence lockdep false positive for expedited grace periods rcu: Don't disable CPU hotplug during OOM notifiers scripts: Make checkpatch.pl warn on expedited RCU grace periods rcu: Update MAINTAINERS entry rcu: Clarify CONFIG_RCU_EQS_DEBUG help text rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks() rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN() rcu: Make rcu_is_watching() really notrace cpu: Wait for RCU grace periods concurrently rcu: Create a synchronize_rcu_mult() rcu: Fix obsolete priority-boosting comment rcu: Use WRITE_ONCE in RCU_INIT_POINTER rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT rcu: Add RCU-sched flavors of get-state and cond-sync rcu: Add fastpath bypassing funnel locking rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS rcu: Pull out wait_event*() condition into helper function documentation: Describe new expedited stall warnings rcu: Add stall warnings to synchronize_sched_expedited() ...
| * \ Merge branch 'for-mingo' of ↵Ingo Molnar2015-08-121-5/+5
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - The combination of tree geometry-initialization simplifications and OS-jitter-reduction changes to expedited grace periods. These two are stacked due to the large number of conflicts that would otherwise result. [ With one addition, a temporary commit to silence a lockdep false positive. Additional changes to the expedited grace-period primitives (queued for 4.4) remove the cause of this false positive, and therefore include a revert of this temporary commit. ] - Documentation updates. - Torture-test updates. - Miscellaneous fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | cpu: Wait for RCU grace periods concurrentlyPaul E. McKenney2015-07-221-5/+5
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | In kernels built with CONFIG_PREEMPT, _cpu_down() waits for RCU and RCU-sched grace periods back-to-back, incurring quite a bit more latency than required. This commit therefore uses the new synchronize_rcu_mult() to allow waiting for both grace periods concurrently. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | | Merge tag 'driver-core-4.3-rc1' of ↵Linus Torvalds2015-08-311-8/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the new patches for the driver core / sysfs for 4.3-rc1. Very small number of changes here, all the details are in the shortlog, nothing major happening at all this kernel release, which is nice to see" * tag 'driver-core-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: bus: subsys: update return type of ->remove_dev() to void driver core: correct device's shutdown order driver core: fix docbook for device_private.device selftests: firmware: skip timeout checks for kernels without user mode helper kernel, cpu: Remove bogus __ref annotations cpu: Remove bogus __ref annotation of cpu_subsys_online() firmware: fix wrong memory deallocation in fw_add_devm_name() sysfs.txt: update show method notes about sprintf/snprintf/scnprintf usage devres: fix devres_get()
| * | | kernel, cpu: Remove bogus __ref annotationsMathias Krause2015-08-051-8/+8
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_chain lost its __cpuinitdata annotation long ago in commit 5c113fbeed7a ("fix cpu_chain section mismatch..."). This and the global __cpuinit annotation drop in v3.11 vanished the need to mark all users, including transitive ones, with the __ref annotation. Just get rid of it to not wrongly hide section mismatches. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | cpu-hotplug: export cpu_hotplug_enable/cpu_hotplug_disableVitaly Kuznetsov2015-08-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hyper-V module needs to disable cpu hotplug (offlining) as there is no support from hypervisor side to reassign already opened event channels to a different CPU. Currently it is been done by altering smp_ops.cpu_disable but it is hackish. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | cpu-hotplug: convert cpu_hotplug_disabled to a counterVitaly Kuznetsov2015-08-051-8/+13
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a prerequisite to exporting cpu_hotplug_enable/cpu_hotplug_disable functions to modules we need to convert cpu_hotplug_disabled to a counter to properly support disable -> disable -> enable call sequences. E.g. after Hyper-V vmbus module (which is supposed to be the first user of exported cpu_hotplug_enable/cpu_hotplug_disable) did cpu_hotplug_disable() hibernate path calls disable_nonboot_cpus() and if we hit an error in _cpu_down() enable_nonboot_cpus() will be called on the failure path (thus making cpu_hotplug_disabled = 0 and leaving cpu hotplug in 'enabled' state). Same problem is possible if more than 1 module use cpu_hotplug_disable/cpu_hotplug_enable on their load/unload paths. When one of these modules is been unloaded it is logical to leave cpu hotplug in 'disabled' state. To support the change we need to increse cpu_hotplug_disabled counter in disable_nonboot_cpus() unconditionally as all users of disable_nonboot_cpus() are supposed to do enable_nonboot_cpus() in case an error was returned. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for nowThomas Gleixner2015-07-151-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Boris reported that the sparse_irq protection around __cpu_up() in the generic code causes a regression on Xen. Xen allocates interrupts and some more in the xen_cpu_up() function, so it deadlocks on the sparse_irq_lock. There is no simple fix for this and we really should have the protection for all architectures, but for now the only solution is to move it to x86 where actual wreckage due to the lack of protection has been observed. Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Fixes: a89941816726 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down' Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Cc: xen-devel <xen-devel@lists.xenproject.org>
* | hotplug: Prevent alloc/free of irq descriptors during cpu up/downThomas Gleixner2015-07-081-1/+21
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a cpu goes up some architectures (e.g. x86) have to walk the irq space to set up the vector space for the cpu. While this needs extra protection at the architecture level we can avoid a few race conditions by preventing the concurrent allocation/free of irq descriptors and the associated data. When a cpu goes down it moves the interrupts which are targeted to this cpu away by reassigning the affinities. While this happens interrupts can be allocated and freed, which opens a can of race conditions in the code which reassignes the affinities because interrupt descriptors might be freed underneath. Example: CPU1 CPU2 cpu_up/down irq_desc = irq_to_desc(irq); remove_from_radix_tree(desc); raw_spin_lock(&desc->lock); free(desc); We could protect the irq descriptors with RCU, but that would require a full tree change of all accesses to interrupt descriptors. But fortunately these kind of race conditions are rather limited to a few things like cpu hotplug. The normal setup/teardown is very well serialized. So the simpler and obvious solution is: Prevent allocation and freeing of interrupt descriptors accross cpu hotplug. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
* cpu: Remove new instance of __cpuinit that crept back inPaul Gortmaker2015-05-271-1/+1
| | | | | | | | | | | | We removed __cpuinit support (leaving no-op stubs) quite some time ago. However a new instance was added in commit 00df35f991914db6b8bde8cf0980 ("cpu: Defer smpboot kthread unparking until CPU known to scheduler") Since we want to clobber the stubs soon, get this removed now. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* cpu: Handle smpboot_unpark_threads() uniformlyPaul E. McKenney2015-05-271-1/+1
| | | | | | | | | | | | | | Commit 00df35f99191 (cpu: Defer smpboot kthread unparking until CPU known to scheduler) put the online path's call to smpboot_unpark_threads() into a CPU-hotplug notifier. This commit places the offline-failure paths call into the same notifier for the sake of uniformity. Note that it is not currently possible to place the offline path's call to smpboot_park_threads() into an existing notifier because the CPU_DYING notifiers run in a restricted environment, and the CPU_UP_PREPARE notifiers run too soon. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2015-04-141-4/+34
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molnar: "The main changes in this cycle were: - changes permitting use of call_rcu() and friends very early in boot, for example, before rcu_init() is invoked. - add in-kernel API to enable and disable expediting of normal RCU grace periods. - improve RCU's handling of (hotplug-) outgoing CPUs. - NO_HZ_FULL_SYSIDLE fixes. - tiny-RCU updates to make it more tiny. - documentation updates. - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well cpu: Defer smpboot kthread unparking until CPU known to scheduler rcu: Associate quiescent-state reports with grace period rcu: Yet another fix for preemption and CPU hotplug rcu: Add diagnostics to grace-period cleanup rcutorture: Default to grace-period-initialization delays rcu: Handle outgoing CPUs on exit from idle loop cpu: Make CPU-offline idle-loop transition point more precise rcu: Eliminate ->onoff_mutex from rcu_node structure rcu: Process offlining and onlining only at grace-period start rcu: Move rcu_report_unblock_qs_rnp() to common code rcu: Rework preemptible expedited bitmask handling rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs rcutorture: Enable slow grace-period initializations rcu: Provide diagnostic option to slow down grace-period initialization rcu: Detect stalls caused by failure to propagate up rcu_node tree rcu: Eliminate empty HOTPLUG_CPU ifdef rcu: Simplify sync_rcu_preempt_exp_init() rcu: Put all orphan-callback-related code under same comment rcu: Consolidate offline-CPU callback initialization ...
| * cpu: Defer smpboot kthread unparking until CPU known to schedulerPaul E. McKenney2015-04-131-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, smpboot_unpark_threads() is invoked before the incoming CPU has been added to the scheduler's runqueue structures. This might potentially cause the unparked kthread to run on the wrong CPU, since the correct CPU isn't fully set up yet. That causes a sporadic, hard to debug boot crash triggering on some systems, reported by Borislav Petkov, and bisected down to: 2a442c9c6453 ("x86: Use common outgoing-CPU-notification code") This patch places smpboot_unpark_threads() in a CPU hotplug notifier with priority set so that these kthreads are unparked just after the CPU has been added to the runqueues. Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * cpu: Make CPU-offline idle-loop transition point more precisePaul E. McKenney2015-03-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit uses a per-CPU variable to make the CPU-offline code path through the idle loop more precise, so that the outgoing CPU is guaranteed to make it into the idle loop before it is powered off. This commit is in preparation for putting the RCU offline-handling code on this code path, which will eliminate the magic one-jiffy wait that RCU uses as the maximum time for an outgoing CPU to get all the way through the scheduler. The magic one-jiffy wait for incoming CPUs remains a separate issue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | clockevents: Cleanup dead cpu explicitelyThomas Gleixner2015-04-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the cleanup function for a dead cpu and invoke it directly from the cpu down code. Make it conditional on CPU_HOTPLUG as well. Temporary change, will be refined in the future. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebased, added clockevents_notify() removal ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | clockevents: Make tick handover explicitThomas Gleixner2015-04-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the tick_handover call and invoke it explicitely from the hotplug code. Temporary solution will be cleaned up in later patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | clockevents: Fix cpu_down() race for hrtimer based broadcastingPreeti U Murthy2015-04-021-0/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was found when doing a hotplug stress test on POWER, that the machine either hit softlockups or rcu_sched stall warnings. The issue was traced to commit: 7cba160ad789 ("powernv/cpuidle: Redesign idle states management") which exposed the cpu_down() race with hrtimer based broadcast mode: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast") The race is the following: Assume CPU1 is the CPU which holds the hrtimer broadcasting duty before it is taken down. CPU0 CPU1 cpu_down() take_cpu_down() disable_interrupts() cpu_die() while (CPU1 != CPU_DEAD) { msleep(100); switch_to_idle(); stop_cpu_timer(); schedule_broadcast(); } tick_cleanup_cpu_dead() take_over_broadcast() So after CPU1 disabled interrupts it cannot handle the broadcast hrtimer anymore, so CPU0 will be stuck forever. Fix this by explicitly taking over broadcast duty before cpu_die(). This is a temporary workaround. What we really want is a callback in the clockevent device which allows us to do that from the dying CPU by pushing the hrtimer onto a different cpu. That might involve an IPI and is definitely more complex than this immediate fix. Changelog was picked up from: https://lkml.org/lkml/2015/2/16/213 Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: nicolas.pitre@linaro.org Cc: peterz@infradead.org Cc: rjw@rjwysocki.net Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com [ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* hotplugcpu: Avoid deadlocks by waking active_writerDavid Hildenbrand2015-01-061-33/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited grace periods") introduced another problem that can easily be reproduced by starting/stopping cpus in a loop. E.g.: for i in `seq 5000`; do echo 1 > /sys/devices/system/cpu/cpu1/online echo 0 > /sys/devices/system/cpu/cpu1/online done Will result in: INFO: task /cpu_start_stop:1 blocked for more than 120 seconds. Call Trace: ([<00000000006a028e>] __schedule+0x406/0x91c) [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4 [<0000000000130ff6>] _cpu_up+0x3e/0x1c4 [<0000000000131232>] cpu_up+0xb6/0xd4 [<00000000004a5720>] device_online+0x80/0xc0 [<00000000004a57f0>] online_store+0x90/0xb0 ... And a deadlock. Problem is that if the last ref in put_online_cpus() can't get the cpu_hotplug.lock the puts_pending count is incremented, but a sleeping active_writer might never be woken up, therefore never exiting the loop in cpu_hotplug_begin(). This fix removes puts_pending and turns refcount into an atomic variable. We also introduce a wait queue for the active_writer, to avoid possible races and use-after-free. There is no need to take the lock in put_online_cpus() anymore. Can't reproduce it with this fix. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* cpu: Avoid puts_pending overflowPaul E. McKenney2014-11-031-6/+13
| | | | | | | | | | | | | A long string of get_online_cpus() with each followed by a put_online_cpu() that fails to acquire cpu_hotplug.lock can result in overflow of the cpu_hotplug.puts_pending counter. Although this is perhaps improbably, a system with absolutely no CPU-hotplug operations will have an arbitrarily long time in which this overflow could occur. This commit therefore adds overflow checks to get_online_cpus() and try_get_online_cpus(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
* rcu: More on deadlock between CPU hotplug and expedited grace periodsPaul E. McKenney2014-10-231-1/+13
| | | | | | | | | | | | | | | | | | | Commit dd56af42bd82 (rcu: Eliminate deadlock between CPU hotplug and expedited grace periods) was incomplete. Although it did eliminate deadlocks involving synchronize_sched_expedited()'s acquisition of cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar deadlock involving acquisition of this same lock via put_online_cpus(). This deadlock became apparent with testing involving hibernation. This commit therefore changes put_online_cpus() acquisition of this lock to be conditional, and increments a new cpu_hotplug.puts_pending field in case of acquisition failure. Then cpu_hotplug_begin() checks for this new field being non-zero, and applies any changes to cpu_hotplug.refcount. Reported-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Tested-by: Borislav Petkov <bp@suse.de>
* rcu: Eliminate deadlock between CPU hotplug and expedited grace periodsPaul E. McKenney2014-09-181-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the expedited grace-period primitives do get_online_cpus(). This greatly simplifies their implementation, but means that calls to them holding locks that are acquired by CPU-hotplug notifiers (to say nothing of calls to these primitives from CPU-hotplug notifiers) can deadlock. But this is starting to become inconvenient, as can be seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this case is that some developers need to acquire a mutex from a CPU-hotplug notifier, but also need to hold it across a synchronize_rcu_expedited(). As noted above, this currently results in deadlock. This commit avoids the deadlock and retains the simplicity by creating a try_get_online_cpus(), which returns false if the get_online_cpus() reference count could not immediately be incremented. If a call to try_get_online_cpus() returns true, the expedited primitives operate as before. If a call returns false, the expedited primitives fall back to normal grace-period operations. This falling back of course results in increased grace-period latency, but only during times when CPU hotplug operations are actually in flight. The effect should therefore be negligible during normal operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Tested-by: Lan Tianyu <tianyu.lan@intel.com>
* sched: Rework check_for_tasks()Kirill Tkhai2014-07-051-13/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Iterate thru all of threads in the system. Check for all threads, not only for group leaders. 2) Check for p->on_rq instead of p->state and cputime. Preempted task in !TASK_RUNNING state OR just created task may be queued, that we want to be reported too. 3) Use read_lock() instead of write_lock(). This function does not change any structures, and read_lock() is enough. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ben Segall <bsegall@google.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Cc: Konstantin Khorenko <khorenko@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Turner <pjt@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Todd E Brandt <todd.e.brandt@linux.intel.com> Cc: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1403684395.3462.44.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge tag 'pm+acpi-3.16-rc1-2' of ↵Linus Torvalds2014-06-121-0/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI and power management updates from Rafael Wysocki: "These are fixups on top of the previous PM+ACPI pull request, regression fixes (ACPI hotplug, cpufreq ppc-corenet), other bug fixes (ACPI reset, cpufreq), new PM trace points for system suspend profiling and a copyright notice update. Specifics: - I didn't remember correctly that the Hans de Goede's ACPI video patches actually didn't flip the video.use_native_backlight default, although we had discussed that and decided to do that. Since I said we would do that in the previous PM+ACPI pull request, make that change for real now. - ACPI bus check notifications for PCI host bridges don't cause the bus below the host bridge to be checked for changes as they should because of a mistake in the ACPI-based PCI hotplug (ACPIPHP) subsystem that forgets to add hotplug contexts to PCI host bridge ACPI device objects. Create hotplug contexts for PCI host bridges too as appropriate. - Revert recent cpufreq commit related to the big.LITTLE cpufreq driver that breaks arm64 builds. - Fix for a regression in the ppc-corenet cpufreq driver introduced during the 3.15 cycle and causing the driver to use the remainder from do_div instead of the quotient. From Ed Swarthout. - Resets triggered by panic activate a BUG_ON() in vmalloc.c on systems where the ACPI reset register is located in memory address space. Fix from Randy Wright. - Fix for a problem with cpufreq governors that decisions made by them may be suboptimal due to the fact that deferrable timers are used by them for CPU load sampling. From Srivatsa S Bhat. - Fix for a problem with the Tegra cpufreq driver where the CPU frequency is temporarily switched to a "stable" level that is different from both the initial and target frequencies during transitions which causes udelay() to expire earlier than it should sometimes. From Viresh Kumar. - New trace points and rework of some existing trace points for system suspend/resume profiling from Todd Brandt. - Assorted cpufreq fixes and cleanups from Stratos Karafotis and Viresh Kumar. - Copyright notice update for suspend-and-cpuhotplug.txt from Srivatsa S Bhat" * tag 'pm+acpi-3.16-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / hotplug / PCI: Add hotplug contexts to PCI host bridges PM / sleep: trace events for device PM callbacks cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR cpufreq: tegra: update comment for clarity cpufreq: intel_pstate: Remove duplicate CPU ID check cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag PM / Documentation: Update copyright in suspend-and-cpuhotplug.txt cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info' cpufreq: governor: Be friendly towards latency-sensitive bursty workloads PM / sleep: trace events for suspend/resume cpufreq: ppc-corenet-cpu-freq: do_div use quotient Revert "cpufreq: Enable big.LITTLE cpufreq driver on arm64" cpufreq: Tegra: implement intermediate frequency callbacks cpufreq: add support for intermediate (stable) frequencies ACPI / video: Change the default for video.use_native_backlight to 1 ACPI: Fix bug when ACPI reset register is implemented in system memory
| * Merge branch 'pm-sleep'Rafael J. Wysocki2014-06-121-0/+5
| |\ | | | | | | | | | | | | | | | * pm-sleep: PM / sleep: trace events for device PM callbacks PM / sleep: trace events for suspend/resume
| | * PM / sleep: trace events for suspend/resumeTodd E Brandt2014-06-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds trace events that give finer resolution into suspend/resume. These events are graphed in the timelines generated by the analyze_suspend.py script. They represent large areas of time consumed that are typical to suspend and resume. The event is triggered by calling the function "trace_suspend_resume" with three arguments: a string (the name of the event to be displayed in the timeline), an integer (case specific number, such as the power state or cpu number), and a boolean (where true is used to denote the start of the timeline event, and false to denote the end). The suspend_resume trace event reproduces the data that the machine_suspend trace event did, so the latter has been removed. Signed-off-by: Todd Brandt <todd.e.brandt@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | kernel/cpu.c: convert printk to pr_foo()Fabian Frederick2014-06-041-17/+14
|/ / | | | | | | | | | | | | | | | | | | | | | | | | no level printk converted to pr_warn (if err) no level printk converted to pr_info (disabling non-boot cpus) Other printk converted to respective level. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | sched: Fix hotplug vs. set_cpus_allowed_ptr()Lai Jiangshan2014-05-221-2/+4
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lai found that: WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b() ... migration_cpu_stop+0x1d/0x22 was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is always a sub-set of cpu_online_mask. This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness"). So set active and online at the same time to avoid this particular problem. Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness") Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* CPU hotplug: Provide lockless versions of callback registration functionsSrivatsa S. Bhat2014-03-201-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following method of CPU hotplug callback registration is not safe due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock and the cpu_hotplug.lock. get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); The deadlock is shown below: CPU 0 CPU 1 ----- ----- Acquire cpu_hotplug.lock [via get_online_cpus()] CPU online/offline operation takes cpu_add_remove_lock [via cpu_maps_update_begin()] Try to acquire cpu_add_remove_lock [via register_cpu_notifier()] CPU online/offline operation tries to acquire cpu_hotplug.lock [via cpu_hotplug_begin()] *** DEADLOCK! *** The problem here is that callback registration takes the locks in one order whereas the CPU hotplug operations take the same locks in the opposite order. To avoid this issue and to provide a race-free method to register CPU hotplug callbacks (along with initialization of already online CPUs), introduce new variants of the callback registration APIs that simply register the callbacks without holding the cpu_add_remove_lock during the registration. That way, we can avoid the ABBA scenario. However, we will need to hold the cpu_add_remove_lock throughout the entire critical section, to protect updates to the callback/notifier chain. This can be achieved by writing the callback registration code as follows: cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ] for_each_online_cpu(cpu) init_cpu(cpu); /* This doesn't take the cpu_add_remove_lock */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ] Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin() because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD notifiers, and hence get_online_cpus() cannot provide the necessary synchronization to protect the callback/notifier chains against concurrent reads and writes. On the other hand, since the cpu_add_remove_lock protects the entire hotplug operation (including CPU_POST_DEAD), we can use cpu_maps_update_begin/done() to guarantee proper synchronization. Also, since cpu_maps_update_begin/done() is like a super-set of get/put_online_cpus(), the former naturally protects the critical sections from concurrent hotplug operations. Since the names cpu_maps_update_begin/done() don't make much sense in CPU hotplug callback registration scenarios, we'll introduce new APIs named cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done(). In summary, introduce the lockless variants of un/register_cpu_notifier() and also export the cpu_notifier_register_begin/done() APIs for use by modules. This way, we provide a race-free way to register hotplug callbacks as well as perform initialization for the CPUs that are already online. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* CPU hotplug: Add lockdep annotations to get/put_online_cpus()Gautham R. Shenoy2014-03-201-0/+17
| | | | | | | | | | | Add lockdep annotations for get/put_online_cpus() and cpu_hotplug_begin()/cpu_hotplug_end(). Cc: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2013-11-141-1/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Four bugfixes and one performance fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Avoid integer overflow sched: Optimize task_sched_runtime() sched/numa: Cure update_numa_stats() vs. hotplug sched/numa: Fix NULL pointer dereference in task_numa_migrate() sched: Fix endless sync_sched/rcu() loop inside _cpu_down()
| * sched: Fix endless sync_sched/rcu() loop inside _cpu_down()Michael wang2013-11-131-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6acce3ef8: sched: Remove get_online_cpus() usage tries to do sync_sched/rcu() inside _cpu_down() but triggers: INFO: task swapper/0:1 blocked for more than 120 seconds. ... [<ffffffff811263dc>] synchronize_rcu+0x2c/0x30 [<ffffffff81d1bd82>] _cpu_down+0x2b2/0x340 ... It was caused by that in the rcu boost case we rely on smpboot thread to finish the rcu callback, which has already been parked before sync in here and leads to the endless sync_sched/rcu(). This patch exchanges the sequence of smpboot_park_threads() and sync_sched/rcu() to fix the bug. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5282EDC0.6060003@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | cpu/mem hotplug: add try_online_node() for cpu_up()Toshi Kani2013-11-131-26/+3
|/ | | | | | | | | | | | | | | | | | | | | | cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call mem_online_node() to put its node online if offlined and then call build_all_zonelists() to initialize the zone list. These steps are specific to memory hotplug, and should be managed in mm/memory_hotplug.c. lock_memory_hotplug() should also be held for the whole steps. For this reason, this patch replaces mem_online_node() with try_online_node(), which performs the whole steps with lock_memory_hotplug() held. try_online_node() is named after try_offline_node() as they have similar purpose. There is no functional change in this patch. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sched: Remove get_online_cpus() usagePeter Zijlstra2013-10-161-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove get_online_cpus() usage from the scheduler; there's 4 sites that use it: - sched_init_smp(); where its completely superfluous since we're in 'early' boot and there simply cannot be any hotplugging. - sched_getaffinity(); we already take a raw spinlock to protect the task cpus_allowed mask, this disables preemption and therefore also stabilizes cpu_online_mask as that's modified using stop_machine. However switch to active mask for symmetry with sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active mask stability by inserting sync_rcu/sched() into _cpu_down. - sched_setaffinity(); we don't appear to need get_online_cpus() either, there's two sites where hotplug appears relevant: * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask, for the cpuset case we hold task_lock, which is a spinlock and thus for mainline disables preemption (might cause pain on RT). * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has preemption properly disabled; also it already deals with hotplug races explicitly where it releases them. - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for us with a little trickery. By adding a sync_sched/rcu() after the CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for cpu_active_mask. Use these to validate that both our cpus are active when queueing the stop work before we queue the stop_machine works for take_cpu_down(). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* ACPI / processor: Acquire writer lock to update CPU mapsToshi Kani2013-08-131-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CPU system maps are protected with reader/writer locks. The reader lock, get_online_cpus(), assures that the maps are not updated while holding the lock. The writer lock, cpu_hotplug_begin(), is used to udpate the cpu maps along with cpu_maps_update_begin(). However, the ACPI processor handler updates the cpu maps without holding the the writer lock. acpi_map_lsapic() is called from acpi_processor_hotadd_init() to update cpu_possible_mask and cpu_present_mask. acpi_unmap_lsapic() is called from acpi_processor_remove() to update cpu_possible_mask. Currently, they are either unprotected or protected with the reader lock, which is not correct. For example, the get_online_cpus() below is supposed to assure that cpu_possible_mask is not changed while the code is iterating with for_each_possible_cpu(). get_online_cpus(); for_each_possible_cpu(cpu) { : } put_online_cpus(); However, this lock has no protection with CPU hotplug since the ACPI processor handler does not use the writer lock when it updates cpu_possible_mask. The reader lock does not serialize within the readers. This patch protects them with the writer lock with cpu_hotplug_begin() along with cpu_maps_update_begin(), which must be held before calling cpu_hotplug_begin(). It also protects arch_register_cpu() / arch_unregister_cpu(), which creates / deletes a sysfs cpu device interface. For this purpose it changes cpu_hotplug_begin() and cpu_hotplug_done() to global and exports them in cpu.h. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* kernel: delete __cpuinit usage from all core kernel filesPaul Gortmaker2013-07-141-3/+3
| | | | | | | | | | | | | | | | | | | | | The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the uses of the __cpuinit macros from C files in the core kernel directories (kernel, init, lib, mm, and include) that don't really have a specific maintainer. [1] https://lkml.org/lkml/2013/5/20/589 Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* CPU hotplug: provide a generic helper to disable/enable CPU hotplugSrivatsa S. Bhat2013-06-121-32/+23
| | | | | | | | | | | | | | | | | | | | | | | | There are instances in the kernel where we would like to disable CPU hotplug (from sysfs) during some important operation. Today the freezer code depends on this and the code to do it was kinda tailor-made for that. Restructure the code and make it generic enough to be useful for other usecases too. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Robin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds2013-02-191-0/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull preparatory smp/hotplug patches from Ingo Molnar: "Some early preparatory changes for the WIP hotplug rework by Thomas Gleixner." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stop_machine: Use smpboot threads stop_machine: Store task reference in a separate per cpu variable smpboot: Allow selfparking per cpu threads
| * stop_machine: Use smpboot threadsThomas Gleixner2013-02-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the smpboot thread infrastructure. Mark the stopper thread selfparking and park it after it has finished the take_cpu_down() work. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Richard Weinberger <rw@linutronix.de> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | cputime: Use accessors to read task cputime statsFrederic Weisbecker2013-01-271-1/+3
|/ | | | | | | | | | | | | | | | | | | This is in preparation for the full dynticks feature. While remotely reading the cputime of a task running in a full dynticks CPU, we'll need to do some extra-computation. This way we can account the time it spent tickless in userspace since its last cputime snapshot. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'x86-bsp-hotplug-for-linus' of ↵Linus Torvalds2012-12-111-0/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 BSP hotplug changes from Ingo Molnar: "This tree enables CPU#0 (the boot processor) to be onlined/offlined on x86, just like any other CPU. Enabled on Intel CPUs for now. Allowing this required the identification and fixing of latent CPU#0 assumptions (such as CPU#0 initializations, etc.) in the x86 architecture code, plus the identification of barriers to BSP-offlining, such as active PIC interrupts which can only be serviced on the BSP. It's behind a default-off option, and there's a debug option that allows the automatic testing of this feature. The motivation of this feature is to allow and prepare for true CPU-hotplug hardware support: recent changes to MCE support enable us to detect a deteriorating but not yet hard-failing L1/L2 cache on a CPU that could be soft-unplugged - or a failing L3 cache on a multi-socket system. Note that true hardware hot-plug is not yet fully enabled by this, because that requires a special platform wakeup sequence to be sent to the freshly powered up CPU#0. Future patches for this are planned, once such a platform exists. Chicken and egg" * 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, topology: Debug CPU0 hotplug x86/i387.c: Initialize thread xstate only on CPU0 only once x86, hotplug: Handle retrigger irq by the first available CPU x86, hotplug: The first online processor saves the MTRR state x86, hotplug: During CPU0 online, enable x2apic, set_numa_node. x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI x86-32, hotplug: Add start_cpu0() entry point to head_32.S x86-64, hotplug: Add start_cpu0() entry point to head_64.S kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback x86, hotplug, suspend: Online CPU0 for suspend or hibernate x86, hotplug: Support functions for CPU0 online/offline x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it x86, Kconfig: Add config switch for CPU0 hotplug doc: Add x86 CPU0 online/offline feature
| * kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callbackFenghua Yu2012-11-141-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_hotplug_pm_callback should have higher priority than bsp_pm_callback which depends on cpu_hotplug_pm_callback to disable cpu hotplug to avoid race during bsp online checking. This is to hightlight the priorities between the two callbacks in case people may overlook the order. Ideally the priorities should be defined in macro/enum instead of fixed values. To do that, a seperate patchset may be pushed which will touch serveral other generic files and is out of scope of this patchset. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua.yu@intel.com Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* | ACPI / processor: prevent cpu from becoming onlineYasuaki Ishimatsu2012-11-151-3/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even if acpi_processor_handle_eject() offlines cpu, there is a chance to online the cpu after that. So the patch closes the window by using get/put_online_cpus(). Why does the patch change _cpu_up() logic? The patch cares the race of hot-remove cpu and _cpu_up(). If the patch does not change it, there is the following race. hot-remove cpu | _cpu_up() ------------------------------------- ------------------------------------ call acpi_processor_handle_eject() | call cpu_down() | call get_online_cpus() | | call cpu_hotplug_begin() and stop here call arch_unregister_cpu() | call acpi_unmap_lsapic() | call put_online_cpus() | | start and continue _cpu_up() return acpi_processor_remove() | continue hot-remove the cpu | So _cpu_up() can continue to itself. And hot-remove cpu can also continue itself. If the patch changes _cpu_up() logic, the race disappears as below: hot-remove cpu | _cpu_up() ----------------------------------------------------------------------- call acpi_processor_handle_eject() | call cpu_down() | call get_online_cpus() | | call cpu_hotplug_begin() and stop here call arch_unregister_cpu() | call acpi_unmap_lsapic() | cpu's cpu_present is set | to false by set_cpu_present()| call put_online_cpus() | | start _cpu_up() | check cpu_present() and return -EINVAL return acpi_processor_remove() | continue hot-remove the cpu | Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
OpenPOWER on IntegriCloud