summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* sched/preempt, mm/fault: Count pagefault_disable() levels in pagefault_disabledDavid Hildenbrand2015-05-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, pagefault_disable()/pagefault_enabled() used the preempt count to track whether in an environment with pagefaults disabled (can be queried via in_atomic()). This patch introduces a separate counter in task_struct to count the level of pagefault_disable() calls. We'll keep manipulating the preempt count to retain compatibility to existing pagefault handlers. It is now possible to verify whether in a pagefault_disable() envionment by calling pagefault_disabled(). In contrast to in_atomic() it will not be influenced by preempt_enable()/preempt_disable(). This patch is based on a patch from Ingo Molnar. Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-2-git-send-email-dahi@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/preempt: Optimize preemption operations on __schedule() callersFrederic Weisbecker2015-05-191-20/+9
| | | | | | | | | | | | | | | | | | | | __schedule() disables preemption and some of its callers (the preempt_schedule*() family) also set PREEMPT_ACTIVE. So we have two preempt_count() modifications that could be performed at once. Lets remove the preemption disablement from __schedule() and pull this responsibility to its callers in order to optimize preempt_count() operations in a single place. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431441711-29753-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge tag 'v4.1-rc4' into sched/core, before applying new patchesIngo Molnar2015-05-195-19/+52
|\ | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * watchdog: Fix merge 'conflict'Peter Zijlstra2015-05-181-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two watchdog changes that came through different trees had a non conflicting conflict, that is, one changed the semantics of a variable but no actual code conflict happened. So the merge appeared fine, but the resulting code did not behave as expected. Commit 195daf665a62 ("watchdog: enable the new user interface of the watchdog mechanism") changes the semantics of watchdog_user_enabled, which thereafter is only used by the functions introduced by b3738d293233 ("watchdog: Add watchdog enable/disable all functions"). There further appears to be a distinct lack of serialization between setting and using watchdog_enabled, so perhaps we should wrap the {en,dis}able_all() things in watchdog_proc_mutex. This patch fixes a s2r failure reported by Michal; which I cannot readily explain. But this does make the code internally consistent again. Reported-and-tested-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2015-05-152-33/+33
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: a suspend/resume related regression fix, and an RT priority boosting fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix regression in cpuset_cpu_inactive() for suspend sched: Handle priority boosted tasks proper in setscheduler()
| * \ Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2015-05-151-8/+33
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also a lockdep annotation fix, a PMU event list fix and a new model addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/liblockdep: Fix compilation error tools/liblockdep: Fix linker error in case of cross compile perf tools: Use getconf to determine number of online CPUs tools: Fix tools/vm build perf/x86/rapl: Enable Broadwell-U RAPL support perf/x86/intel: Fix SLM cache event list perf: Annotate inherited event ctx->mutex recursion
| | * | perf: Annotate inherited event ctx->mutex recursionPeter Zijlstra2015-05-081-8/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While fuzzing Sasha tripped over another ctx->mutex recursion lockdep splat. Annotate this. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2015-05-091-0/+1
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Two patches from the irq departement: - a simple fix to make dummy_irq_chip usable for wakeup scenarios - removal of the gic arch_extn hackery. Now that all users are converted we really want to get rid of the interface so people wont come up with new use cases" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: gic: Drop support for gic_arch_extn genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip
| | * | | genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chipRoger Quadros2015-04-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this system suspend is broken on systems that have drivers calling enable/disable_irq_wake() for interrupts based off the dummy irq hook. (e.g. drivers/gpio/gpio-pcf857x.c) Signed-off-by: Roger Quadros <rogerq@ti.com> Cc: <cw00.choi@samsung.com> Cc: <balbi@ti.com> Cc: <tony@atomide.com> Cc: Gregory Clement <gregory.clement@free-electrons.com> Link: http://lkml.kernel.org/r/552E1DD3.4040106@ti.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2015-05-091-5/+1
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A simple fix to actually shut down a detached device instead of keeping it active" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clockevents: Shutdown detached clockevent device
| | * | | | clockevents: Shutdown detached clockevent deviceViresh Kumar2015-04-241-5/+1
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A clockevent device is marked DETACHED when it is replaced by another clockevent device. The device is shutdown properly for drivers that implement legacy ->set_mode() callback, as we call ->set_mode() for CLOCK_EVT_MODE_UNUSED as well. But for the new per-state callback interface, we skip shutting down the device, as we thought its an internal state change. That wasn't correct. The effect is that the device is left programmed in oneshot or periodic mode. Fall-back to 'case CLOCK_EVT_STATE_SHUTDOWN', to shutdown the device. Fixes: bd624d75db21 "clockevents: Introduce mode specific callbacks" Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/eef0a91c51b74d4e52c8e5a95eca27b5a0563f07.1428650683.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | Merge tag 'trace-fixes-v4.1-rc2' of ↵Linus Torvalds2015-05-081-1/+2
| |\ \ \ \ | | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "The newly added ftrace_print_array_seq() function had a bug in it. Luckily, the only user of it didn't make the 4.1 merge window. But the helper function should be fixed before 4.2 when the users start coming in" * tag 'trace-fixes-v4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Make ftrace_print_array_seq compute buf_len
| | * | | tracing: Make ftrace_print_array_seq compute buf_lenAlex Bennée2015-05-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only caller to this function (__print_array) was getting it wrong by passing the array length instead of buffer length. As the element size was already being passed for other reasons it seems reasonable to push the calculation of buffer length into the function. Link: http://lkml.kernel.org/r/1430320727-14582-1-git-send-email-alex.bennee@linaro.org Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | | sched: Fix function declaration return type mismatchNicholas Mc Guire2015-05-171-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | static code checking was unhappy with: ./kernel/sched/fair.c:162 WARNING: return of wrong type int != unsigned int get_update_sysctl_factor() is declared to return int but is currently returning an unsigned int. The first few preprocessed lines are: static int get_update_sysctl_factor(void) { unsigned int cpus = ({ int __min1 = (cpumask_weight(cpu_online_mask)); int __min2 = (8); __min1 < __min2 ? __min1: __min2; }); unsigned int factor; The type used by min_t() should be 'unsigned int' and the return type of get_update_sysctl_factor() should also be 'unsigned int' as its call-site update_sysctl() is expecting 'unsigned int' and the values utilizing: 'factor' 'sysctl_sched_min_granularity' 'sched_nr_latency' 'sysctl_sched_wakeup_granularity' ... are also all 'unsigned int', plus cpumask_weight() is also returning 'unsigned int'. So the natural type to use around here is 'unsigned int'. ( Patch was compile tested with x86_64_defconfig + CONFIG_SCHED_DEBUG=y and the changed sections in kernel/sched/fair.i were reviewed. ) Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> [ Improved the changelog a bit. ] Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431716742-11077-1-git-send-email-hofrat@osadl.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | futex: Implement lockless wakeupsDavidlohr Bueso2015-05-081-16/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given the overall futex architecture, any chance of reducing hb->lock contention is welcome. In this particular case, using wake-queues to enable lockless wakeups addresses very much real world performance concerns, even cases of soft-lockups in cases of large amounts of blocked tasks (which is not hard to find in large boxes, using but just a handful of futex). At the lowest level, this patch can reduce latency of a single thread attempting to acquire hb->lock in highly contended scenarios by a up to 2x. At lower counts of nr_wake there are no regressions, confirming, of course, that the wake_q handling overhead is practically non existent. For instance, while a fair amount of variation, the extended pef-bench wakeup benchmark shows for a 20 core machine the following avg per-thread time to wakeup its share of tasks: nr_thr ms-before ms-after 16 0.0590 0.0215 32 0.0396 0.0220 48 0.0417 0.0182 64 0.0536 0.0236 80 0.0414 0.0097 96 0.0672 0.0152 Naturally, this can cause spurious wakeups. However there is no core code that cannot handle them afaict, and furthermore tglx does have the point that other events can already trigger them anyway. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Mason <clm@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: George Spelvin <linux@horizon.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1430494072-30283-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | sched: Implement lockless wake-queuesPeter Zijlstra2015-05-081-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is useful for locking primitives that can effect multiple wakeups per operation and want to avoid lock internal lock contention by delaying the wakeups until we've released the lock internal locks. Alternatively it can be used to avoid issuing multiple wakeups, and thus save a few cycles, in packet processing. Queue all target tasks and wakeup once you've processed all packets. That way you avoid waking the target task multiple times if there were multiple packets for the same task. Properties of a wake_q are: - Lockless, as queue head must reside on the stack. - Being a queue, maintains wakeup order passed by the callers. This can be important for otherwise, in scenarios where highly contended locks could affect any reliance on lock fairness. - A queued task cannot be added again until it is woken up. This patch adds the needed infrastructure into the scheduler code and uses the new wake_list to delay the futex wakeups until after we've released the hash bucket locks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [tweaks, adjustments, comments, etc.] Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Mason <clm@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: George Spelvin <linux@horizon.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1430494072-30283-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | sched, timer: Use the atomic task_cputime in thread_group_cputimerJason Low2015-05-082-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent optimizations were made to thread_group_cputimer to improve its scalability by keeping track of cputime stats without a lock. However, the values were open coded to the structure, causing them to be at a different abstraction level from the regular task_cputime structure. Furthermore, any subsequent similar optimizations would not be able to share the new code, since they are specific to thread_group_cputimer. This patch adds the new task_cputime_atomic data structure (introduced in the previous patch in the series) to thread_group_cputimer for keeping track of the cputime atomically, which also helps generalize the code. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1430251224-5764-6-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | sched, timer: Replace spinlocks with atomics in thread_group_cputimer(), to ↵Jason Low2015-05-083-42/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | improve scalability While running a database workload, we found a scalability issue with itimers. Much of the problem was caused by the thread_group_cputimer spinlock. Each time we account for group system/user time, we need to obtain a thread_group_cputimer's spinlock to update the timers. On larger systems (such as a 16 socket machine), this caused more than 30% of total time spent trying to obtain this kernel lock to update these group timer stats. This patch converts the timers to 64-bit atomic variables and use atomic add to update them without a lock. With this patch, the percent of total time spent updating thread group cputimer timers was reduced from 30% down to less than 1%. Note: On 32-bit systems using the generic 64-bit atomics, this causes sample_group_cputimer() to take locks 3 times instead of just 1 time. However, we tested this patch on a 32-bit system ARM system using the generic atomics and did not find the overhead to be much of an issue. An explanation for why this isn't an issue is that 32-bit systems usually have small numbers of CPUs, and cacheline contention from extra spinlocks called periodically is not really apparent on smaller systems. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1430251224-5764-4-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | sched/numa: Document usages of mm->numa_scan_seqJason Low2015-05-081-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The p->mm->numa_scan_seq is accessed using READ_ONCE/WRITE_ONCE and modified without exclusive access. It is not clear why it is accessed this way. This patch provides some documentation on that. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Waiman Long <waiman.long@hp.com> Link: http://lkml.kernel.org/r/1430440094.2475.61.camel@j-VirtualBox Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to ↵Jason Low2015-05-0811-24/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | READ_ONCE()/WRITE_ONCE() ACCESS_ONCE doesn't work reliably on non-scalar types. This patch removes the rest of the existing usages of ACCESS_ONCE() in the scheduler, and use the new READ_ONCE() and WRITE_ONCE() APIs as appropriate. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Waiman Long <Waiman.Long@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1430251224-5764-2-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | sched/core: Remove unnecessary down/up conversionNicholas Mc Guire2015-05-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'rt_period_us' is automatically type converted from u64 to long and then cast back to u64 - this down/up conversion is unnecessary and can be removed to improve readability. This will also help us not truncate 'rt_period_us' to 32 bits on 32-bit kernels, should we ever have so large values. (unlikely, not the least due to procfs.) Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430643116-24049-1-git-send-email-hofrat@osadl.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | signals, sched: Change all uses of JOBCTL_* from 'int' to 'long'Palmer Dabbelt2015-05-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | c56fb6564dcd ("Fix a misaligned load inside ptrace_attach()") makes jobctl an "unsigned long". It makes sense to have the masks applied to it match that type. This is currently just a cosmetic change, but it will prevent the mask from being unexpectedly truncated if we ever end up with masks with more bits. One instance of "signr" is an int, but I left this alone because the mask ensures that it will never overflow. Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bobby.prani@gmail.com Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: richard@nod.at Cc: vdavydov@parallels.com Link: http://lkml.kernel.org/r/1430453997-32459-4-git-send-email-palmer@dabbelt.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | sched: Move the loadavg code to a more obvious locationPeter Zijlstra2015-05-085-219/+217
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I could not find the loadavg code.. turns out it was hidden in a file called proc.c. It further got mingled up with the cruft per rq load indexes (which we really want to get rid of). Move the per rq load indexes into the fair.c load-balance code (that's the only thing that uses them) and rename proc.c to loadavg.c so we can find it again. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> [ Did minor cleanups to the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'sched/urgent' into sched/core, before applying new patchesIngo Molnar2015-05-082-33/+33
|\ \ \ \ \ | | |_|_|/ | |/| | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | sched/core: Fix regression in cpuset_cpu_inactive() for suspendOmar Sandoval2015-05-081-16/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that caused a regression in setting a CPU inactive during suspend. I ran into this when a program was failing pthread_setaffinity_np() with EINVAL after a suspend+wake up. A simple reproducer: $ ./a.out sched_setaffinity: Success $ systemctl suspend $ ./a.out sched_setaffinity: Invalid argument ... where ./a.out is: #define _GNU_SOURCE #include <errno.h> #include <sched.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> int main(void) { long num_cores; cpu_set_t cpu_set; int ret; num_cores = sysconf(_SC_NPROCESSORS_ONLN); CPU_ZERO(&cpu_set); CPU_SET(num_cores - 1, &cpu_set); errno = 0; ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set); perror("sched_setaffinity"); return ret ? EXIT_FAILURE : EXIT_SUCCESS; } The mistake is that suspend is handled in the action == CPU_DOWN_PREPARE_FROZEN case of the switch statement in cpuset_cpu_inactive(). However, the commit in question masked out CPU_TASKS_FROZEN from the action, making this case dead. The fix is straightforward. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()") Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | sched: Handle priority boosted tasks proper in setscheduler()Thomas Gleixner2015-05-082-17/+21
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ronny reported that the following scenario is not handled correctly: T1 (prio = 10) lock(rtmutex); T2 (prio = 20) lock(rtmutex) boost T1 T1 (prio = 20) sys_set_scheduler(prio = 30) T1 prio = 30 .... sys_set_scheduler(prio = 10) T1 prio = 30 The last step is wrong as T1 should now be back at prio 20. Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()") only handles the case where a boosted tasks tries to lower its priority. Fix it by taking the new effective priority into account for the decision whether a change of the priority is required. Reported-by: Ronny Meeus <ronny.meeus@gmail.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()") Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | sched/core: Remove __cpuinit section tag that crept back inPaul Gortmaker2015-05-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We removed __cpuinit support (leaving no-op stubs) quite some time ago. However this one crept back in as of commit a803f0261bb2bb57aab ("sched: Initialize rq->age_stamp on processor start") Since we want to clobber the stubs too, get this removed now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Corey Minyard <cminyard@mvista.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430174880-27958-2-git-send-email-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'sched/urgent' into sched/coreIngo Molnar2015-05-081-4/+0
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | So this isn't really a fix but a cleanup that can wait for v4.2. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | sched/autogroup: Remove unnecessary #ifdef guardsTobias Klauser2015-04-161-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit: 029632fbb7b7 ("sched: Make separate sched*.c translation units") autogroup is a separate translation unit built from the Makefile and thus no longer needs its content wrapped with #ifdef CONFIG_SCHED_AUTOGROUP. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1429116678-17000-1-git-send-email-tklauser@distanz.ch Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2015-05-061-7/+9
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fix from Ingo Molnar: "An RCU Kconfig fix that eliminates an annoying interactive kconfig question for CONFIG_RCU_TORTURE_TEST_SLOW_INIT" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Control grace-period delays directly from value
| * | | Merge branch 'for-mingo' of ↵Ingo Molnar2015-04-181-7/+9
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull RCU fix from Paul E. McKenney: "This series contains a single change that fixes Kconfig asking pointless questions." Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | rcu: Control grace-period delays directly from valuePaul E. McKenney2015-04-141-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a misguided attempt to avoid an #ifdef, the use of the gp_init_delay module parameter was conditioned on the corresponding RCU_TORTURE_TEST_SLOW_INIT Kconfig variable, using IS_ENABLED() at the point of use in the code. This meant that the compiler always saw the delay, which meant that RCU_TORTURE_TEST_SLOW_INIT_DELAY had to be unconditionally defined. This in turn caused "make oldconfig" to ask pointless questions about the value of RCU_TORTURE_TEST_SLOW_INIT_DELAY in cases where it was not even used. This commit avoids these pointless questions by defining gp_init_delay under #ifdef. In one branch, gp_init_delay is initialized to RCU_TORTURE_TEST_SLOW_INIT_DELAY and is also a module parameter (thus allowing boot-time modification), and in the other branch gp_init_delay is a const variable initialized by default to zero. This approach also simplifies the code at the delay point by eliminating the IS_DEFINED(). Because gp_init_delay is constant zero in the no-delay case intended for production use, the "gp_init_delay > 0" check causes the delay to become dead code, as desired in this case. In addition, this commit replaces magic constant "10" with the preprocessor variable PER_RCU_NODE_PERIOD, which controls the number of grace periods that are allowed to elapse at full speed before a delay is inserted. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2015-05-011-6/+6
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Receive packet length needs to be adjust by 2 on RX to accomodate the two padding bytes in altera_tse driver. From Vlastimil Setka. 2) If rx frame is dropped due to out of memory in macb driver, we leave the receive ring descriptors in an undefined state. From Punnaiah Choudary Kalluri 3) Some netlink subsystems erroneously signal NLM_F_MULTI. That is only for dumps. Fix from Nicolas Dichtel. 4) Fix mis-use of raw rt->rt_pmtu value in ipv4, one must always go via the ipv4_mtu() helper. From Herbert Xu. 5) Fix null deref in bridge netfilter, and miscalculated lengths in jump/goto nf_tables verdicts. From Florian Westphal. 6) Unhash ping sockets properly. 7) Software implementation of BPF divide did 64/32 rather than 64/64 bit divide. The JITs got it right. Fix from Alexei Starovoitov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits) ipv4: Missing sk_nulls_node_init() in ping_unhash(). net: fec: Fix RGMII-ID mode net/mlx4_en: Schedule napi when RX buffers allocation fails netxen_nic: use spin_[un]lock_bh around tx_clean_lock net/mlx4_core: Fix unaligned accesses mlx4_en: Use correct loop cursor in error path. cxgb4: Fix MC1 memory offset calculation bnx2x: Delay during kdump load net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table net: dsa: Fix scope of eeprom-length property net: macb: Fix race condition in driver when Rx frame is dropped hv_netvsc: Fix a bug in netvsc_start_xmit() altera_tse: Correct rx packet length mlx4: Fix tx ring affinity_mask creation tipc: fix problem with parallel link synchronization mechanism tipc: remove wrong use of NLM_F_MULTI bridge/nl: remove wrong use of NLM_F_MULTI bridge/mdb: remove wrong use of NLM_F_MULTI net: sched: act_connmark: don't zap skb->nfct trivial: net: systemport: bcmsysport.h: fix 0x0x prefix ...
| * | | | | bpf: fix 64-bit divideAlexei Starovoitov2015-04-271-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ALU64_DIV instruction should be dividing 64-bit by 64-bit, whereas do_div() does 64-bit by 32-bit divide. x64 and arm64 JITs correctly implement 64 by 64 unsigned divide. llvm BPF backend emits code assuming that ALU64_DIV does 64 by 64. Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets") Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Merge tag 'pm+acpi-4.1-rc2' of ↵Linus Torvalds2015-04-301-14/+2
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "Three regression fixes this time, one for a recent regression in the cpuidle core affecting multiple systems, one for an inadvertently added duplicate typedef in ACPICA that breaks compilation with GCC 4.5 and one for an ACPI Smart Battery Subsystem driver regression introduced during the 3.18 cycle (stable-candidate). Specifics: - Fix for a regression in the cpuidle core introduced by one of the recent commits in the clockevents_notify() removal series that put a call to a function which had to be executed with disabled interrupts into a code path running with enabled interrupts (Rafael J Wysocki) - Fix for a build problem in ACPICA (with GCC 4.5) introduced by one of the recent ACPICA tools commits that added a duplicate typedef to one of the ACPICA's header files by mistake (Olaf Hering) - Fix for a regression in the ACPI SBS (Smart Battery Subsystem) driver introduced during the 3.18 development cycle causing the smart battery manager to be marked as not present when it should be marked as present (Chris Bainbridge)" * tag 'pm+acpi-4.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle: Run tick_broadcast_exit() with disabled interrupts ACPI / SBS: Enable battery manager when present ACPICA: remove duplicate u8 typedef
| * | | | | | cpuidle: Run tick_broadcast_exit() with disabled interruptsRafael J. Wysocki2015-04-291-14/+2
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 335f49196fd6 (sched/idle: Use explicit broadcast oneshot control function) replaced clockevents_notify() invocations in cpuidle_idle_call() with direct calls to tick_broadcast_enter() and tick_broadcast_exit(), but it overlooked the fact that interrupts were already enabled before calling the latter which led to functional breakage on systems using idle states with the CPUIDLE_FLAG_TIMER_STOP flag set. Fix that by moving the invocations of tick_broadcast_enter() and tick_broadcast_exit() down into cpuidle_enter_state() where interrupts are still disabled when tick_broadcast_exit() is called. Also ensure that interrupts will be disabled before running tick_broadcast_exit() even if they have been enabled by the idle state's ->enter callback. Trigger a WARN_ON_ONCE() in that case, as we generally don't want that to happen for states with CPUIDLE_FLAG_TIMER_STOP set. Fixes: 335f49196fd6 (sched/idle: Use explicit broadcast oneshot control function) Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-04-301-15/+0
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm changes from Paolo Bonzini: "Remove from guest code the handling of task migration during a pvclock read; instead use the correct protocol in KVM. This removes the need for task migration notifiers in core scheduler code" [ The scheduler people really hated the migration notifiers, so this was kind of required - Linus ] * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: x86: pvclock: Really remove the sched notifier for cross-cpu migrations kvm: x86: fix kvmclock update protocol
| * | | | | | x86: pvclock: Really remove the sched notifier for cross-cpu migrationsPaolo Bonzini2015-04-271-15/+0
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commits 0a4e6be9ca17c54817cf814b4b5aa60478c6df27 and 80f7fdb1c7f0f9266421f823964fd1962681f6ce. The task migration notifier was originally introduced in order to support the pvclock vsyscall with non-synchronized TSC, but KVM only supports it with synchronized TSC. Hence, on KVM the race condition is only needed due to a bad implementation on the host side, and even then it's so rare that it's mostly theoretical. As far as KVM is concerned it's possible to fix the host, avoiding the additional complexity in the vDSO and the (re)introduction of the task migration notifier. Xen, on the other hand, hasn't yet implemented vsyscall support at all, so we do not care about its plans for non-synchronized TSC. Reported-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | | | modsign: change default key detailsDavid Howells2015-04-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change default key details to be more obviously unspecified. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2015-04-281-1/+1
|\ \ \ \ \ \ | |/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "One additional new feature for 4.1, a new PRNG based on SHA-512 for the zcrypt driver. Two memory management related changes, the page table reallocation for KVM is removed, and with file ptes gone the encoding of page table entries is improved. And three bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Introduce new SHA-512 based Pseudo Random Generator. s390/mm: change swap pte encoding and pgtable cleanup s390/mm: correct transfer of dirty & young bits in __pmd_to_pte s390/bpf: add dependency to z196 features s390/3215: free memory in error path s390/kvm: remove delayed reallocation of page tables for KVM kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFP
| * | | | | kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFPMartin Schwidefsky2015-04-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce KEXEC_CONTROL_MEMORY_GFP to allow the architecture code to override the gfp flags of the allocation for the kexec control page. The loop in kimage_alloc_normal_control_pages allocates pages with GFP_KERNEL until a page is found that happens to have an address smaller than the KEXEC_CONTROL_MEMORY_LIMIT. On systems with a large memory size but a small KEXEC_CONTROL_MEMORY_LIMIT the loop will keep allocating memory until the oom killer steps in. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2015-04-268-19/+19
|\ \ \ \ \ \ | |/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only
| * | | | | VFS: kernel/: d_inode() annotationsDavid Howells2015-04-154-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | relayfs and tracefs are dealing with inodes of their own; those two act as filesystem drivers Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | VFS: audit: d_backing_inode() annotationsDavid Howells2015-04-154-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds2015-04-224-53/+94
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull audit fixes from Paul Moore: "Seven audit patches for v4.1, all bug fixes. The largest, and perhaps most significant commit helps resolve some memory pressure issues related to the inode cache and audit, there are also a few small commits which help resolve some timing issues with the audit log queue, and the rest fall into the always popular "code clean-up" category. In general, nothing really substantial, just a nice set of maintenance patches" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: audit: Remove condition which always evaluates to false audit: reduce mmap_sem hold for mm->exe_file audit: consolidate handling of mm->exe_file audit: code clean up audit: don't reset working wait time accidentally with auditd audit: don't lose set wait time on first successful call to audit_log_start() audit: move the tree pruning to a dedicated thread
| * | | | | | audit: Remove condition which always evaluates to falsePranith Kumar2015-03-131-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 3e1d0bb6224f019893d1c498cc3327559d183674 ("audit: Convert int limit uses to u32"), by converting an int to u32, few conditions will always evaluate to false. These warnings were emitted during compilation: kernel/audit.c: In function ‘audit_set_enabled’: kernel/audit.c:347:2: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (state < AUDIT_OFF || state > AUDIT_LOCKED) ^ kernel/audit.c: In function ‘audit_receive_msg’: kernel/audit.c:880:9: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (s.backlog_wait_time < 0 || The following patch removes those unnecessary conditions. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
| * | | | | | audit: reduce mmap_sem hold for mm->exe_fileDavidlohr Bueso2015-02-231-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mm->exe_file is currently serialized with mmap_sem (shared) in order to both safely (1) read the file and (2) audit it via audit_log_d_path(). Good users will, on the other hand, make use of the more standard get_mm_exe_file(), requiring only holding the mmap_sem to read the value, and relying on reference counting to make sure that the exe file won't dissapear underneath us. Additionally, upon NULL return of get_mm_exe_file, we also call audit_log_format(ab, " exe=(null)"). Signed-off-by: Davidlohr Bueso <dbueso@suse.de> [PM: tweaked subject line] Signed-off-by: Paul Moore <pmoore@redhat.com>
| * | | | | | audit: consolidate handling of mm->exe_fileDavidlohr Bueso2015-02-233-16/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a audit_log_d_path_exe() helper function to share how we handle auditing of the exe_file's path. Used by both audit and auditsc. No functionality is changed. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> [PM: tweaked subject line] Signed-off-by: Paul Moore <pmoore@redhat.com>
| * | | | | | audit: code clean upAmeen Ali2015-02-231-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixed a coding style issue (unnecessary parentheses , unnecessary braces) Signed-off-by: Ameen-Ali <Ameenali023@gmail.com> [PM: tweaked subject line] Signed-off-by: Paul Moore <pmoore@redhat.com>
| * | | | | | audit: don't reset working wait time accidentally with auditdRichard Guy Briggs2015-02-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During a queue overflow condition while we are waiting for auditd to drain the queue to make room for regular messages, we don't want a successful auditd that has bypassed the queue check to reset the backlog wait time. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
OpenPOWER on IntegriCloud