summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | | | | params: suppress unused variable error, warn once just in case code changes.Rusty Russell2015-06-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It shouldn't fail due to OOM (it's boot time), and already warns if we get two identical names. But you never know what the future holds, and WARN_ON_ONCE() keeps gcc happy with minimal code. Reported-by: Louis Langholtz <lou_langholtz@me.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | kernel/module.c: avoid ifdefs for sig_enforce declarationLuis R. Rodriguez2015-05-281-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no need to require an ifdef over the declaration of sig_enforce as IS_ENABLED() can be used. While at it, there's no harm in exposing this kernel parameter outside of CONFIG_MODULE_SIG as it'd be a no-op on non module sig kernels. Now, technically we should in theory be able to remove the #ifdef'ery over the declaration of the module parameter as we are also trusting the bool_enable_only code for CONFIG_MODULE_SIG kernels but for now remain paranoid and keep it. With time if no one can put a bullet through bool_enable_only and if there are no technical requirements over not exposing CONFIG_MODULE_SIG_FORCE with the measures in place by bool_enable_only we could remove this last ifdef. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: cocci@systeme.lip6.fr Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | kernel/workqueue.c: remove ifdefs over wq_power_efficientLuis R. Rodriguez2015-05-281-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can avoid an ifdef over wq_power_efficient's declaration by just using IS_ENABLED(). Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: cocci@systeme.lip6.fr Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | kernel/params.c: export param_ops_bool_enable_onlyLuis R. Rodriguez2015-05-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will grant access to this helper to code built as modules. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Howells <dhowells@redhat.com> Cc: Ming Lei <ming.lei@canonical.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Kyle McMartin <kyle@kernel.org> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | kernel/params.c: generalize bool_enable_onlyLuis R. Rodriguez2015-05-282-31/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This takes out the bool_enable_only implementation from the module loading code and generalizes it so that others can make use of it. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: cocci@systeme.lip6.fr Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | kernel/module.c: use generic module param operaters for sig_enforceLuis R. Rodriguez2015-05-281-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're directly checking and modifying sig_enforce when needed instead of using the generic helpers. This prevents us from generalizing this helper so that others can use it. Use indirect helpers to allow us to generalize this code a bit and to make it a bit more clear what this is doing. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: cocci@systeme.lip6.fr Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | kernel/params: constify struct kernel_param_ops usesLuis R. Rodriguez2015-05-281-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most code already uses consts for the struct kernel_param_ops, sweep the kernel for the last offending stragglers. Other than include/linux/moduleparam.h and kernel/params.c all other changes were generated with the following Coccinelle SmPL patch. Merge conflicts between trees can be handled with Coccinelle. In the future git could get Coccinelle merge support to deal with patch --> fail --> grammar --> Coccinelle --> new patch conflicts automatically for us on patches where the grammar is available and the patch is of high confidence. Consider this a feature request. Test compiled on x86_64 against: * allnoconfig * allmodconfig * allyesconfig @ const_found @ identifier ops; @@ const struct kernel_param_ops ops = { }; @ const_not_found depends on !const_found @ identifier ops; @@ -struct kernel_param_ops ops = { +const struct kernel_param_ops ops = { }; Generated-by: Coccinelle SmPL Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Junio C Hamano <gitster@pobox.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: cocci@systeme.lip6.fr Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: Rework module_addr_{min,max}Peter Zijlstra2015-05-281-28/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __module_address() does an initial bound check before doing the {list/tree} iteration to find the actual module. The bound variables are nowhere near the mod_tree cacheline, in fact they're nowhere near one another. module_addr_min lives in .data while module_addr_max lives in .bss (smarty pants GCC thinks the explicit 0 assignment is a mistake). Rectify this by moving the two variables into a structure together with the latch_tree_root to guarantee they all share the same cacheline and avoid hitting two extra cachelines for the lookup. While reworking the bounds code, move the bound update from allocation to insertion time, this avoids updating the bounds for a few error paths. Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: Use __module_address() for module_address_lookup()Peter Zijlstra2015-05-281-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the generic __module_address() addr to struct module lookup instead of open coding it once more. Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACINGPeter Zijlstra2015-05-281-2/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrew worried about the overhead on small systems; only use the fancy code when either perf or tracing is enabled. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Requested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: Optimize __module_address() using a latched RB-treePeter Zijlstra2015-05-281-5/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently __module_address() is using a linear search through all modules in order to find the module corresponding to the provided address. With a lot of modules this can take a lot of time. One of the users of this is kernel_text_address() which is employed in many stack unwinders; which in turn are used by perf-callchain and ftrace (possibly from NMI context). So by optimizing __module_address() we optimize many stack unwinders which are used by both perf and tracing in performance sensitive code. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | seqlock: Introduce raw_read_seqcount_latch()Peter Zijlstra2015-05-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because with latches there is a strict data dependency on the seq load we can avoid the rmb in favour of a read_barrier_depends. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | seqlock: Better document raw_write_seqcount_latch()Peter Zijlstra2015-05-281-26/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve the documentation of the latch technique as used in the current timekeeping code, such that it can be readily employed elsewhere. Borrow from the comments in timekeeping and replace those with a reference to this more generic comment. Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Michel Lespinasse <walken@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: Sanitize RCU usage and lockingPeter Zijlstra2015-05-281-8/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the RCU usage in module is an inconsistent mess of RCU and RCU-sched, this is broken for CONFIG_PREEMPT where synchronize_rcu() does not imply synchronize_sched(). Most usage sites use preempt_{dis,en}able() which is RCU-sched, but (most of) the modification sites use synchronize_rcu(). With the exception of the module bug list, which actually uses RCU. Convert everything over to RCU-sched. Furthermore add lockdep asserts to all sites, because it's not at all clear to me the required locking is observed, esp. on exported functions. Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module, jump_label: Fix module lockingPeter Zijlstra2015-05-271-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As per the module core lockdep annotations in the coming patch: [ 18.034047] ---[ end trace 9294429076a9c673 ]--- [ 18.047760] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013 [ 18.059228] ffffffff817d8676 ffff880036683c38 ffffffff8157e98b 0000000000000001 [ 18.067541] 0000000000000000 ffff880036683c78 ffffffff8105fbc7 ffff880036683c68 [ 18.075851] ffffffffa0046b08 0000000000000000 ffffffffa0046d00 ffffffffa0046cc8 [ 18.084173] Call Trace: [ 18.086906] [<ffffffff8157e98b>] dump_stack+0x4f/0x7b [ 18.092649] [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0 [ 18.099361] [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20 [ 18.105880] [<ffffffff810ee502>] __module_address+0x1d2/0x1e0 [ 18.112400] [<ffffffff81161153>] jump_label_module_notify+0x143/0x1e0 [ 18.119710] [<ffffffff810814bf>] notifier_call_chain+0x4f/0x70 [ 18.126326] [<ffffffff8108160e>] __blocking_notifier_call_chain+0x5e/0x90 [ 18.134009] [<ffffffff81081656>] blocking_notifier_call_chain+0x16/0x20 [ 18.141490] [<ffffffff810f0f00>] load_module+0x1b50/0x2660 [ 18.147720] [<ffffffff810f1ade>] SyS_init_module+0xce/0x100 [ 18.154045] [<ffffffff81587429>] system_call_fastpath+0x12/0x17 [ 18.160748] ---[ end trace 9294429076a9c674 ]--- Jump labels is not doing it right; fix this. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jason Baron <jbaron@akamai.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | | | | | | | | module: Annotate module version magicPeter Zijlstra2015-05-271-3/+9
| |/ / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the new lockdep checks in the coming patch, we go: [ 9.759380] ------------[ cut here ]------------ [ 9.759389] WARNING: CPU: 31 PID: 597 at ../kernel/module.c:216 each_symbol_section+0x121/0x130() [ 9.759391] Modules linked in: [ 9.759393] CPU: 31 PID: 597 Comm: modprobe Not tainted 4.0.0-rc1+ #65 [ 9.759393] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013 [ 9.759396] ffffffff817d8676 ffff880424567ca8 ffffffff8157e98b 0000000000000001 [ 9.759398] 0000000000000000 ffff880424567ce8 ffffffff8105fbc7 ffff880424567cd8 [ 9.759400] 0000000000000000 ffffffff810ec160 ffff880424567d40 0000000000000000 [ 9.759400] Call Trace: [ 9.759407] [<ffffffff8157e98b>] dump_stack+0x4f/0x7b [ 9.759410] [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0 [ 9.759412] [<ffffffff810ec160>] ? section_objs+0x60/0x60 [ 9.759414] [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20 [ 9.759415] [<ffffffff810ed9c1>] each_symbol_section+0x121/0x130 [ 9.759417] [<ffffffff810eda01>] find_symbol+0x31/0x70 [ 9.759420] [<ffffffff810ef5bf>] load_module+0x20f/0x2660 [ 9.759422] [<ffffffff8104ef10>] ? __do_page_fault+0x190/0x4e0 [ 9.759426] [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13 [ 9.759427] [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13 [ 9.759433] [<ffffffff810ae73d>] ? trace_hardirqs_on_caller+0x11d/0x1e0 [ 9.759437] [<ffffffff812fcc0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 9.759439] [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13 [ 9.759441] [<ffffffff810f1ade>] SyS_init_module+0xce/0x100 [ 9.759443] [<ffffffff81587429>] system_call_fastpath+0x12/0x17 [ 9.759445] ---[ end trace 9294429076a9c644 ]--- As per the comment this site should be fine, but lets wrap it in preempt_disable() anyhow to placate lockdep. Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | | | | | | | | | Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds2015-06-272-5/+3
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull audit updates from Paul Moore: "Four small audit patches for v4.2, all bug fixes. Only 10 lines of change this time so very unremarkable, the patch subject lines pretty much tell the whole story" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: audit: Fix check of return value of strnlen_user() audit: obsolete audit_context check is removed in audit_filter_rules() audit: fix for typo in comment to function audit_log_link_denied() lsm: rename duplicate labels in LSM_AUDIT_DATA_TASK audit message type
| * | | | | | | | | | audit: Fix check of return value of strnlen_user()Jan Kara2015-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | strnlen_user() returns 0 when it hits fault, not -1. Fix the test in audit_log_single_execve_arg(). Luckily this shouldn't ever happen unless there's a kernel bug so it's mostly a cosmetic fix. CC: Paul Moore <pmoore@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <pmoore@redhat.com>
| * | | | | | | | | | audit: obsolete audit_context check is removed in audit_filter_rules()Mikhail Klementyev2015-05-291-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Mikhail Klementyev <jollheef@riseup.net> [PM: patch applied by hand due to HTML mangling, rewrote subject line] Signed-off-by: Paul Moore <pmoore@redhat.com>
| * | | | | | | | | | audit: fix for typo in comment to function audit_log_link_denied()Shailendra Verma2015-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com> [PM: tweaked subject line] Signed-off-by: Paul Moore <pmoore@redhat.com>
* | | | | | | | | | | Merge branch 'next' of ↵Linus Torvalds2015-06-271-9/+4
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "The main change in this kernel is Casey's generalized LSM stacking work, which removes the hard-coding of Capabilities and Yama stacking, allowing multiple arbitrary "small" LSMs to be stacked with a default monolithic module (e.g. SELinux, Smack, AppArmor). See https://lwn.net/Articles/636056/ This will allow smaller, simpler LSMs to be incorporated into the mainline kernel and arbitrarily stacked by users. Also, this is a useful cleanup of the LSM code in its own right" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits) tpm, tpm_crb: fix le64_to_cpu conversions in crb_acpi_add() vTPM: set virtual device before passing to ibmvtpm_reset_crq tpm_ibmvtpm: remove unneccessary message level. ima: update builtin policies ima: extend "mask" policy matching support ima: add support for new "euid" policy condition ima: fix ima_show_template_data_ascii() Smack: freeing an error pointer in smk_write_revoke_subj() selinux: fix setting of security labels on NFS selinux: Remove unused permission definitions selinux: enable genfscon labeling for sysfs and pstore files selinux: enable per-file labeling for debugfs files. selinux: update netlink socket classes signals: don't abuse __flush_signals() in selinux_bprm_committed_creds() selinux: Print 'sclass' as string when unrecognized netlink message occurs Smack: allow multiple labels in onlycap Smack: fix seq operations in smackfs ima: pass iint to ima_add_violation() ima: wrap event related data to the new ima_event_data structure integrity: add validity checks for 'path' parameter ...
| * | | | | | | | | | | signals: don't abuse __flush_signals() in selinux_bprm_committed_creds()Oleg Nesterov2015-06-041-9/+4
| | |_|_|_|_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | selinux_bprm_committed_creds()->__flush_signals() is not right, we shouldn't clear TIF_SIGPENDING unconditionally. There can be other reasons for signal_pending(): freezing(), JOBCTL_PENDING_MASK, and potentially more. Also change this code to check fatal_signal_pending() rather than SIGNAL_GROUP_EXIT, it looks a bit better. Now we can kill __flush_signals() before it finds another buggy user. Note: this code looks racy, we can flush a signal which was sent after the task SID has been updated. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
* | | | | | | | | | | Merge branch 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2015-06-261-162/+322
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull workqueue updates from Tejun Heo: "Most of the changes are around implementing and fixing fallouts from sysfs and internal interface to limit the CPUs available to all unbound workqueues to help isolating CPUs. It needs more work as ordered workqueues can roam unrestricted but still is a significant improvement" * 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix typos in comments workqueue: move flush_scheduled_work() to workqueue.h workqueue: remove the lock from wq_sysfs_prep_attrs() workqueue: remove the declaration of copy_workqueue_attrs() workqueue: ensure attrs changes are properly synchronized workqueue: separate out and refactor the locking of applying attrs workqueue: simplify wq_update_unbound_numa() workqueue: wq_pool_mutex protects the attrs-installation workqueue: fix a typo workqueue: function name in the comment differs from the real function name workqueue: fix trivial typo in Documentation/workqueue.txt workqueue: Allow modifying low level unbound workqueue cpumask workqueue: Create low-level unbound workqueues cpumask workqueue: split apply_workqueue_attrs() into 3 stages
| * | | | | | | | | | | workqueue: fix typos in commentsShailendra Verma2015-05-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tj: dropped iff -> if, iff is if and only if not a typo. Spotted by Randy Dunlap. Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org>
| * | | | | | | | | | | workqueue: move flush_scheduled_work() to workqueue.hLai Jiangshan2015-05-211-30/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | flush_scheduled_work() is just a simple call to flush_work(). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: remove the lock from wq_sysfs_prep_attrs()Lai Jiangshan2015-05-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reading to wq->unbound_attrs requires protection of either wq_pool_mutex or wq->mutex, and wq_sysfs_prep_attrs() is called with wq_pool_mutex held, so we don't need to grab wq->mutex here. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: remove the declaration of copy_workqueue_attrs()Lai Jiangshan2015-05-211-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pre-declaration was unneeded since a previous refactor patch 6ba94429c8e7 ("workqueue: Reorder sysfs code"). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: ensure attrs changes are properly synchronizedLai Jiangshan2015-05-191-9/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current modification to attrs via sysfs is not fully synchronized. Process A (change cpumask) | Process B (change numa affinity) wq_cpumask_store() | wq_sysfs_prep_attrs() | | apply_workqueue_attrs() apply_workqueue_attrs() | It results that the Process B's operation is totally reverted without any notification, it is a buggy behavior. So this patch moves wq_sysfs_prep_attrs() into the protection under wq_pool_mutex to ensure attrs changes are properly synchronized. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: separate out and refactor the locking of applying attrsLai Jiangshan2015-05-191-33/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Applying attrs requires two locks: get_online_cpus() and wq_pool_mutex, and this code is duplicated at two places (apply_workqueue_attrs() and workqueue_set_unbound_cpumask()). So we separate out this locking code into apply_wqattrs_[un]lock() and do a minor refactor on apply_workqueue_attrs(). The apply_wqattrs_[un]lock() will be also used on later patch for ensuring attrs changes are properly synchronized. tj: minor updates to comments Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: simplify wq_update_unbound_numa()Lai Jiangshan2015-05-181-15/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wq_update_unbound_numa() is known be called with wq_pool_mutex held. But wq_update_unbound_numa() requests wq->mutex before reading wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq. But these fields were changed to be allowed being read with wq_pool_mutex held. So we simply remove the mutex_lock(&wq->mutex). Without the dependence on the the mutex_lock(&wq->mutex), the test of wq->unbound_attrs->no_numa can also be moved upward. The old code need a long comment to describe the stableness of @wq->unbound_attrs which is also guaranteed by wq_pool_mutex now, so we don't need this such comment. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: wq_pool_mutex protects the attrs-installationLai Jiangshan2015-05-181-7/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current wq_pool_mutex doesn't proctect the attrs-installation, it results that ->unbound_attrs, ->numa_pwq_tbl[] and ->dfl_pwq can only be accessed under wq->mutex and causes some inconveniences. Example, wq_update_unbound_numa() has to acquire wq->mutex before fetching the wq->unbound_attrs->no_numa and the old_pwq. attrs-installation is a short operation, so this change will no cause any latency for other operations which also acquire the wq_pool_mutex. The only unprotected attrs-installation code is in apply_workqueue_attrs(), so this patch touches code less than comments. It is also a preparation patch for next several patches which read wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq with only wq_pool_mutex held. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: fix a typoChen Hanxiao2015-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s/detemined/determined Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: function name in the comment differs from the real function nameGong Zhaogang2015-05-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modify wq_calc_node_mask to wq_calc_node_cpumask Signed-off-by: Gong Zhaogang <gongzhaogang@inspur.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: Allow modifying low level unbound workqueue cpumaskLai Jiangshan2015-04-301-9/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow to modify the low-level unbound workqueues cpumask through sysfs. This is performed by traversing the entire workqueue list and calling apply_wqattrs_prepare() on the unbound workqueues with the new low level mask. Only after all the preparation are done, we commit them all together. Ordered workqueues are ignored from the low level unbound workqueue cpumask, it will be handled in near future. All the (default & per-node) pwqs are mandatorily controlled by the low level cpumask. If the user configured cpumask doesn't overlap with the low level cpumask, the low level cpumask will be used for the wq instead. The comment of wq_calc_node_cpumask() is updated and explicitly requires that its first argument should be the attrs of the default pwq. The default wq_unbound_cpumask is cpu_possible_mask. The workqueue subsystem doesn't know its best default value, let the system manager or the other subsystem set it when needed. Changed from V8: merge the calculating code for the attrs of the default pwq together. minor change the code&comments for saving the user configured attrs. remove unnecessary list_del(). minor update the comment of wq_calc_node_cpumask(). update the comment of workqueue_set_unbound_cpumask(); Cc: Christoph Lameter <cl@linux.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: Create low-level unbound workqueues cpumaskFrederic Weisbecker2015-04-271-2/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a cpumask that limits the affinity of all unbound workqueues. This cpumask is controlled through a file at the root of the workqueue sysfs directory. It works on a lower-level than the per WQ_SYSFS workqueues cpumask files such that the effective cpumask applied for a given unbound workqueue is the intersection of /sys/devices/virtual/workqueue/$WORKQUEUE/cpumask and the new /sys/devices/virtual/workqueue/cpumask file. This patch implements the basic infrastructure and the read interface. wq_unbound_cpumask is initially set to cpu_possible_mask. Cc: Christoph Lameter <cl@linux.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | workqueue: split apply_workqueue_attrs() into 3 stagesLai Jiangshan2015-04-271-84/+115
| |/ / / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current apply_workqueue_attrs() includes pwqs-allocation and pwqs-installation, so when we batch multiple apply_workqueue_attrs()s as a transaction, we can't ensure the transaction must succeed or fail as a complete unit. To solve this, we split apply_workqueue_attrs() into three stages. The first stage does the preparation: allocation memory, pwqs. The second stage does the attrs-installaion and pwqs-installation. The third stage frees the allocated memory and (old or unused) pwqs. As the result, batching multiple apply_workqueue_attrs()s can succeed or fail as a complete unit: 1) batch do all the first stage for all the workqueues 2) only commit all when all the above succeed. This patch is a preparation for the next patch ("Allow modifying low level unbound workqueue cpumask") which will do a multiple apply_workqueue_attrs(). The patch doesn't have functionality changed except two minor adjustment: 1) free_unbound_pwq() for the error path is removed, we use the heavier version put_pwq_unlocked() instead since the error path is rare. this adjustment simplifies the code. 2) the memory-allocation is also moved into wq_pool_mutex. this is needed to avoid to do the further splitting. tj: minor updates to comments. Suggested-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Mike Galbraith <bitbucket@online.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | | | | | | | | | Merge branch 'for-4.2' of ↵Linus Torvalds2015-06-262-131/+146
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - threadgroup_lock got reorganized so that its users can pick the actual locking mechanism to use. Its only user - cgroups - is updated to use a percpu_rwsem instead of per-process rwsem. This makes things a bit lighter on hot paths and allows cgroups to perform and fail multi-task (a process) migrations atomically. Multi-task migrations are used in several places including the unified hierarchy. - Delegation rule and documentation added to unified hierarchy. This will likely be the last interface update from the cgroup core side for unified hierarchy before lifting the devel mask. - Some groundwork for the pids controller which is scheduled to be merged in the coming devel cycle. * 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: add delegation section to unified hierarchy documentation cgroup: require write perm on common ancestor when moving processes on the default hierarchy cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write() kernfs: make kernfs_get_inode() public MAINTAINERS: add a cgroup core co-maintainer cgroup: fix uninitialised iterator in for_each_subsys_which cgroup: replace explicit ss_mask checking with for_each_subsys_which cgroup: use bitmask to filter for_each_subsys cgroup: add seq_file forward declaration for struct cftype cgroup: simplify threadgroup locking sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem sched, cgroup: reorganize threadgroup locking cgroup: switch to unsigned long for bitmasks cgroup: reorganize include/linux/cgroup.h cgroup: separate out include/linux/cgroup-defs.h cgroup: fix some comment typos
| * | | | | | | | | | | cgroup: require write perm on common ancestor when moving processes on the ↵Tejun Heo2015-06-181-3/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | default hierarchy On traditional hierarchies, if a task has write access to "tasks" or "cgroup.procs" file of a cgroup and its euid agrees with the target, it can move the target to the cgroup; however, consider the following scenario. The owner of each cgroup is in the parentheses. R (root) - 0 (root) - 00 (user1) - 000 (user1) | \ 001 (user1) \ 1 (root) - 10 (user1) The subtrees of 00 and 10 are delegated to user1; however, while both subtrees may belong to the same user, it is clear that the two subtrees are to be isolated - they're under completely separate resource limits imposed by 0 and 1, respectively. Note that 0 and 1 aren't strictly necessary but added to ease illustrating the issue. If user1 is allowed to move processes between the two subtrees, the intention of the hierarchy - keeping a given group of processes under a subtree with certain resource restrictions while delegating management of the subtree - can be circumvented by user1. This happens because migration permission check doesn't consider the hierarchical nature of cgroups. To fix the issue, this patch adds an extra permission requirement when userland tries to migrate a process in the default hierarchy - the issuing task must have write access to the common ancestor of "cgroup.procs" file of the ancestor in addition to the destination's. Conceptually, the issuer must be able to move the target process from the source cgroup to the common ancestor of source and destination cgroups and then to the destination. As long as delegation is done in a proper top-down way, this guarantees that a delegatee can't smuggle processes across disjoint delegation domains. The next patch will add documentation on the delegation model on the default hierarchy. v2: Fixed missing !ret test. Spotted by Li Zefan. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com>
| * | | | | | | | | | | cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()Tejun Heo2015-06-181-14/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Separate out task / process migration permission check from __cgroup_procs_write() into cgroup_procs_write_permission(). * Permission check is moved right above the actual migration and no longer performed while holding rcu_read_lock(). cgroup_procs_write_permission() uses get_task_cred() / put_cred() instead of __task_cred(). Also, !root trying to migrate kthreadd or PF_NO_SETAFFINITY tasks will now fail with -EINVAL rather than -EACCES which should be fine. * The same permission check is now performed even when moving self by specifying 0 as pid. This always succeeds so there's no functional difference. We'll add more permission checks later and the benefits of keeping both cases consistent outweigh the minute overhead of doing perm checks on pid 0 case. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | cgroup: fix uninitialised iterator in for_each_subsys_whichAleksa Sarai2015-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the fact that @ssid is uninitialised in the case where CGROUP_SUBSYS_COUNT = 0 by setting ssid to 0. Fixes: cb4a31675270 ("cgroup: use bitmask to filter for_each_subsys") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | cgroup: replace explicit ss_mask checking with for_each_subsys_whichAleksa Sarai2015-06-081-28/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the explicit checking against ss_masks inside a for_each_subsys block with for_each_subsys_which(..., ss_mask), to take advantage of the more readable (and more efficient) macro. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
| * | | | | | | | | | | cgroup: use bitmask to filter for_each_subsysAleksa Sarai2015-06-081-20/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new macro for_each_subsys_which that allows all enabled cgroup subsystems to be filtered by a bitmask, such that mask & (1 << ssid) determines if the subsystem is to be processed in the loop body (where ssid is the unique id of the subsystem). Also replace the need_forkexit_callback with two separate bitmasks for each callback to make (ss->{fork,exit}) checks unnecessary. tj: add a short comment for "if (!CGROUP_SUBSYS_COUNT)". Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
| * | | | | | | | | | | cgroup: simplify threadgroup lockingTejun Heo2015-05-261-35/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that threadgroup locking is made global, code paths around it can be simplified. * lock-verify-unlock-retry dancing removed from __cgroup_procs_write(). * Race protection against de_thread() removed from cgroup_update_dfl_csses(). Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsemTejun Heo2015-05-262-61/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cgroup side of threadgroup locking uses signal_struct->group_rwsem to synchronize against threadgroup changes. This per-process rwsem adds small overhead to thread creation, exit and exec paths, forces cgroup code paths to do lock-verify-unlock-retry dance in a couple places and makes it impossible to atomically perform operations across multiple processes. This patch replaces signal_struct->group_rwsem with a global percpu_rwsem cgroup_threadgroup_rwsem which is cheaper on the reader side and contained in cgroups proper. This patch converts one-to-one. This does make writer side heavier and lower the granularity; however, cgroup process migration is a fairly cold path, we do want to optimize thread operations over it and cgroup migration operations don't take enough time for the lower granularity to matter. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org>
| * | | | | | | | | | | sched, cgroup: reorganize threadgroup lockingTejun Heo2015-05-261-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | threadgroup_change_begin/end() are used to mark the beginning and end of threadgroup modifying operations to allow code paths which require a threadgroup to stay stable across blocking operations to synchronize against those sections using threadgroup_lock/unlock(). It's currently implemented as a general mechanism in sched.h using per-signal_struct rwsem; however, this never grew non-cgroup use cases and becomes noop if !CONFIG_CGROUPS. It turns out that cgroups is gonna be better served with a different sycnrhonization scheme and is a bit silly to keep cgroups specific details as a general mechanism. What's general here is identifying the places where threadgroups are modified. This patch restructures threadgroup locking so that threadgroup_change_begin/end() become a place where subsystems which need to sycnhronize against threadgroup changes can hook into. cgroup_threadgroup_change_begin/end() which operate on the per-signal_struct rwsem are created and threadgroup_lock/unlock() are moved to cgroup.c and made static. This is pure reorganization which doesn't cause any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org>
| * | | | | | | | | | | cgroup: switch to unsigned long for bitmasksAleksa Sarai2015-05-181-19/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch the type of all internal cgroup masks to (unsigned long), which is the correct type for bitmasks. This is in preparation for the for_each_subsys_which patch. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | | | | | | | | | cgroup: fix some comment typosChen Hanxiao2015-04-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s/effctive/effective s/hierarhcy/hierarchy s/shoulid/should Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | | | | | | | | | | Merge tag 'driver-core-4.2-rc1' of ↵Linus Torvalds2015-06-262-8/+21
|\ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the driver core / firmware changes for 4.2-rc1. A number of small changes all over the place in the driver core, and in the firmware subsystem. Nothing really major, full details in the shortlog. Some of it is a bit of churn, given that the platform driver probing changes was found to not work well, so they were reverted. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits) Revert "base/platform: Only insert MEM and IO resources" Revert "base/platform: Continue on insert_resource() error" Revert "of/platform: Use platform_device interface" Revert "base/platform: Remove code duplication" firmware: add missing kfree for work on async call fs: sysfs: don't pass count == 0 to bin file readers base:dd - Fix for typo in comment to function driver_deferred_probe_trigger(). base/platform: Remove code duplication of/platform: Use platform_device interface base/platform: Continue on insert_resource() error base/platform: Only insert MEM and IO resources firmware: use const for remaining firmware names firmware: fix possible use after free on name on asynchronous request firmware: check for file truncation on direct firmware loading firmware: fix __getname() missing failure check drivers: of/base: move of_init to driver_init drivers/base: cacheinfo: fix annoying typo when DT nodes are absent sysfs: disambiguate between "error code" and "failure" in comments driver-core: fix build for !CONFIG_MODULES driver-core: make __device_attach() static ...
| * \ \ \ \ \ \ \ \ \ \ \ Merge 4.1-rc7 into driver-core-nextGreg Kroah-Hartman2015-06-087-14/+33
| |\ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want the fixes in this branch as well for testing and merge resolution. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | | | | | | | | | | driver-core: add driver module asynchronous probe supportLuis R. Rodriguez2015-05-201-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some init systems may wish to express the desire to have device drivers run their probe() code asynchronously. This implements support for this and allows userspace to request async probe as a preference through a generic shared device driver module parameter, async_probe. Implementation for async probe is supported through a module parameter given that since synchronous probe has been prevalent for years some userspace might exist which relies on the fact that the device driver will probe synchronously and the assumption that devices it provides will be immediately available after this. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
OpenPOWER on IntegriCloud