op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	tracing/probes: Integrate duplicate set_print_fmt()	Namhyung Kim	2014-01-02	4	-116/+66
\| \| \| \| \| \| \| \| \| \| \| \|	The set_print_fmt() functions are implemented almost same for [ku]probes. Move it to a common place and get rid of the duplication. Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
*	tracing/kprobes: Move common functions to trace_probe.h	Namhyung Kim	2014-01-02	2	-48/+48
\| \| \| \| \| \| \| \| \| \| \| \|	The __get_data_size() and store_trace_args() will be used by uprobes too. Move them to a common location. Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
*	tracing/uprobes: Convert to struct trace_probe	Namhyung Kim	2014-01-02	1	-80/+79
\| \| \| \| \| \| \| \| \| \| \| \|	Convert struct trace_uprobe to make use of the common trace_probe structure. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
*	tracing/kprobes: Factor out struct trace_probe	Namhyung Kim	2014-01-02	2	-285/+295
\| \| \| \| \| \| \| \| \| \| \| \| \|	There are functions that can be shared to both of kprobes and uprobes. Separate common data structure to struct trace_probe and use it from the shared functions. Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
*	tracing/probes: Fix basic print type functions	Namhyung Kim	2014-01-02	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The print format of s32 type was "ld" and it's casted to "long". So it turned out to print 4294967295 for "-1" on 64-bit systems. Not sure whether it worked well on 32-bit systems. Anyway, it doesn't need to have cast argument at all since it already casted using type pointer - just get rid of it. Thanks to Oleg for pointing that out. And print 0x prefix for unsigned type as it shows hex numbers. Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
*	tracing/uprobes: Fix documentation of uprobe registration syntax	Namhyung Kim	2014-01-02	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \|	The uprobe syntax requires an offset after a file path not a symbol. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
*	tracing: Fix rcu handling of event_trigger_data filter field	Steven Rostedt (Red Hat)	2014-01-02	2	-4/+6
\| \| \| \| \| \| \| \| \| \| \|	The filter field of the event_trigger_data structure is protected under RCU sched locks. It was not annotated as such, and after doing so, sparse pointed out several locations that required fix ups. Reported-by: kbuild test robot <fengguang.wu@intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	tracing: Add generic tracing_lseek() function	Steven Rostedt (Red Hat)	2014-01-02	6	-28/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Trace event triggers added a lseek that uses the ftrace_filter_lseek() function. Unfortunately, when function tracing is not configured in that function is not defined and the kernel fails to build. This is the second time that function was added to a file ops and it broke the build due to requiring special config dependencies. Make a generic tracing_lseek() that all the tracing utilities may use. Also, modify the old ftrace_filter_lseek() to return 0 instead of 1 on WRONLY. Not sure why it was a 1 as that does not make sense. This also changes the old tracing_seek() to modify the file pos pointer on WRONLY as well. Reported-by: kbuild test robot <fengguang.wu@intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	tracing: Add documentation for trace event triggers	Tom Zanussi	2013-12-21	1	-0/+207
\| \| \| \| \| \| \| \| \| \|	Provide a basic overview of trace event triggers and document the available trigger commands, along with a few simple examples. Link: http://lkml.kernel.org/r/2595dd9196d7b553049611f2a3f849ca75d650a2.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	tracing: Add and use generic set_trigger_filter() implementation	Tom Zanussi	2013-12-21	6	-27/+263
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a generic event_command.set_trigger_filter() op implementation and have the current set of trigger commands use it - this essentially gives them all support for filters. Syntactically, filters are supported by adding 'if <filter>' just after the command, in which case only events matching the filter will invoke the trigger. For example, to add a filter to an enable/disable_event command: echo 'enable_event:system:event if common_pid == 999' > \ .../othersys/otherevent/trigger The above command will only enable the system:event event if the common_pid field in the othersys:otherevent event is 999. As another example, to add a filter to a stacktrace command: echo 'stacktrace if common_pid == 999' > \ .../somesys/someevent/trigger The above command will only trigger a stacktrace if the common_pid field in the event is 999. The filter syntax is the same as that described in the 'Event filtering' section of Documentation/trace/events.txt. Because triggers can now use filters, the trigger-invoking logic needs to be moved in those cases - e.g. for ftrace_raw_event_calls, if a trigger has a filter associated with it, the trigger invocation now needs to happen after the { assign; } part of the call, in order for the trigger condition to be tested. There's still a SOFT_DISABLED-only check at the top of e.g. the ftrace_raw_events function, so when an event is soft disabled but not because of the presence of a trigger, the original SOFT_DISABLED behavior remains unchanged. There's also a bit of trickiness in that some triggers need to avoid being invoked while an event is currently in the process of being logged, since the trigger may itself log data into the trace buffer. Thus we make sure the current event is committed before invoking those triggers. To do that, we split the trigger invocation in two - the first part (event_triggers_call()) checks the filter using the current trace record; if a command has the post_trigger flag set, it sets a bit for itself in the return value, otherwise it directly invoks the trigger. Once all commands have been either invoked or set their return flag, event_triggers_call() returns. The current record is then either committed or discarded; if any commands have deferred their triggers, those commands are finally invoked following the close of the current event by event_triggers_post_call(). To simplify the above and make it more efficient, the TRIGGER_COND bit is introduced, which is set only if a soft-disabled trigger needs to use the log record for filter testing or needs to wait until the current log record is closed. The syscall event invocation code is also changed in analogous ways. Because event triggers need to be able to create and free filters, this also adds a couple external wrappers for the existing create_filter and free_filter functions, which are too generic to be made extern functions themselves. Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	tracing: Move ftrace_event_file() out of DYNAMIC_FTRACE ifdef	Steven Rostedt (Red Hat)	2013-12-21	1	-13/+13
\| \| \| \| \| \| \| \|	Now that event triggers use ftrace_event_file(), it needs to be outside the #ifdef CONFIG_DYNAMIC_FTRACE, as it can now be used when that is not defined. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	tracing: Add 'enable_event' and 'disable_event' event trigger commands	Tom Zanussi	2013-12-21	4	-1/+365
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add 'enable_event' and 'disable_event' event_command commands. enable_event and disable_event event triggers are added by the user via these commands in a similar way and using practically the same syntax as the analagous 'enable_event' and 'disable_event' ftrace function commands, but instead of writing to the set_ftrace_filter file, the enable_event and disable_event triggers are written to the per-event 'trigger' files: echo 'enable_event:system:event' > .../othersys/otherevent/trigger echo 'disable_event:system:event' > .../othersys/otherevent/trigger The above commands will enable or disable the 'system:event' trace events whenever the othersys:otherevent events are hit. This also adds a 'count' version that limits the number of times the command will be invoked: echo 'enable_event:system:event:N' > .../othersys/otherevent/trigger echo 'disable_event:system:event:N' > .../othersys/otherevent/trigger Where N is the number of times the command will be invoked. The above commands will will enable or disable the 'system:event' trace events whenever the othersys:otherevent events are hit, but only N times. This also makes the find_event_file() helper function extern, since it's useful to use from other places, such as the event triggers code, so make it accessible. Link: http://lkml.kernel.org/r/f825f3048c3f6b026ee37ae5825f9fc373451828.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	tracing: Add 'stacktrace' event trigger command	Tom Zanussi	2013-12-21	2	-0/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add 'stacktrace' event_command. stacktrace event triggers are added by the user via this command in a similar way and using practically the same syntax as the analogous 'stacktrace' ftrace function command, but instead of writing to the set_ftrace_filter file, the stacktrace event trigger is written to the per-event 'trigger' files: echo 'stacktrace' > .../tracing/events/somesys/someevent/trigger The above command will turn on stacktraces for someevent i.e. whenever someevent is hit, a stacktrace will be logged. This also adds a 'count' version that limits the number of times the command will be invoked: echo 'stacktrace:N' > .../tracing/events/somesys/someevent/trigger Where N is the number of times the command will be invoked. The above command will log N stacktraces for someevent i.e. whenever someevent is hit N times, a stacktrace will be logged. Link: http://lkml.kernel.org/r/0c30c008a0828c660aa0e1bbd3255cf179ed5c30.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	tracing: Add 'snapshot' event trigger command	Tom Zanussi	2013-12-21	4	-3/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add 'snapshot' event_command. snapshot event triggers are added by the user via this command in a similar way and using practically the same syntax as the analogous 'snapshot' ftrace function command, but instead of writing to the set_ftrace_filter file, the snapshot event trigger is written to the per-event 'trigger' files: echo 'snapshot' > .../somesys/someevent/trigger The above command will turn on snapshots for someevent i.e. whenever someevent is hit, a snapshot will be done. This also adds a 'count' version that limits the number of times the command will be invoked: echo 'snapshot:N' > .../somesys/someevent/trigger Where N is the number of times the command will be invoked. The above command will snapshot N times for someevent i.e. whenever someevent is hit N times, a snapshot will be done. Also adds a new tracing_alloc_snapshot() function - the existing tracing_snapshot_alloc() function is a special version of tracing_snapshot() that also does the snapshot allocation - the snapshot triggers would like to be able to do just the allocation but not take a snapshot; the existing tracing_snapshot_alloc() in turn now also calls tracing_alloc_snapshot() underneath to do that allocation. Link: http://lkml.kernel.org/r/c9524dd07ce01f9dcbd59011290e0a8d5b47d7ad.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> [ fix up from kbuild test robot <fengguang.wu@intel.com report ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	tracing: Add 'traceon' and 'traceoff' event trigger commands	Tom Zanussi	2013-12-20	2	-0/+447
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add 'traceon' and 'traceoff' event_command commands. traceon and traceoff event triggers are added by the user via these commands in a similar way and using practically the same syntax as the analagous 'traceon' and 'traceoff' ftrace function commands, but instead of writing to the set_ftrace_filter file, the traceon and traceoff triggers are written to the per-event 'trigger' files: echo 'traceon' > .../tracing/events/somesys/someevent/trigger echo 'traceoff' > .../tracing/events/somesys/someevent/trigger The above command will turn tracing on or off whenever someevent is hit. This also adds a 'count' version that limits the number of times the command will be invoked: echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger Where N is the number of times the command will be invoked. The above commands will will turn tracing on or off whenever someevent is hit, but only N times. Some common register/unregister_trigger() implementations of the event_command reg()/unreg() callbacks are also provided, which add and remove trigger instances to the per-event list of triggers, and arm/disarm them as appropriate. event_trigger_callback() is a general-purpose event_command func() implementation that orchestrates command parsing and registration for most normal commands. Most event commands will use these, but some will override and possibly reuse them. The event_trigger_init(), event_trigger_free(), and event_trigger_print() functions are meant to be common implementations of the event_trigger_ops init(), free(), and print() ops, respectively. Most trigger_ops implementations will use these, but some will override and possibly reuse them. Link: http://lkml.kernel.org/r/00a52816703b98d2072947478dd6e2d70cde5197.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	tracing: Add basic event trigger framework	Tom Zanussi	2013-12-20	7	-5/+495
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a 'trigger' file for each trace event, enabling 'trace event triggers' to be set for trace events. 'trace event triggers' are patterned after the existing 'ftrace function triggers' implementation except that triggers are written to per-event 'trigger' files instead of to a single file such as the 'set_ftrace_filter' used for ftrace function triggers. The implementation is meant to be entirely separate from ftrace function triggers, in order to keep the respective implementations relatively simple and to allow them to diverge. The event trigger functionality is built on top of SOFT_DISABLE functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file flags which is checked when any trace event fires. Triggers set for a particular event need to be checked regardless of whether that event is actually enabled or not - getting an event to fire even if it's not enabled is what's already implemented by SOFT_DISABLE mode, so trigger mode directly reuses that. Event trigger essentially inherit the soft disable logic in __ftrace_event_enable_disable() while adding a bit of logic and trigger reference counting via tm_ref on top of that in a new trace_event_trigger_enable_disable() function. Because the base __ftrace_event_enable_disable() code now needs to be invoked from outside trace_events.c, a wrapper is also added for those usages. The triggers for an event are actually invoked via a new function, event_triggers_call(), and code is also added to invoke them for ftrace_raw_event calls as well as syscall events. The main part of the patch creates a new trace_events_trigger.c file to contain the trace event triggers implementation. The standard open, read, and release file operations are implemented here. The open() implementation sets up for the various open modes of the 'trigger' file. It creates and attaches the trigger iterator and sets up the command parser. If opened for reading set up the trigger seq_ops. The read() implementation parses the event trigger written to the 'trigger' file, looks up the trigger command, and passes it along to that event_command's func() implementation for command-specific processing. The release() implementation does whatever cleanup is needed to release the 'trigger' file, like releasing the parser and trigger iterator, etc. A couple of functions for event command registration and unregistration are added, along with a list to add them to and a mutex to protect them, as well as an (initially empty) registration function to add the set of commands that will be added by future commits, and call to it from the trace event initialization code. also added are a couple trigger-specific data structures needed for these implementations such as a trigger iterator and a struct for trigger-specific data. A couple structs consisting mostly of function meant to be implemented in command-specific ways, event_command and event_trigger_ops, are used by the generic event trigger command implementations. They're being put into trace.h alongside the other trace_event data structures and functions, in the expectation that they'll be needed in several trace_event-related files such as trace_events_trigger.c and trace_events.c. The event_command.func() function is meant to be called by the trigger parsing code in order to add a trigger instance to the corresponding event. It essentially coordinates adding a live trigger instance to the event, and arming the triggering the event. Every event_command func() implementation essentially does the same thing for any command: - choose ops - use the value of param to choose either a number or count version of event_trigger_ops specific to the command - do the register or unregister of those ops - associate a filter, if specified, with the triggering event The reg() and unreg() ops allow command-specific implementations for event_trigger_op registration and unregistration, and the get_trigger_ops() op allows command-specific event_trigger_ops selection to be parameterized. When a trigger instance is added, the reg() op essentially adds that trigger to the triggering event and arms it, while unreg() does the opposite. The set_filter() function is used to associate a filter with the trigger - if the command doesn't specify a set_filter() implementation, the command will ignore filters. Each command has an associated trigger_type, which serves double duty, both as a unique identifier for the command as well as a value that can be used for setting a trigger mode bit during trigger invocation. The signature of func() adds a pointer to the event_command struct, used to invoke those functions, along with a command_data param that can be passed to the reg/unreg functions. This allows func() implementations to use command-specific blobs and supports code re-use. The event_trigger_ops.func() command corrsponds to the trigger 'probe' function that gets called when the triggering event is actually invoked. The other functions are used to list the trigger when needed, along with a couple mundane book-keeping functions. This also moves event_file_data() into trace.h so it can be used outside of trace_events.c. Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Idea-by: Steve Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
*	Linux 3.13-rc4v3.13-rc4	Linus Torvalds	2013-12-15	1	-1/+1
\|
*	null_blk: mem garbage on NUMA systems during init	Matias Bjorling	2013-12-15	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	For NUMA systems, initializing the blk-mq layer and using per node hctx. We initialize submit queues to 1, while blk-mq nr_hw_queues is initialized to the number of NUMA nodes. This makes the null_init_hctx function overwrite memory outside of what it allocated. In my case it lead to writing garbage into struct request_queue's mq_map. Signed-off-by: Matias Bjorling <m@bjorling.me> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	radeon_pm: fix oops in hwmon_attributes_visible() and ↵	Sergey Senozhatsky	2013-12-15	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	radeon_hwmon_show_temp_thresh() Since commit ec39f64bba34 ("drm/radeon/dpm: Convert to use devm_hwmon_register_with_groups") radeon_hwmon_init() is using hwmon_device_register_with_groups(), which sets `rdev' as a device private driver_data, while hwmon_attributes_visible() and radeon_hwmon_show_temp_thresh() are still waiting for `drm_device'. Fix them by using dev_get_drvdata(), in order to avoid this oops: BUG: unable to handle kernel paging request at 0000000000001e28 IP: [<ffffffffa02ae8b4>] hwmon_attributes_visible+0x18/0x3d [radeon] PGD 15057e067 PUD 151a8e067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Call Trace: internal_create_group+0x114/0x1d9 sysfs_create_group+0xe/0x10 sysfs_create_groups+0x22/0x5f device_add+0x34f/0x501 device_register+0x15/0x18 hwmon_device_register_with_groups+0xb5/0xed radeon_hwmon_init+0x56/0x7c [radeon] radeon_pm_init+0x134/0x7e5 [radeon] radeon_modeset_init+0x75f/0x8ed [radeon] radeon_driver_load_kms+0xc6/0x187 [radeon] drm_dev_register+0xf9/0x1b4 [drm] drm_get_pci_dev+0x98/0x129 [drm] radeon_pci_probe+0xa3/0xac [radeon] pci_device_probe+0x6e/0xcf driver_probe_device+0x98/0x1c4 __driver_attach+0x5c/0x7e bus_for_each_dev+0x7b/0x85 driver_attach+0x19/0x1b bus_add_driver+0x104/0x1ce driver_register+0x89/0xc5 __pci_register_driver+0x58/0x5b drm_pci_init+0x86/0xea [drm] radeon_init+0x97/0x1000 [radeon] do_one_initcall+0x7f/0x117 load_module+0x1583/0x1bb4 SyS_init_module+0xa0/0xaf Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds	2013-12-15	141	-916/+1551
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull networking fixes from David Miller: 1) Revert CHECKSUM_COMPLETE optimization in pskb_trim_rcsum(), I can't figure out why it breaks things. 2) Fix comparison in netfilter ipset's hash_netnet4_data_equal(), it was basically doing "x == x", from Dave Jones. 3) Freescale FEC driver was DMA mapping the wrong number of bytes, from Sebastian Siewior. 4) Blackhole and prohibit routes in ipv6 were not doing the right thing because their ->input and ->output methods were not being assigned correctly. Now they behave properly like their ipv4 counterparts. From Kamala R. 5) Several drivers advertise the NETIF_F_FRAGLIST capability, but really do not support this feature and will send garbage packets if fed fraglist SKBs. From Eric Dumazet. 6) Fix long standing user triggerable BUG_ON over loopback in RDS protocol stack, from Venkat Venkatsubra. 7) Several not so common code paths can potentially try to invoke packet scheduler actions that might be NULL without checking. Shore things up by either 1) defining a method as mandatory and erroring on registration if that method is NULL 2) defininig a method as optional and the registration function hooks up a default implementation when NULL is seen. From Jamal Hadi Salim. 8) Fix fragment detection in xen-natback driver, from Paul Durrant. 9) Kill dangling enter_memory_pressure method in cg_proto ops, from Eric W Biederman. 10) SKBs that traverse namespaces should have their local_df cleared, from Hannes Frederic Sowa. 11) IOCB file position is not being updated by macvtap_aio_read() and tun_chr_aio_read(). From Zhi Yong Wu. 12) Don't free virtio_net netdev before releasing all of the NAPI instances. From Andrey Vagin. 13) Procfs entry leak in xt_hashlimit, from Sergey Popovich. 14) IPv6 routes that are no cached routes should not count against the garbage collection limits. We had this almost right, but were missing handling addrconf generated routes properly. From Hannes Frederic Sowa. 15) fib{4,6}_rule_suppress() have to consider potentially seeing NULL route info when they are called, from Stefan Tomanek. 16) TUN and MACVTAP have had truncated packet signalling for some time, fix from Jason Wang. 17) Fix use after frrr in __udp4_lib_rcv(), from Eric Dumazet. 18) xen-netback does not interpret the NAPI budget properly for TX work, fix from Paul Durrant. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits) igb: Fix for issue where values could be too high for udelay function. i40e: fix null dereference xen-netback: fix gso_prefix check net: make neigh_priv_len in struct net_device 16bit instead of 8bit drivers: net: cpsw: fix for cpsw crash when build as modules xen-netback: napi: don't prematurely request a tx event xen-netback: napi: fix abuse of budget sch_tbf: use do_div() for 64-bit divide udp: ipv4: must add synchronization in udp_sk_rx_dst_set() net:fec: remove duplicate lines in comment about errata ERR006358 Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature" 8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature xen-netback: make sure skb linear area covers checksum field net: smc91x: Fix device tree based configuration so it's usable udp: ipv4: fix potential use after free in udp_v4_early_demux() macvtap: signal truncated packets tun: unbreak truncated packet signalling net: sched: htb: fix the calculation of quantum net: sched: tbf: fix the calculation of max_size micrel: add support for KSZ8041RNLI ...
\| *	igb: Fix for issue where values could be too high for udelay function.	Carolyn Wyborny	2013-12-14	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes the igb_phy_has_link function to check the value of the parameter before deciding to use udelay or mdelay in order to be sure that the value is not too high for udelay function. CC: stable kernel <stable@vger.kernel.org> # 3.9+ Signed-off-by: Sunil K Pandey <sunil.k.pandey@intel.com> Signed-off-by: Kevin B Smith <kevin.b.smith@intel.com> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	i40e: fix null dereference	Jesse Brandeburg	2013-12-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the vsi->tx_rings structure is NULL we don't want to panic. Change-Id: Ic694f043701738c434e8ebe0caf0673f4410dc10 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	xen-netback: fix gso_prefix check	Paul Durrant	2013-12-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a mistake in checking the gso_prefix mask when passing large packets to a guest. The wrong shift is applied to the bit - the raw skb gso type is used rather then the translated one. This leads to large packets being handed to the guest without the GSO metadata. This patch fixes the check. The mistake manifested as errors whilst running Microsoft HCK large packet offload tests between a pair of Windows 8 VMs. I have verified this patch fixes those errors. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: make neigh_priv_len in struct net_device 16bit instead of 8bit	Sebastian Siewior	2013-12-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	neigh_priv_len is defined as u8. With all debug enabled struct ipoib_neigh has 200 bytes. The largest part is sk_buff_head with 96 bytes and here the spinlock with 72 bytes. The size value still fits in this u8 leaving some room for more. On -RT struct ipoib_neigh put on weight and has 392 bytes. The main reason is sk_buff_head with 288 and the fatty here is spinlock with 192 bytes. This does no longer fit into into neigh_priv_len and gcc complains. This patch changes neigh_priv_len from being 8bit to 16bit. Since the following element (dev_id) is 16bit followed by a spinlock which is aligned, the struct remains with a total size of 3200 (allmodconfig) / 2048 (with as much debug off as possible) bytes on x86-64. On x86-32 the struct is 1856 (allmodconfig) / 1216 (with as much debug off as possible) bytes long. The numbers were gained with and without the patch to prove that this change does not increase the size of the struct. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: cpsw: fix for cpsw crash when build as modules	Mugunthan V N	2013-12-12	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When CPSW and Davinci MDIO are build as modules, CPSW crashes when accessing CPSW registers in CPSW probe. The same is working in built-in as the CPSW clocks are enabled in Davindi MDIO probe, SO Enabling the clocks before accessing the version register and moving out the other register access to cpsw device open. Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	xen-netback: napi: don't prematurely request a tx event	Paul Durrant	2013-12-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes the RING_FINAL_CHECK_FOR_REQUESTS in xenvif_build_tx_gops to a check for RING_HAS_UNCONSUMED_REQUESTS as the former call has the side effect of advancing the ring event pointer and therefore inviting another interrupt from the frontend before the napi poll has actually finished, thereby defeating the point of napi. The event pointer is updated by RING_FINAL_CHECK_FOR_REQUESTS in xenvif_poll, the napi poll function, if the work done is less than the budget i.e. when actually transitioning back to interrupt mode. Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	xen-netback: napi: fix abuse of budget	Paul Durrant	2013-12-12	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	netback seems to be somewhat confused about the napi budget parameter. The parameter is supposed to limit the number of skbs processed in each poll, but netback has this confused with grant operations. This patch fixes that, properly limiting the work done in each poll. Note that this limit makes sure we do not process any more data from the shared ring than we intend to pass back from the poll. This is important to prevent tx_queue potentially growing without bound. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sch_tbf: use do_div() for 64-bit divide	Yang Yingliang	2013-12-11	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's doing a 64-bit divide which is not supported on 32-bit architectures in psched_ns_t2l(). The correct way to do this is to use do_div(). It's introduced by commit cc106e441a63 ("net: sched: tbf: fix the calculation of max_size") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	udp: ipv4: must add synchronization in udp_sk_rx_dst_set()	Eric Dumazet	2013-12-11	1	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unlike TCP, UDP input path does not hold the socket lock. Before messing with sk->sk_rx_dst, we must use a spinlock, otherwise multiple cpus could leak a refcount. This patch also takes care of renewing a stale dst entry. (When the sk->sk_rx_dst would not be used by IP early demux) Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net:fec: remove duplicate lines in comment about errata ERR006358	Philippe De Muyter	2013-12-11	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 031916568a1aa2ef1809f86d26f0bcfa215ff5c0 worked around errata ERR006358, but comment contains duplicated lines, impairing the readability. Remove them. Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Revert "8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature"	David S. Miller	2013-12-11	16	-426/+302
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 99023e90fe5c147ea0665bda86764ea44f08a622. Accidently checked this into 'net' instead of 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	8390 : Replace ei_debug with msg_enable/NETIF_MSG_* feature	Matthew Whitehead	2013-12-11	16	-302/+426
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removed the shared ei_debug variable. Replaced it by adding u32 msg_enable to the private struct ei_device. Now each 8390 ethernet instance has a per-device logging variable. Changed older style printk() calls to more canonical forms. Tested on: ne, ne2k-pci, smc-ultra, and wd hardware. V4.0 - Substituted pr_info() and pr_debug() for printk() KERN_INFO and KERN_DEBUG V3.0 - Checked for cases where pr_cont() was most appropriate choice. - Changed module parameter from 'debug' to 'msg_enable' because debug was no longer the best description. V2.0 - Changed netif_msg_(drv\|probe\|ifdown\|rx_err\|tx_err\|tx_queued\|intr\|rx_status\|hw) to netif_(dbg\|info\|warn\|err) where possible. Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	xen-netback: make sure skb linear area covers checksum field	Paul Durrant	2013-12-11	1	-32/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	skb_partial_csum_set requires that the linear area of the skb covers the checksum field. The checksum setup code in netback was only doing that pullup in the case when the pseudo header checksum was being recalculated though. This patch makes that pullup unconditional. (I pullup the whole transport header just for simplicity; the requirement is only for the check field but in the case of UDP this is the last field in the header and in the case of TCP it's the last but one). The lack of pullup manifested as failures running Microsoft HCK network tests on a pair of Windows 8 VMs and it has been verified that this patch fixes the problem. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: smc91x: Fix device tree based configuration so it's usable	Tony Lindgren	2013-12-11	2	-10/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 89ce376c6bdc (drivers/net: Use of_match_ptr() macro in smc91x.c) added minimal device tree support to smc91x, but it's not working on many platforms because of the lack of some key configuration bits. Fix the issue by parsing the necessary configuration like the smc911x driver is doing. As most smc91x users seem to use 16-bit access, let's default to that if no reg-io-width is specified. Cc: Nicolas Pitre <nico@fluxnic.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: netdev@vger.kernel.org Cc: devicetree@vger.kernel.org Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	udp: ipv4: fix potential use after free in udp_v4_early_demux()	Eric Dumazet	2013-12-11	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pskb_may_pull() can reallocate skb->head, we need to move the initialization of iph and uh pointers after its call. Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	macvtap: signal truncated packets	Jason Wang	2013-12-11	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	macvtap_put_user() never return a value grater than iov length, this in fact bypasses the truncated checking in macvtap_recvmsg(). Fix this by always returning the size of packet plus the possible vlan header to let the trunca checking work. Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tun: unbreak truncated packet signalling	Jason Wang	2013-12-11	1	-7/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 6680ec68eff47d36f67b4351bc9836fd6cba9532 (tuntap: hardware vlan tx support) breaks the truncated packet signal by nev return a length greater than iov length in tun_put_user(). This patch fixes by always return the length of packet plus possible vlan header. Caller can detect the truncated packet by comparing the return value and the size of io length. Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: sched: htb: fix the calculation of quantum	Yang Yingliang	2013-12-11	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now, 32bit rates may be not the true rate. So use rate_bytes_ps which is from max(rate32, rate64) to calcualte quantum. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: sched: tbf: fix the calculation of max_size	Yang Yingliang	2013-12-11	1	-45/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current max_size is caluated from rate table. Now, the rate table has been replaced and it's wrong to caculate max_size based on this rate table. It can lead wrong calculation of max_size. The burst in kernel may be lower than user asked, because burst may gets some loss when transform it to buffer(E.g. "burst 40kb rate 30mbit/s") and it seems we cannot avoid this loss. Burst's value(max_size) based on rate table may be equal user asked. If a packet's length is max_size, this packet will be stalled in tbf_dequeue() because its length is above the burst in kernel so that it cannot get enough tokens. The max_size guards against enqueuing packet sizes above q->buffer "time" in tbf_enqueue(). To make consistent with the calculation of tokens, this patch add a helper psched_ns_t2l() to calculate burst(max_size) directly to fix this problem. After this fix, we can support to using 64bit rates to calculate burst as well. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	micrel: add support for KSZ8041RNLI	Sergei Shtylyov	2013-12-11	2	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Renesas R-Car development boards use KSZ8041RNLI PHY which for some reason has ID of 0x00221537 that is not documented for KSZ8041-family PHYs and does not match the documented ID of 0x0022151x (where 'x' is the revision). We have to add the new #define PHY_ID_* and new ksphy_driver[] entry, almost the same as KSZ8041 one, differing only in the 'phy_id' and 'name' fields. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'for-davem' of ↵	David S. Miller	2013-12-11	1	-0/+4
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Just one patch this time -- a fix from Felix Fietkau to fix the duration calculation for non-aggregated packets in ath9k. This is a small change and it is obviously specific to ath9k. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	Merge branch 'master' of ↵	John W. Linville	2013-12-11	1	-0/+4
\| \| \|\ \| \|/ / \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
\| \| *	ath9k: fix duration calculation for non-aggregated packets	Felix Fietkau	2013-12-09	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When not aggregating packets, fi->framelen should be passed in as length to calculate the duration. Before the tx path rework, ath_tx_fill_desc was called for either one aggregate, or one single frame, with the length of the packet or the aggregate as a parameter. After the rework, ath_tx_sched_aggr can pass a burst of single frames to ath_tx_fill_desc and sets len=0. Fix broken duration calculation by overriding the length in ath_tx_fill_desc before passing it to ath_buf_set_rate. Cc: stable@vger.kernel.org Reported-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| * \|	udp: ipv4: fix an use after free in __udp4_lib_rcv()	Eric Dumazet	2013-12-10	1	-10/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dave Jones reported a use after free in UDP stack : [ 5059.434216] ========================= [ 5059.434314] [ BUG: held lock freed! ] [ 5059.434420] 3.13.0-rc3+ #9 Not tainted [ 5059.434520] ------------------------- [ 5059.434620] named/863 is freeing memory ffff88005e960000-ffff88005e96061f, with a lock still held there! [ 5059.434815] (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0 [ 5059.435012] 3 locks held by named/863: [ 5059.435086] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8143054d>] __netif_receive_skb_core+0x11d/0x940 [ 5059.435295] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff81467a5e>] ip_local_deliver_finish+0x3e/0x410 [ 5059.435500] #2: (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0 [ 5059.435734] stack backtrace: [ 5059.435858] CPU: 0 PID: 863 Comm: named Not tainted 3.13.0-rc3+ #9 [loadavg: 0.21 0.06 0.06 1/115 1365] [ 5059.436052] Hardware name: /D510MO, BIOS MOPNV10J.86A.0175.2010.0308.0620 03/08/2010 [ 5059.436223] 0000000000000002 ffff88007e203ad8 ffffffff8153a372 ffff8800677130e0 [ 5059.436390] ffff88007e203b10 ffffffff8108cafa ffff88005e960000 ffff88007b00cfc0 [ 5059.436554] ffffea00017a5800 ffffffff8141c490 0000000000000246 ffff88007e203b48 [ 5059.436718] Call Trace: [ 5059.436769] <IRQ> [<ffffffff8153a372>] dump_stack+0x4d/0x66 [ 5059.436904] [<ffffffff8108cafa>] debug_check_no_locks_freed+0x15a/0x160 [ 5059.437037] [<ffffffff8141c490>] ? __sk_free+0x110/0x230 [ 5059.437147] [<ffffffff8112da2a>] kmem_cache_free+0x6a/0x150 [ 5059.437260] [<ffffffff8141c490>] __sk_free+0x110/0x230 [ 5059.437364] [<ffffffff8141c5c9>] sk_free+0x19/0x20 [ 5059.437463] [<ffffffff8141cb25>] sock_edemux+0x25/0x40 [ 5059.437567] [<ffffffff8141c181>] sock_queue_rcv_skb+0x81/0x280 [ 5059.437685] [<ffffffff8149bd21>] ? udp_queue_rcv_skb+0xd1/0x4b0 [ 5059.437805] [<ffffffff81499c82>] __udp_queue_rcv_skb+0x42/0x240 [ 5059.437925] [<ffffffff81541d25>] ? _raw_spin_lock+0x65/0x70 [ 5059.438038] [<ffffffff8149bebb>] udp_queue_rcv_skb+0x26b/0x4b0 [ 5059.438155] [<ffffffff8149c712>] __udp4_lib_rcv+0x152/0xb00 [ 5059.438269] [<ffffffff8149d7f5>] udp_rcv+0x15/0x20 [ 5059.438367] [<ffffffff81467b2f>] ip_local_deliver_finish+0x10f/0x410 [ 5059.438492] [<ffffffff81467a5e>] ? ip_local_deliver_finish+0x3e/0x410 [ 5059.438621] [<ffffffff81468653>] ip_local_deliver+0x43/0x80 [ 5059.438733] [<ffffffff81467f70>] ip_rcv_finish+0x140/0x5a0 [ 5059.438843] [<ffffffff81468926>] ip_rcv+0x296/0x3f0 [ 5059.438945] [<ffffffff81430b72>] __netif_receive_skb_core+0x742/0x940 [ 5059.439074] [<ffffffff8143054d>] ? __netif_receive_skb_core+0x11d/0x940 [ 5059.442231] [<ffffffff8108c81d>] ? trace_hardirqs_on+0xd/0x10 [ 5059.442231] [<ffffffff81430d83>] __netif_receive_skb+0x13/0x60 [ 5059.442231] [<ffffffff81431c1e>] netif_receive_skb+0x1e/0x1f0 [ 5059.442231] [<ffffffff814334e0>] napi_gro_receive+0x70/0xa0 [ 5059.442231] [<ffffffffa01de426>] rtl8169_poll+0x166/0x700 [r8169] [ 5059.442231] [<ffffffff81432bc9>] net_rx_action+0x129/0x1e0 [ 5059.442231] [<ffffffff810478cd>] __do_softirq+0xed/0x240 [ 5059.442231] [<ffffffff81047e25>] irq_exit+0x125/0x140 [ 5059.442231] [<ffffffff81004241>] do_IRQ+0x51/0xc0 [ 5059.442231] [<ffffffff81542bef>] common_interrupt+0x6f/0x6f We need to keep a reference on the socket, by using skb_steal_sock() at the right place. Note that another patch is needed to fix a race in udp_sk_rx_dst_set(), as we hold no lock protecting the dst. Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux") Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	Merge branch 'sctp'	David S. Miller	2013-12-10	2	-19/+89
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Wang Weidong says: ==================== sctp: check the rto_min and rto_max v6 -> v7: -patch2: fix the whitespace issues which pointed out by Daniel v5 -> v6: split the v5' first patch to patch1 and patch2, and remove the macro in constants.h -patch1: do rto_min/max socket option handling in its own patch, and fix the check of rto_min/max. -patch2: do rto_min/max sysctl handling in its own patch. -patch3: add Suggested-by Daniel. v4 -> v5: - patch1: add marco in constants.h and fix up spacing as suggested by Daniel - patch2: add a patch for fix up do_hmac_alg for according to do_rto_min[max] v3 -> v4: -patch1: fix use init_net directly which suggested by Vlad. v2 -> v3: -patch1: add proc_handler for check rto_min and rto_max which suggested by Vlad v1 -> v2: -patch1: fix the From Name which pointed out by David, and add the ACK by Neil ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \|	sctp: fix up a spacing	wangweidong	2013-12-10	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fix up spacing of proc_sctp_do_hmac_alg for according to the proc_sctp_do_rto_min[max] in sysctl.c Suggested-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \|	sctp: add check rto_min and rto_max in sysctl	wangweidong	2013-12-10	1	-4/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rto_min should be smaller than rto_max while rto_max should be larger than rto_min. Add two proc_handler for the checking. Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \|	sctp: check the rto_min and rto_max in setsockopt	wangweidong	2013-12-10	1	-10/+22
\| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we set 0 to rto_min or rto_max, just not change the value. Also we should check the rto_min > rto_max. Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipv6: do not erase dst address with flow label destination	Florent Fourcot	2013-12-10	6	-6/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is following b579035ff766c9412e2b92abf5cab794bff102b6 "ipv6: remove old conditions on flow label sharing" Since there is no reason to restrict a label to a destination, we should not erase the destination value of a socket with the value contained in the flow label storage. This patch allows to really have the same flow label to more than one destination. Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	sctp: properly latch and use autoclose value from sock to association	Neil Horman	2013-12-10	5	-17/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, sctp associations latch a sockets autoclose value to an association at association init time, subject to capping constraints from the max_autoclose sysctl value. This leads to an odd situation where an application may set a socket level autoclose timeout, but sliently sctp will limit the autoclose timeout to something less than that. Fix this by modifying the autoclose setsockopt function to check the limit, cap it and warn the user via syslog that the timeout is capped. This will allow getsockopt to return valid autoclose timeout values that reflect what subsequent associations actually use. While were at it, also elimintate the assoc->autoclose variable, it duplicates whats in the timeout array, which leads to multiple sources for the same information, that may differ (as the former isn't subject to any capping). This gives us the timeout information in a canonical place and saves some space in the association structure as well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> CC: Wang Weidong <wangweidong1@huawei.com> CC: David Miller <davem@davemloft.net> CC: Vlad Yasevich <vyasevich@gmail.com> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>