summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* tracing: Using for_each_set_bit() to simplify trace_pid_write()Wei Yongjun2016-07-051-6/+2
| | | | | | | | | Using for_each_set_bit() to simplify the code. Link: http://lkml.kernel.org/r/1467645004-11169-1-git-send-email-weiyj_lk@163.com Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ftrace: Move toplevel init out of ftrace_init_tracefs()Steven Rostedt (Red Hat)2016-07-053-6/+14
| | | | | | | | | | | | | | | | | | | | | Commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do") placed ftrace_init_tracefs into the instance creation, and encapsulated the top level updating with an if conditional, as the top level only gets updated at boot up. Unfortunately, this triggers section mismatch errors as the init functions are called from a function that can be called later, and the section mismatch logic is unaware of the if conditional that would prevent it from happening at run time. To make everyone happy, create a separate ftrace_init_tracefs_toplevel() routine that only gets called by init functions, and this will be what calls other init functions for the toplevel directory. Link: http://lkml.kernel.org/r/20160704102139.19cbc0d9@gandalf.local.home Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Fixes: 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing/function_graph: Fix filters for function_graph thresholdJoel Fernandes2016-06-271-9/+8
| | | | | | | | | | | | | | | | | Function graph tracer currently ignores filters if tracing_thresh is set. For example, even if set_ftrace_pid is set, then its ignored if tracing_thresh set, resulting in all processes being traced. To fix this, we reuse the same entry function as when tracing_thresh is not set and do everything as in the regular case except for writing the function entry to the ring buffer. Link: http://lkml.kernel.org/r/1466228694-2677-1-git-send-email-agnel.joel@gmail.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Joel Fernandes <agnel.joel@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Skip more functions when doing stack tracing of eventsSteven Rostedt (Red Hat)2016-06-231-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | # echo 1 > options/stacktrace # echo 1 > events/sched/sched_switch/enable # cat trace <idle>-0 [002] d..2 1982.525169: <stack trace> => save_stack_trace => __ftrace_trace_stack => trace_buffer_unlock_commit_regs => event_trigger_unlock_commit => trace_event_buffer_commit => trace_event_raw_event_sched_switch => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary The above shows that we are seeing 6 functions before ever making it to the caller of the sched_switch event. # echo stacktrace > events/sched/sched_switch/trigger # cat trace <idle>-0 [002] d..3 2146.335208: <stack trace> => trace_event_buffer_commit => trace_event_raw_event_sched_switch => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary The stacktrace trigger isn't as bad, because it adds its own skip to the stacktracing, but still has two events extra. One issue is that if the stacktrace passes its own "regs" then there should be no addition to the skip, as the regs will not include the functions being called. This was an issue that was fixed by commit 7717c6be6999 ("tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()" as adding the skip number for kprobes made the probes not have any stack at all. But since this is only an issue when regs is being used, a skip should be added if regs is NULL. Now we have: # echo 1 > options/stacktrace # echo 1 > events/sched/sched_switch/enable # cat trace <idle>-0 [000] d..2 1297.676333: <stack trace> => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => rest_init => start_kernel => x86_64_start_reservations => x86_64_start_kernel # echo stacktrace > events/sched/sched_switch/trigger # cat trace <idle>-0 [002] d..3 1370.759745: <stack trace> => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary And kprobes are not touched. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Expose CPU physical addresses (resource values) for PCI devicesBjorn Helgaas2016-06-201-7/+3
| | | | | | | | | | | | | | | | | | | Previously, mmio_print_pcidev() put "user" addresses in the trace buffer. On most architectures, these are the same as CPU physical addresses, but on microblaze, mips, powerpc, and sparc, they may be something else, typically a raw BAR value (a bus address as opposed to a CPU address). Always expose the CPU physical address to avoid this arch-dependent behavior. This change should have no user-visible effect because this file currently depends on CONFIG_HAVE_MMIOTRACE_SUPPORT, which is only defined for x86, and pci_resource_to_user() is a no-op on x86. Link: http://lkml.kernel.org/r/20160511190657.5898.4248.stgit@bhelgaas-glaptop2.roam.corp.google.com Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Show the preempt count of when the event was calledSteven Rostedt (Red Hat)2016-06-201-0/+8
| | | | | | | | | | | | | | | | Because tracepoint callbacks are done with preemption enabled, the trace events are always called with preempt disable due to the rcu_read_lock_sched_notrace() in __DO_TRACE(). This causes the preempt count shown in the recorded trace event to be inaccurate. It is always one more that what the preempt_count was when the tracepoint was called. If CONFIG_PREEMPT is enabled, subtract 1 from the preempt_count before recording it in the trace buffer. Link: http://lkml.kernel.org/r/20160525132537.GA10808@linutronix.de Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Choose static tp_printk buffer by explicit nesting countAndy Lutomirski2016-06-201-59/+24
| | | | | | | | | | | | | | | | Currently, the trace_printk code chooses which static buffer to use based on what type of atomic context (NMI, IRQ, etc) it's in. Simplify the code and make it more robust: simply count the nesting depth and choose a buffer based on the current nesting depth. The new code will only drop an event if we nest more than 4 deep, and the old code was guaranteed to malfunction if that happened. Link: http://lkml.kernel.org/r/07ab03aecfba25fcce8f9a211b14c9c5e2865c58.1464289095.git.luto@kernel.org Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: expose current->comm to [ku]probe eventsOmar Sandoval2016-06-203-0/+44
| | | | | | | | | | | | | | | ftrace is very quick to give up on saving the task command line (see `trace_save_cmdline()`). The workaround for events which really care about the command line is to explicitly assign it as part of the entry. However, this doesn't work for kprobe events, as there's no straightforward way to get access to current->comm. Add a kprobe/uprobe event variable $comm which provides exactly that. Link: http://lkml.kernel.org/r/f59b472033b943a370f5f48d0af37698f409108f.1465435894.git.osandov@fb.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ftrace: Have set_ftrace_pid use the bitmap like events doSteven Rostedt (Red Hat)2016-06-205-185/+148
| | | | | | | | | Convert set_ftrace_pid to use the bitmap like set_event_pid does. This allows for instances to use the pid filtering as well, and will allow for function-fork option to set if the children of a traced function should be traced or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Move pid_list write processing into its own functionSteven Rostedt (Red Hat)2016-06-203-102/+124
| | | | | | | | | The addition of PIDs into a pid_list via the write operation of set_event_pid is a bit complex. The same operation will be needed for function tracing pids. Move the code into its own generic function in trace.c, so that we can avoid duplication of this code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Move the pid_list seq_file functions to be globalSteven Rostedt (Red Hat)2016-06-203-31/+77
| | | | | | | | To allow other aspects of ftrace to use the pid_list logic, we need to reuse the seq_file functions. Making the generic part into functions that can be called by other files will help in this regard. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Move filtered_pid helper functions into trace.cSteven Rostedt2016-06-202-51/+78
| | | | | | | | | | | | | | | As the filtered_pid functions are going to be used by function tracer as well as trace_events, move the code into the generic trace.c file. The functions moved are: trace_find_filtered_pid() trace_ignore_this_task() trace_filter_add_remove_task() Kernel Doc text was also added. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Make the pid filtering helper functions globalSteven Rostedt2016-06-202-17/+26
| | | | | | | Make the functions used for pid filtering global for tracing, such that the function tracer can use the pid code as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Handle NULL formats in hold_module_trace_bprintk_format()Steven Rostedt (Red Hat)2016-06-201-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | If a task uses a non constant string for the format parameter in trace_printk(), then the trace_printk_fmt variable is set to NULL. This variable is then saved in the __trace_printk_fmt section. The function hold_module_trace_bprintk_format() checks to see if duplicate formats are used by modules, and reuses them if so (saves them to the list if it is new). But this function calls lookup_format() that does a strcmp() to the value (which is now NULL) and can cause a kernel oops. This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") which added "__used" to the trace_printk_fmt variable, and before that, the kernel simply optimized it out (no NULL value was saved). The fix is simply to handle the NULL pointer in lookup_format() and have the caller ignore the value if it was NULL. Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com Reported-by: xingzhen <zhengjun.xing@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") Cc: stable@vger.kernel.org # v3.5+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* Merge branch 'stacking-fixes' (vfs stacking fixes from Jann)Linus Torvalds2016-06-101-1/+2
|\ | | | | | | | | | | | | | | | | Merge filesystem stacking fixes from Jann Horn. * emailed patches from Jann Horn <jannh@google.com>: sched: panic on corrupted stack end ecryptfs: forbid opening files without mmap handler proc: prevent stacking filesystems on top
| * sched: panic on corrupted stack endJann Horn2016-06-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Until now, hitting this BUG_ON caused a recursive oops (because oops handling involves do_exit(), which calls into the scheduler, which in turn raises an oops), which caused stuff below the stack to be overwritten until a panic happened (e.g. via an oops in interrupt context, caused by the overwritten CPU index in the thread_info). Just panic directly. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2016-06-103-16/+28
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two scheduler debugging fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Fix 'schedstats=enable' cmdline option sched/debug: Fix /proc/sched_debug regression
| * | sched/debug: Fix 'schedstats=enable' cmdline optionJosh Poimboeuf2016-06-081-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'schedstats=enable' option doesn't work, and also produces the following warning during boot: WARNING: CPU: 0 PID: 0 at /home/jpoimboe/git/linux/kernel/jump_label.c:61 static_key_slow_inc+0x8c/0xa0 static_key_slow_inc used before call to jump_label_init Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc1+ #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 0000000000000086 3ae3475a4bea95d4 ffffffff81e03da8 ffffffff8143fc83 ffffffff81e03df8 0000000000000000 ffffffff81e03de8 ffffffff810b1ffb 0000003d00000096 ffffffff823514d0 ffff88007ff197c8 0000000000000000 Call Trace: [<ffffffff8143fc83>] dump_stack+0x85/0xc2 [<ffffffff810b1ffb>] __warn+0xcb/0xf0 [<ffffffff810b207f>] warn_slowpath_fmt+0x5f/0x80 [<ffffffff811e9c0c>] static_key_slow_inc+0x8c/0xa0 [<ffffffff810e07c6>] static_key_enable+0x16/0x40 [<ffffffff8216d633>] setup_schedstats+0x29/0x94 [<ffffffff82148a05>] unknown_bootoption+0x89/0x191 [<ffffffff810d8617>] parse_args+0x297/0x4b0 [<ffffffff82148d61>] start_kernel+0x1d8/0x4a9 [<ffffffff8214897c>] ? set_init_arg+0x55/0x55 [<ffffffff82148120>] ? early_idt_handler_array+0x120/0x120 [<ffffffff821482db>] x86_64_start_reservations+0x2f/0x31 [<ffffffff82148427>] x86_64_start_kernel+0x14a/0x16d The problem is that it tries to update the 'sched_schedstats' static key before jump labels have been initialized. Changing jump_label_init() to be called earlier before parse_early_param() wouldn't fix it: it would still fail trying to poke_text() because mm isn't yet initialized. Instead, just create a temporary '__sched_schedstats' variable which can be copied to the static key later during sched_init() after jump labels have been initialized. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default") Link: http://lkml.kernel.org/r/453775fe3433bed65731a583e228ccea806d18cd.1465322027.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | sched/debug: Fix /proc/sched_debug regressionJosh Poimboeuf2016-06-082-11/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default") ... introduced a bug when CONFIG_SCHEDSTATS is enabled and the runtime tunable is disabled (which is the default). The wait-time, sum-exec, and sum-sleep fields are missing from the /proc/sched_debug file in the runnable_tasks section. Fix it with a new schedstat_val() macro which returns the field value when schedstats is enabled and zero otherwise. The macro works with both SCHEDSTATS and !SCHEDSTATS. I put the macro in stats.h since it might end up being useful in other places. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default") Link: http://lkml.kernel.org/r/bcda7c2790cf2ccbe586a28c02dd7b6fe7749a2b.1464994423.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2016-06-101-4/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A handful of tooling fixes, two PMU driver fixes and a cleanup of redundant code that addresses a security analyzer false positive" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Remove a redundant check perf/x86/intel/uncore: Remove SBOX support for Broadwell server perf ctf: Convert invalid chars in a string before set value perf record: Fix crash when kptr is restricted perf symbols: Check kptr_restrict for root perf/x86/intel/rapl: Fix pmus free during cleanup
| * | | perf/core: Remove a redundant checkAlexander Shishkin2016-06-081-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no way to end up in _free_event() with event::pmu being NULL. The latter is initialized in event allocation path and remains set forever. In case of allocation failure, the error path doesn't use _free_event(). Having the check, however, suggests that it is possible to have a event::pmu==NULL situation in _free_event() and confuses the robots. This patch gets rid of the check. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/1465303455-26032-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds2016-06-103-6/+77
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Misc fixes: - a file-based futex fix - one more spin_unlock_wait() fix - a ww-mutex deadlock detection improvement/fix - and a raw_read_seqcount_latch() barrier fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Calculate the futex key based on a tail page for file-based futexes locking/qspinlock: Fix spin_unlock_wait() some more locking/ww_mutex: Report recursive ww_mutex locking early locking/seqcount: Re-fix raw_read_seqcount_latch()
| * | | | futex: Calculate the futex key based on a tail page for file-based futexesMel Gorman2016-06-081-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mike Galbraith reported that the LTP test case futex_wake04 was broken by commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in get_futex_key()"). This test case uses futexes backed by hugetlbfs pages and so there is an associated inode with a futex stored on such pages. The problem is that the key is being calculated based on the head page index of the hugetlbfs page and not the tail page. Prior to the optimisation, the page lock was used to stabilise mappings and pin the inode is file-backed which is overkill. If the page was a compound page, the head page was automatically looked up as part of the page lock operation but the tail page index was used to calculate the futex key. After the optimisation, the compound head is looked up early and the page lock is only relied upon to identify truncated pages, special pages or a shmem page moving to swapcache. The head page is looked up because without the page lock, special care has to be taken to pin the inode correctly. However, the tail page is still required to calculate the futex key so this patch records the tail page. On vanilla 4.6, the output of the test case is; futex_wake04 0 TINFO : Hugepagesize 2097152 futex_wake04 1 TFAIL : futex_wake04.c:126: Bug: wait_thread2 did not wake after 30 secs. With the patch applied futex_wake04 0 TINFO : Hugepagesize 2097152 futex_wake04 1 TPASS : Hi hydra, thread2 awake! Fixes: 65d8fc777f6d "futex: Remove requirement for lock_page() in get_futex_key()" Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20160608132522.GM2469@suse.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | locking/qspinlock: Fix spin_unlock_wait() some morePeter Zijlstra2016-06-081-0/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While this prior commit: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") ... fixes spin_is_locked() and spin_unlock_wait() for the usage in ipc/sem and netfilter, it does not in fact work right for the usage in task_work and futex. So while the 2 locks crossed problem: spin_lock(A) spin_lock(B) if (!spin_is_locked(B)) spin_unlock_wait(A) foo() foo(); ... works with the smp_mb() injected by both spin_is_locked() and spin_unlock_wait(), this is not sufficient for: flag = 1; smp_mb(); spin_lock() spin_unlock_wait() if (!flag) // add to lockless list // iterate lockless list ... because in this scenario, the store from spin_lock() can be delayed past the load of flag, uncrossing the variables and loosing the guarantee. This patch reworks spin_is_locked() and spin_unlock_wait() to work in both cases by exploiting the observation that while the lock byte store can be delayed, the contender must have registered itself visibly in other state contained in the word. It also allows for architectures to override both functions, as PPC and ARM64 have an additional issue for which we currently have no generic solution. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org # v4.2 and later Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | locking/ww_mutex: Report recursive ww_mutex locking earlyChris Wilson2016-06-031-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recursive locking for ww_mutexes was originally conceived as an exception. However, it is heavily used by the DRM atomic modesetting code. Currently, the recursive deadlock is checked after we have queued up for a busy-spin and as we never release the lock, we spin until kicked, whereupon the deadlock is discovered and reported. A simple solution for the now common problem is to move the recursive deadlock discovery to the first action when taking the ww_mutex. Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1464293297-19777-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2016-06-101-2/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal. 2) Revert some msleep conversions in rtlwifi as these spots are in atomic context, from Larry Finger. 3) Validate that NFTA_SET_TABLE attribute is actually specified when we call nf_tables_getset(). From Phil Turnbull. 4) Don't do mdio_reset in stmmac driver with spinlock held as that can sleep, from Vincent Palatin. 5) sk_filter() does things other than run a BPF filter, so we should not elide it's call just because sk->sk_filter is NULL. Fix from Eric Dumazet. 6) Fix missing backlog updates in several packet schedulers, from Cong Wang. 7) bnx2x driver should allow VLAN add/remove while the interface is down, from Michal Schmidt. 8) Several RDS/TCP race fixes from Sowmini Varadhan. 9) fq_codel scheduler doesn't return correct queue length in dumps, from Eric Dumazet. 10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from Yuchung Cheng. 11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(), from Guillaume Nault. 12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian Westphal. 13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling, from Willem de Bruijn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits) vmxnet3: segCnt can be 1 for LRO packets packet: compat support for sock_fprog stmmac: fix parameter to dwmac4_set_umac_addr() net/mlx5e: Fix blue flame quota logic net/mlx5e: Use ndo_stop explicitly at shutdown flow net/mlx5: E-Switch, always set mc_promisc for allmulti vports net/mlx5: E-Switch, Modify node guid on vf set MAC net/mlx5: E-Switch, Fix vport enable flow net/mlx5: E-Switch, Use the correct error check on returned pointers net/mlx5: E-Switch, Use the correct free() function net/mlx5: Fix E-Switch flow steering capabilities check net/mlx5: Fix flow steering NIC capabilities check net/mlx5: Fix root flow table update net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly net/mlx5: Fix masking of reserved bits in XRCD number net/mlx5: Fix the size of modify QP mailbox mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name() mlxsw: spectrum: Make split flow match firmware requirements wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel cfg80211: remove get/set antenna and tx power warnings ...
| * | | | | bpf, trace: use READ_ONCE for retrieving file ptrDaniel Borkmann2016-06-071-2/+2
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In bpf_perf_event_read() and bpf_perf_event_output(), we must use READ_ONCE() for fetching the struct file pointer, which could get updated concurrently, so we must prevent the compiler from potential refetching. We already do this with tail calls for fetching the related bpf_prog, but not so on stored perf events. Semantics for both are the same with regards to updates. Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge tag 'pm-4.7-rc3' of ↵Linus Torvalds2016-06-101-1/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Stable-candidate fixes for the intel_pstate driver and the cpuidle core. Specifics: - Fix two intel_pstate initialization issues, one of which was introduced during the 4.4 cycle (Srinivas Pandruvada) - Fix kernel build with CONFIG_UBSAN set and CONFIG_CPU_IDLE unset (Catalin Marinas)" * tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy() cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE
| | \ \ \ \
| | \ \ \ \
| *-. \ \ \ \ Merge branches 'pm-cpufreq-fixes' and 'pm-cpuidle'Rafael J. Wysocki2016-06-091-1/+1
| |\ \ \ \ \ \ | | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-cpufreq-fixes: cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy() * pm-cpuidle: cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE
| | | * | | | cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLECatalin Marinas2016-06-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cpuidle_devices per-CPU variable is only defined when CPU_IDLE is enabled. Commit c8cc7d4de7a4 ("sched/idle: Reorganize the idle loop") removed the #ifdef CONFIG_CPU_IDLE around cpuidle_idle_call() with the compiler optimising away __this_cpu_read(cpuidle_devices). However, with CONFIG_UBSAN && !CONFIG_CPU_IDLE, this optimisation no longer happens and the kernel fails to link since cpuidle_devices is not defined. This patch introduces an accessor function for the current CPU cpuidle device (returning NULL when !CONFIG_CPU_IDLE) and uses it in cpuidle_idle_call(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: 4.5+ <stable@vger.kernel.org> # 4.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | | kernel/relay.c: fix potential memory leakZhouyi Zhou2016-06-091-0/+1
|/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When relay_open_buf() fails in relay_open(), code will goto free_bufs, but chan is nowhere freed. Link: http://lkml.kernel.org/r/1464777927-19675-1-git-send-email-yizhouzhou@ict.ac.cn Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2016-06-031-1/+1
|\ \ \ \ \ \ | |_|_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: - a few simple fixes for fallout from the recent gic-v3 changes - a workaround for a Cavium thunderX erratum - a bugfix for the pic32 irqchip to make external interrupts work proper - a missing return value in the generic IPI management code * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-pic32-evic: Fix bug with external interrupts. irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144 irqchip/gic-v3: Fix quiescence check in gic_enable_redist irqchip/gic-v3: Fix copy+paste mistakes in defines irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask genirq: Fix missing return value in irq_destroy_ipi()
| * | | | | Merge tag 'irqchip-4.7-rc1' of ↵Thomas Gleixner2016-06-0366-1058/+6095
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Merge irqchip updates from Marc Zyngier: - A number of embarassing buglets (GICv3, PIC32) - A more substential errata workaround for Cavium's GICv3 ITS (kept for post-rc1 due to its dependency on NUMA)
| * | | | | | genirq: Fix missing return value in irq_destroy_ipi()Matt Redfearn2016-05-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7cec18a3906b changed the return type of irq_destroy_ipi to int, but missed adding a value to one return statement. Fix this to silence the resulting compiler warning: kernel/irq/ipi.c In function ‘irq_destroy_ipi’: kernel/irq/ipi.c:128:3: warning: ‘return’ with no value, in function returning non-void [-Wreturn-type] Fixes: 7cec18a3906b "genirq: Add error code reporting to irq_{reserve,destroy}_ipi" Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Cc: linux-mips@linux-mips.org Link: http://lkml.kernel.org/r/1464086550-24734-1-git-send-email-matt.redfearn@imgtec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2016-05-312-1/+1
|\ \ \ \ \ \ \ | |_|/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi. 2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized properly. From Ezequiel Garcia. 3) Missing spinlock init in mvneta driver, from Gregory CLEMENT. 4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT. 5) Fix deadlock on team->lock when propagating features, from Ivan Vecera. 6) Work around buffer offset hw bug in alx chips, from Feng Tang. 7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin Long. 8) Various statistics bug fixes in mlx4 from Eric Dumazet. 9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann. 10) All of l2tp was namespace aware, but the ipv6 support code was not doing so. From Shmulik Ladkani. 11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck. 12) Propagate MAC changes properly through VLAN devices, from Mike Manning. 13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) sfc: Track RPS flow IDs per channel instead of per function usbnet: smsc95xx: fix link detection for disabled autonegotiation virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv bnx2x: avoid leaking memory on bnx2x_init_one() failures fou: fix IPv6 Kconfig options openvswitch: update checksum in {push,pop}_mpls sctp: sctp_diag should dump sctp socket type net: fec: update dirty_tx even if no skb vlan: Propagate MAC address to VLANs atm: iphase: off by one in rx_pkt() atm: firestream: add more reserved strings vxlan: Accept user specified MTU value when create new vxlan link net: pktgen: Call destroy_hrtimer_on_stack() timer: Export destroy_hrtimer_on_stack() net: l2tp: Make l2tp_ip6 namespace aware Documentation: ip-sysctl.txt: clarify secure_redirects sfc: use flow dissector helpers for aRFS ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr net: nps_enet: Disable interrupts before napi reschedule net/lapb: tuse %*ph to dump buffers ...
| * | | | | | timer: Export destroy_hrtimer_on_stack()Guenter Roeck2016-05-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hrtimer_init_on_stack() needs a matching call to destroy_hrtimer_on_stack(), so both need to be exported. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | bpf, inode: disallow userns mountsDaniel Borkmann2016-05-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Follow-up to commit e27f4a942a0e ("bpf: Use mount_nodev not mount_ns to mount the bpf filesystem"), which removes the FS_USERNS_MOUNT flag. The original idea was to have a per mountns instance instead of a single global fs instance, but that didn't work out and we had to switch to mount_nodev() model. The intent of that middle ground was that we avoid users who don't play nice to create endless instances of bpf fs which are difficult to control and discover from an admin point of view, but at the same time it would have allowed us to be more flexible with regard to namespaces. Therefore, since we now did the switch to mount_nodev() as a fix where individual instances are created, we also need to remove userns mount flag along with it to avoid running into mentioned situation. I don't expect any breakage at this early point in time with removing the flag and we can revisit this later should the requirement for this come up with future users. This and commit e27f4a942a0e have been split to facilitate tracking should any of them run into the unlikely case of causing a regression. Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2016-05-271-0/+16
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Followups to the parallel lookup work: - update docs - restore killability of the places that used to take ->i_mutex killably now that we have down_write_killable() merged - Additionally, it turns out that I missed a prerequisite for security_d_instantiate() stuff - ->getxattr() wasn't the only thing that could be called before dentry is attached to inode; with smack we needed the same treatment applied to ->setxattr() as well" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->setxattr() to passing dentry and inode separately switch xattr_handler->set() to passing dentry and inode separately restore killability of old mutex_lock_killable(&inode->i_mutex) users add down_write_killable_nested() update D/f/directory-locking
| * | | | | | | add down_write_killable_nested()Al Viro2016-05-261-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | | remove lots of IS_ERR_VALUE abusesArnd Bergmann2016-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most users of IS_ERR_VALUE() in the kernel are wrong, as they pass an 'int' into a function that takes an 'unsigned long' argument. This happens to work because the type is sign-extended on 64-bit architectures before it gets converted into an unsigned type. However, anything that passes an 'unsigned short' or 'unsigned int' argument into IS_ERR_VALUE() is guaranteed to be broken, as are 8-bit integers and types that are wider than 'unsigned long'. Andrzej Hajda has already fixed a lot of the worst abusers that were causing actual bugs, but it would be nice to prevent any users that are not passing 'unsigned long' arguments. This patch changes all users of IS_ERR_VALUE() that I could find on 32-bit ARM randconfig builds and x86 allmodconfig. For the moment, this doesn't change the definition of IS_ERR_VALUE() because there are probably still architecture specific users elsewhere. Almost all the warnings I got are for files that are better off using 'if (err)' or 'if (err < 0)'. The only legitimate user I could find that we get a warning for is the (32-bit only) freescale fman driver, so I did not remove the IS_ERR_VALUE() there but changed the type to 'unsigned long'. For 9pfs, I just worked around one user whose calling conventions are so obscure that I did not dare change the behavior. I was using this definition for testing: #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \ unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO)) which ends up making all 16-bit or wider types work correctly with the most plausible interpretation of what IS_ERR_VALUE() was supposed to return according to its users, but also causes a compile-time warning for any users that do not pass an 'unsigned long' argument. I suggested this approach earlier this year, but back then we ended up deciding to just fix the users that are obviously broken. After the initial warning that caused me to get involved in the discussion (fs/gfs2/dir.c) showed up again in the mainline kernel, Linus asked me to send the whole thing again. [ Updated the 9p parts as per Al Viro - Linus ] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andrzej Hajda <a.hajda@samsung.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.org/lkml/2016/1/7/363 Link: https://lkml.org/lkml/2016/5/27/486 Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> # For nvmem part Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | Merge branch 'kbuild' of ↵Linus Torvalds2016-05-261-0/+1
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and unexports symbols which are not used in the current config [Nicolas Pitre] - several kbuild rule cleanups [Masahiro Yamada] - warning option adjustments for gcov etc [Arnd Bergmann] - a few more small fixes * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits) kbuild: move -Wunused-const-variable to W=1 warning level kbuild: fix if_change and friends to consider argument order kbuild: fix adjust_autoksyms.sh for modules that need only one symbol kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line gcov: disable -Wmaybe-uninitialized warning gcov: disable tree-loop-im to reduce stack usage gcov: disable for COMPILE_TEST Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES Kbuild: change CC_OPTIMIZE_FOR_SIZE definition kbuild: forbid kernel directory to contain spaces and colons kbuild: adjust ksym_dep_filter for some cmd_* renames kbuild: Fix dependencies for final vmlinux link kbuild: better abstract vmlinux sequential prerequisites kbuild: fix call to adjust_autoksyms.sh when output directory specified kbuild: Get rid of KBUILD_STR kbuild: rename cmd_as_s_S to cmd_cpp_s_S kbuild: rename cmd_cc_i_c to cmd_cpp_i_c kbuild: drop redundant "PHONY += FORCE" kbuild: delete unnecessary "@:" kbuild: mark help target as PHONY ...
| * | | | | | | | gcov: disable for COMPILE_TESTArnd Bergmann2016-05-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enabling gcov is counterproductive to compile testing: it significantly increases the kernel image size, compile time, and it produces lots of false positive "may be used uninitialized" warnings as the result of missed optimizations. This is in line with how UBSAN_SANITIZE_ALL and PROFILE_ALL_BRANCHES work, both of which have similar problems. With an ARM allmodconfig kernel, I see the build time drop from 283 minutes CPU time to 225 minutes, and the vmlinux size drops from 43MB to 26MB. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Michal Marek <mmarek@suse.com>
* | | | | | | | | mm: oom_reaper: remove some bloatMichal Hocko2016-05-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mmput_async is currently used only from the oom_reaper which is defined only for CONFIG_MMU. We can save work_struct in mm_struct for !CONFIG_MMU. [akpm@linux-foundation.org: fix typo, per Minchan] Link: http://lkml.kernel.org/r/20160520061658.GB19172@dhcp22.suse.cz Reported-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2016-05-251-7/+11
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: one for a lost wakeup, the other to fix the compiler optimizing out preempt operations on ARM64 (and possibly other non-x86 architectures)" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix remote wakeups sched/preempt: Fix preempt_count manipulations
| * | | | | | | | | sched/core: Fix remote wakeupsPeter Zijlstra2016-05-251-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration") ... introduced a bug: Mike Galbraith found that it introduced a performance regression, while Paul E. McKenney reported lost wakeups and bisected it to this commit. The reason is that I mis-read ttwu_queue() such that I assumed any wakeup that got a remote queue must have had the task migrated. Since this is not so; we need to transfer this information between queueing the wakeup and actually doing the wakeup. Use a new task_struct::sched_flag for this, we already write to sched_contributes_to_load in the wakeup path so this is a hot and modified cacheline. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ben Segall <bsegall@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Pavan Kondeti <pkondeti@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: byungchul.park@lge.com Fixes: b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration") Link: http://lkml.kernel.org/r/20160523091907.GD15728@worktop.ger.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2016-05-253-14/+36
|\ \ \ \ \ \ \ \ \ \ | |_|_|/ / / / / / / |/| | | | | | | / / | | |_|_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Mostly tooling and PMU driver fixes, but also a number of late updates such as the reworking of the call-chain size limiting logic to make call-graph recording more robust, plus tooling side changes for the new 'backwards ring-buffer' extension to the perf ring-buffer" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) perf record: Read from backward ring buffer perf record: Rename variable to make code clear perf record: Prevent reading invalid data in record__mmap_read perf evlist: Add API to pause/resume perf trace: Use the ptr->name beautifier as default for "filename" args perf trace: Use the fd->name beautifier as default for "fd" args perf report: Add srcline_from/to branch sort keys perf evsel: Record fd into perf_mmap perf evsel: Add overwrite attribute and check write_backward perf tools: Set buildid dir under symfs when --symfs is provided perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced perf annotate: Sort list of recognised instructions perf annotate: Fix identification of ARM blt and bls instructions perf tools: Fix usage of max_stack sysctl perf callchain: Stop validating callchains by the max_stack sysctl perf trace: Fix exit_group() formatting perf top: Use machine->kptr_restrict_warned perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1 perf machine: Do not bail out if not managing to read ref reloc symbol perf/x86/intel/p4: Trival indentation fix, remove space ...
| * | | | | | | | Merge tag 'perf-core-for-mingo-20160516' of ↵Ingo Molnar2016-05-203-14/+36
| |\ \ \ \ \ \ \ \ | | |/ / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible changes: - Honour the kernel.perf_event_max_stack knob more precisely by not counting PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo) - Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim) - Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim) - Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim) - Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen) - Store vdso buildid unconditionally, as it appears in callchains and we're not checking those when creating the build-id table, so we end up not being able to resolve VDSO symbols when doing analysis on a different machine than the one where recording was done, possibly of a different arch even (arm -> x86_64) (He Kuang) Infrastructure changes: - Generalize max_stack sysctl handler, will be used for configuring multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo) Cleanups: - Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using open coded strings (Masami Hiramatsu) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | | | perf core: Separate accounting of contexts and real addresses in a stack traceArnaldo Carvalho de Melo2016-05-162-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The perf_sample->ip_callchain->nr value includes all the entries in the ip_callchain->ip[] array, real addresses and PERF_CONTEXT_{KERNEL,USER,etc}, while what the user expects is that what is in the kernel.perf_event_max_stack sysctl or in the upcoming per event perf_event_attr.sample_max_stack knob be honoured in terms of IP addresses in the stack trace. So allocate a bunch of extra entries for contexts, and do the accounting via perf_callchain_entry_ctx struct members. A new sysctl, kernel.perf_event_max_contexts_per_stack is also introduced for investigating possible bugs in the callchain implementation by some arch. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/n/tip-3b4wnqk340c4sg4gwkfdi9yk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | perf core: Add perf_callchain_store_context() helperArnaldo Carvalho de Melo2016-05-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need have different helpers to account how many contexts we have in the sample and for real addresses, so do it now as a prep patch, to ease review. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-q964tnyuqrxw5gld18vizs3c@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | perf core: Add a 'nr' field to perf_event_callchain_contextArnaldo Carvalho de Melo2016-05-161-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will use it to count how many addresses are in the entry->ip[] array, excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really return the number of entries specified by the user via the relevant sysctl, kernel.perf_event_max_contexts, or via the per event perf_event_attr.sample_max_stack knob. This way we keep the perf_sample->ip_callchain->nr meaning, that is the number of entries, be it real addresses or PERF_CONTEXT_ entries, while honouring the max_stack knobs, i.e. the end result will be max_stack entries if we have at least that many entries in a given stack trace. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
OpenPOWER on IntegriCloud