From 8fdc65391c6ad16461526a685f03262b3b01bfde Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 29 Mar 2016 09:26:44 +0200 Subject: perf/core: Fix time tracking bug with multiplexing Stephane reported that commit: 3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME") introduced a regression wrt. time tracking, as easily observed by: > This patch introduce a bug in the time tracking of events when > multiplexing is used. > > The issue is easily reproducible with the following perf run: > > $ perf stat -a -C 0 -e branches,branches,branches,branches,branches,branches -I 1000 > 1.000730239 652,394 branches (66.41%) > 1.000730239 597,809 branches (66.41%) > 1.000730239 593,870 branches (66.63%) > 1.000730239 651,440 branches (67.03%) > 1.000730239 656,725 branches (66.96%) > 1.000730239 branches > > One branches event is shown as not having run. Yet, with > multiplexing, all events should run especially with a 1s (-I 1000) > interval. The delta for time_running comes out to 0. Yet, the event > has run because the kernel is actually multiplexing the events. The > problem is that the time tracking is the kernel and especially in > ctx_sched_out() is wrong now. > > The problem is that in case that the kernel enters ctx_sched_out() with the > following state: > ctx->is_active=0x7 event_type=0x1 > Call Trace: > [] dump_stack+0x63/0x82 > [] ctx_sched_out+0x2bc/0x2d0 > [] perf_mux_hrtimer_handler+0xf6/0x2c0 > [] ? __perf_install_in_context+0x130/0x130 > [] __hrtimer_run_queues+0xf8/0x2f0 > [] hrtimer_interrupt+0xb7/0x1d0 > [] local_apic_timer_interrupt+0x38/0x60 > [] smp_apic_timer_interrupt+0x3d/0x50 > [] apic_timer_interrupt+0x8c/0xa0 > > In that case, the test: > if (is_active & EVENT_TIME) > > will be false and the time will not be updated. Time must always be updated on > sched out. Fix this by always updating time if EVENT_TIME was set, as opposed to only updating time when EVENT_TIME changed. Reported-by: Stephane Eranian Tested-by: Stephane Eranian Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vince Weaver Cc: kan.liang@intel.com Cc: namhyung@kernel.org Fixes: 3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME") Link: http://lkml.kernel.org/r/20160329072644.GB3408@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar --- kernel/events/core.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'kernel/events') diff --git a/kernel/events/core.c b/kernel/events/core.c index de24fbc..8c11388 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2417,14 +2417,24 @@ static void ctx_sched_out(struct perf_event_context *ctx, cpuctx->task_ctx = NULL; } - is_active ^= ctx->is_active; /* changed bits */ - + /* + * Always update time if it was set; not only when it changes. + * Otherwise we can 'forget' to update time for any but the last + * context we sched out. For example: + * + * ctx_sched_out(.event_type = EVENT_FLEXIBLE) + * ctx_sched_out(.event_type = EVENT_PINNED) + * + * would only update time for the pinned events. + */ if (is_active & EVENT_TIME) { /* update (and stop) ctx time */ update_context_time(ctx); update_cgrp_time_from_cpuctx(cpuctx); } + is_active ^= ctx->is_active; /* changed bits */ + if (!ctx->nr_active || !(is_active & EVENT_ALL)) return; -- cgit v1.1 From 201c2f85bd0bc13b712d9c0b3d11251b182e06ae Mon Sep 17 00:00:00 2001 From: Alexander Shishkin Date: Mon, 21 Mar 2016 10:02:42 +0200 Subject: perf/core: Don't leak event in the syscall error path In the error path, event_file not being NULL is used to determine whether the event itself still needs to be free'd, so fix it up to avoid leaking. Reported-by: Leon Yu Signed-off-by: Alexander Shishkin Signed-off-by: Peter Zijlstra (Intel) Cc: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Fixes: 130056275ade ("perf: Do not double free") Link: http://lkml.kernel.org/r/87twk06yxp.fsf@ashishki-desk.ger.corp.intel.com Signed-off-by: Ingo Molnar --- kernel/events/core.c | 1 + 1 file changed, 1 insertion(+) (limited to 'kernel/events') diff --git a/kernel/events/core.c b/kernel/events/core.c index 8c11388..52bedc5 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -8542,6 +8542,7 @@ SYSCALL_DEFINE5(perf_event_open, f_flags); if (IS_ERR(event_file)) { err = PTR_ERR(event_file); + event_file = NULL; goto err_context; } -- cgit v1.1 From 09cbfeaf1a5a67bfb3201e0c83c810cecb2efa5a Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Fri, 1 Apr 2016 15:29:47 +0300 Subject: mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ; - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov Acked-by: Michal Hocko Signed-off-by: Linus Torvalds --- kernel/events/uprobes.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'kernel/events') diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 220fc17..7edc95e 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -321,7 +321,7 @@ retry: copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); ret = __replace_page(vma, vaddr, old_page, new_page); - page_cache_release(new_page); + put_page(new_page); put_old: put_page(old_page); @@ -539,14 +539,14 @@ static int __copy_insn(struct address_space *mapping, struct file *filp, * see uprobe_register(). */ if (mapping->a_ops->readpage) - page = read_mapping_page(mapping, offset >> PAGE_CACHE_SHIFT, filp); + page = read_mapping_page(mapping, offset >> PAGE_SHIFT, filp); else - page = shmem_read_mapping_page(mapping, offset >> PAGE_CACHE_SHIFT); + page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT); if (IS_ERR(page)) return PTR_ERR(page); copy_from_page(page, offset, insn, nbytes); - page_cache_release(page); + put_page(page); return 0; } -- cgit v1.1