perf_events: Fix event scheduling issues introduced by transactional API

The transactional API patch between the generic and model-specific code
introduced several important bugs with event scheduling, at least on
X86. If you had pinned events, e.g., watchdog, and were over-committing
the PMU, you would get bogus counts. The bug was showing up on Intel
CPUs because events would move around more often than on AMD. But the
problem also existed on AMD, though it was harder to expose.

The issues were:

 - group_sched_in() was missing a cancel_txn() in the error path

 - cpuc->n_added was not properly maintained, leading to missing
   actions in hw_perf_enable(), i.e., n_running being 0. You cannot
   update n_added until you know the transaction has succeeded. In
   case of a failed transaction, n_added was not adjusted back.

 - in case of failed transactions, event_sched_out() was called and
   eventually invoked x86_disable_event() to touch the HW reg. But
   with transactions, on X86, event_sched_in() does not touch HW
   registers, it simply collects events into a list. Thus, you could
   end up calling x86_disable_event() on a counter which did not
   correspond to the current event when idx != -1.

The patch modifies the generic and X86 code to avoid all those problems.

First, we keep track of the number of events added last. In case the
transaction fails, we subtract them from n_added. This approach is
necessary (as opposed to delaying updates to n_added) because not all
event updates use the transaction API, e.g., single events.

Second, we encapsulate the event_sched_in() and event_sched_out() in
group_sched_in() inside the transaction. That makes the operations
symmetrical and you can also detect that you are inside a transaction
and skip the HW reg access by checking cpuc->group_flag.

With this patch, you can now overcommit the PMU even with pinned
system-wide events present and still get valid counts.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274796225.5882.1389.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
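
The n_added bookkeeping described in the message can be sketched in user space as follows. This is only a toy model, not the kernel code: struct toy_cpuc, the toy_* functions, MAX_COUNTERS and TXN_ACTIVE are hypothetical names made up for illustration, and rolling back n_events together with n_added is an assumption of the sketch. The point it tries to show is that events added inside a transaction are counted separately (n_txn) so a cancel can subtract exactly those, while single events added outside a transaction still update n_added immediately.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COUNTERS 4    /* hypothetical PMU size, chosen for the sketch */
#define TXN_ACTIVE   0x1  /* hypothetical stand-in for the transaction flag */

/* Toy stand-in for per-CPU PMU state; not the kernel's structure. */
struct toy_cpuc {
	int n_events;             /* events currently collected */
	int n_added;              /* events added since the last enable */
	int n_txn;                /* events added inside the open transaction */
	unsigned int group_flag;  /* TXN_ACTIVE while a transaction is open */
};

static void toy_start_txn(struct toy_cpuc *c)
{
	c->group_flag |= TXN_ACTIVE;
	c->n_txn = 0;
}

/* Collect one event; returns false when the toy PMU is full. */
bool toy_add_event(struct toy_cpuc *c)
{
	if (c->n_events >= MAX_COUNTERS)
		return false;
	c->n_events++;
	c->n_added++;                 /* updated immediately, txn or not */
	if (c->group_flag & TXN_ACTIVE)
		c->n_txn++;           /* remember what to undo on cancel */
	return true;
}

/* Undo only what this transaction added. */
static void toy_cancel_txn(struct toy_cpuc *c)
{
	assert(c->n_added >= c->n_txn);
	c->n_added  -= c->n_txn;
	c->n_events -= c->n_txn;      /* assumption of this sketch */
	c->n_txn = 0;
	c->group_flag &= ~TXN_ACTIVE;
}

static void toy_commit_txn(struct toy_cpuc *c)
{
	c->n_txn = 0;
	c->group_flag &= ~TXN_ACTIVE;
}

/* Add a group of n events as one unit: all of them or none. */
bool toy_add_group(struct toy_cpuc *c, int n)
{
	toy_start_txn(c);
	for (int i = 0; i < n; i++) {
		if (!toy_add_event(c)) {
			toy_cancel_txn(c);
			return false;
		}
	}
	toy_commit_txn(c);
	return true;
}
```

In this model, a single event added with toy_add_event() outside any transaction is never rolled back, whereas a group that does not fit leaves n_added exactly as it was before toy_add_group() was called.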
author: Stephane Eranian <eranian@google.com> 2010-05-25 16:23:10 +0200
committer: Ingo Molnar <mingo@elte.hu> 2010-05-31 08:46:10 +0200
commit: 90151c35b19633e0cab5a6c80f1ba4a51e7c913b (patch)
tree: 448c86520eef5b9dc0f06c59a8a96abfd4096fab /kernel/perf_event.c
parent: 2e97942fe57864588774f173cf4cd7bb68968b76 (diff)
1 files changed, 7 insertions, 4 deletions
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 10a1aee..42a0e91 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -687,8 +687,11 @@ group_sched_in(struct perf_event *group_event,
 	if (txn)
 		pmu->start_txn(pmu);
 
-	if (event_sched_in(group_event, cpuctx, ctx))
+	if (event_sched_in(group_event, cpuctx, ctx)) {
+		if (txn)
+			pmu->cancel_txn(pmu);
 		return -EAGAIN;
+	}
 
 	/*
 	 * Schedule in siblings as one group (if any):
@@ -710,9 +713,6 @@ group_sched_in(struct perf_event *group_event,
 	}
 
 group_error:
-	if (txn)
-		pmu->cancel_txn(pmu);
-
 	/*
 	 * Groups can be scheduled in as one unit only, so undo any
 	 * partial group before returning:
@@ -724,6 +724,9 @@ group_error:
 	}
 	event_sched_out(group_event, cpuctx, ctx);
 
+	if (txn)
+		pmu->cancel_txn(pmu);
+
 	return -EAGAIN;
 }
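
Why the diff moves cancel_txn() after event_sched_out() can be illustrated with a small user-space model. Everything here is a hypothetical sketch under stated assumptions: struct demo_pmu, the demo_* functions and DEMO_GROUP_TXN do not exist in the kernel, and the hw_writes counter merely stands in for hardware register accesses such as those done by x86_disable_event(). The model assumes, as the commit message describes, that scheduling in during a transaction only collects events, and that scheduling out skips the hardware when the transaction flag is still set.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_GROUP_TXN 0x1  /* hypothetical stand-in for the transaction flag */

/* Toy PMU state; not the kernel's structure. */
struct demo_pmu {
	unsigned int group_flag;  /* DEMO_GROUP_TXN while a transaction is open */
	int n_collected;          /* events collected but not yet programmed */
	int hw_writes;            /* simulated hardware register accesses */
};

/* Inside a transaction, scheduling in only collects the event. */
static void demo_sched_in(struct demo_pmu *p)
{
	p->n_collected++;
}

/* Scheduling out touches the hardware only outside a transaction. */
static void demo_sched_out(struct demo_pmu *p)
{
	assert(p->n_collected > 0);
	p->n_collected--;
	if (p->group_flag & DEMO_GROUP_TXN)
		return;           /* nothing was programmed, nothing to disable */
	p->hw_writes++;
}

/*
 * Simulate a group of n events that fails to schedule and must be undone.
 * cancel_first selects the old ordering (cancel, then sched_out); otherwise
 * the ordering from the diff is used (sched_out, then cancel).
 * Returns the number of simulated hardware accesses made while undoing.
 */
int demo_failed_group(int n, bool cancel_first)
{
	struct demo_pmu p = { 0, 0, 0 };

	p.group_flag |= DEMO_GROUP_TXN;          /* start the transaction */
	for (int i = 0; i < n; i++)
		demo_sched_in(&p);

	if (cancel_first)
		p.group_flag &= ~DEMO_GROUP_TXN; /* old order: flag cleared early */

	for (int i = 0; i < n; i++)
		demo_sched_out(&p);

	p.group_flag &= ~DEMO_GROUP_TXN;         /* new order: cancel after undo */
	return p.hw_writes;
}
```

With the old ordering the model performs one hardware access per event that was never programmed; with the ordering from the diff it performs none, because the undo still runs inside the transaction.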