author    Peter Zijlstra <peterz@infradead.org>    2017-05-17 12:53:50 +0200
committer Ingo Molnar <mingo@kernel.org>           2017-06-08 10:25:17 +0200
commit    1ad3aaf3fcd2444406628a19a9b9e0922b95e2d4 (patch)
tree      e3db01782e505b85244eeb7e799e535b9c7b319a /kernel/sched/fair.c
parent    9b01d43170aa70a435105f6413759e2ab7e00219 (diff)
sched/core: Implement new approach to scale select_idle_cpu()
Hackbench recently suffered a bunch of pain, first by commit:

  4c77b18cf8b7 ("sched/fair: Make select_idle_cpu() more aggressive")

and then by commit:

  c743f0a5c50f ("sched/fair, cpumask: Export for_each_cpu_wrap()")

which fixed a bug in the initial for_each_cpu_wrap() implementation
that made select_idle_cpu() even more expensive. The bug was that it
would skip over CPUs when bits were consecutive in the bitmask.

This however gave me an idea to fix select_idle_cpu(); where the old
scheme was a cliff-edge throttle on idle scanning, this introduces a
more gradual approach. Instead of stopping to scan entirely, we limit
how many CPUs we scan.

Initial benchmarks show that it mostly recovers hackbench while not
hurting anything else, except Mason's schbench, but not as bad as the
old thing.

It also appears to recover the tbench high-end, which also suffered
like hackbench.

Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: kitsunyan <kitsunyan@inbox.ru>
Cc: linux-kernel@vger.kernel.org
Cc: lvenanci@redhat.com
Cc: riel@redhat.com
Cc: xiaolong.ye@intel.com
Link: http://lkml.kernel.org/r/20170517105350.hk5m4h4jb6dfr65a@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
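The proportional limit the patch computes can be illustrated outside the
kernel. Below is a minimal, standalone C sketch of the SIS_PROP budget
calculation; the 16-CPU LLC and the avg_idle/avg_scan_cost figures in
main() are made-up example numbers, not values taken from the kernel.

#include <stdio.h>
#include <stdint.h>

/*
 * Standalone sketch of the SIS_PROP scan budget introduced by this patch.
 * In the kernel the inputs come from this_rq()->avg_idle,
 * this_sd->avg_scan_cost and sd->span_weight; here they are plain
 * parameters with illustrative values.
 */
static uint64_t scan_budget(uint64_t span_weight, uint64_t avg_idle_ns,
                            uint64_t avg_scan_cost_ns)
{
	/* Same fuzz factor as the patch: avg_idle/512 vs. avg_scan_cost + 1. */
	uint64_t avg_idle = avg_idle_ns / 512;
	uint64_t avg_cost = avg_scan_cost_ns + 1;
	uint64_t span_avg = span_weight * avg_idle;

	/*
	 * Scan proportionally more CPUs the longer we expect to stay idle,
	 * but never fewer than 4.
	 */
	if (span_avg > 4 * avg_cost)
		return span_avg / avg_cost;
	return 4;
}

int main(void)
{
	/* Hypothetical 16-CPU LLC, 500us average idle, 3us average scan cost. */
	printf("scan budget: %llu CPUs\n",
	       (unsigned long long)scan_budget(16, 500000, 3000));
	return 0;
}

With these example numbers the budget works out to 5 CPUs, i.e. the scan
is cut well short of the full 16-CPU LLC rather than being switched off
entirely, which is the gradual throttle described above.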
Diffstat (limited to 'kernel/sched/fair.c')
 kernel/sched/fair.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 47a0c55..396bca9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5794,27 +5794,38 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct sched_domain *this_sd;
-	u64 avg_cost, avg_idle = this_rq()->avg_idle;
+	u64 avg_cost, avg_idle;
 	u64 time, cost;
 	s64 delta;
-	int cpu;
+	int cpu, nr = INT_MAX;
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
 	if (!this_sd)
 		return -1;
 
-	avg_cost = this_sd->avg_scan_cost;
-
 	/*
 	 * Due to large variance we need a large fuzz factor; hackbench in
 	 * particularly is sensitive here.
 	 */
-	if (sched_feat(SIS_AVG_CPU) && (avg_idle / 512) < avg_cost)
+	avg_idle = this_rq()->avg_idle / 512;
+	avg_cost = this_sd->avg_scan_cost + 1;
+
+	if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost)
 		return -1;
 
+	if (sched_feat(SIS_PROP)) {
+		u64 span_avg = sd->span_weight * avg_idle;
+		if (span_avg > 4*avg_cost)
+			nr = div_u64(span_avg, avg_cost);
+		else
+			nr = 4;
+	}
+
 	time = local_clock();
 
 	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
+		if (!--nr)
+			return -1;
 		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
 			continue;
 		if (idle_cpu(cpu))