author: Tejun Heo <tj@kernel.org> 2017-02-22 15:41:30 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2017-02-22 16:41:27 -0800
commit: 01fb58bcba63f8fba37581c24c99e9a515dd0335
tree: 475ebac1b656204783280c52acf315dfd3caea03
parent: c9fc586403e7c85eee06b2d5dea14ce71c00fcd8
slab: remove synchronous synchronize_sched() from memcg cache deactivation path

With kmem cgroup support enabled, kmem_caches can be created and
destroyed frequently and a great number of near empty kmem_caches can
accumulate if there are a lot of transient cgroups and the system is not
under memory pressure.  When memory reclaim starts under such
conditions, it can lead to consecutive deactivation and destruction of
many kmem_caches, easily hundreds of thousands on moderately large
systems, exposing scalability issues in the current slab management
code.  This is one of the patches to address the issue.

slub uses synchronize_sched() to deactivate a memcg cache.
synchronize_sched() is an expensive and slow operation and doesn't scale
when a huge number of caches are destroyed back-to-back.  While there
used to be a simple batching mechanism, the batching was too restricted
to be helpful.

This patch implements slab_deactivate_memcg_cache_rcu_sched() which slub
can use to schedule sched RCU callback instead of performing
synchronize_sched() synchronously while holding cgroup_mutex.  While
this adds online cpus, mems and slab_mutex operations, operating on
these locks back-to-back from the same kworker, which is what's gonna
happen when there are many to deactivate, isn't expensive at all and
this gets rid of the scalability problem completely.

Link: http://lkml.kernel.org/r/20170117235411.9408-9-tj@kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jay Vana <jsvana@fb.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/slub.c')
-rw-r--r--  mm/slub.c  12
1 file changed, 8 insertions, 4 deletions
diff --git a/mm/slub.c b/mm/slub.c
index 8a45915..62d0b55 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3957,6 +3957,12 @@ int __kmem_cache_shrink(struct kmem_cache *s)
 }
 
 #ifdef CONFIG_MEMCG
+static void kmemcg_cache_deact_after_rcu(struct kmem_cache *s)
+{
+	/* called with all the locks held after a sched RCU grace period */
+	__kmem_cache_shrink(s);
+}
+
 void __kmemcg_cache_deactivate(struct kmem_cache *s)
 {
 	/*
@@ -3968,11 +3974,9 @@ void __kmemcg_cache_deactivate(struct kmem_cache *s)
 
 	/*
 	 * s->cpu_partial is checked locklessly (see put_cpu_partial), so
-	 * we have to make sure the change is visible.
+	 * we have to make sure the change is visible before shrinking.
 	 */
-	synchronize_sched();
-
-	__kmem_cache_shrink(s);
+	slab_deactivate_memcg_cache_rcu_sched(s, kmemcg_cache_deact_after_rcu);
 }
 #endif
 
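
The difference between the old and new deactivation paths can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct cache`, `struct gp_model`, `deactivate_sync()`, `deactivate_deferred()` and `grace_period_elapsed()` are hypothetical names invented for the illustration, and the model assumes that every callback queued before a grace period ends runs after that single grace period. The real slab_deactivate_memcg_cache_rcu_sched() is implemented elsewhere in this patch and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache; only what the sketch needs. */
struct cache {
	int cpu_partial;                  /* cleared when the cache is deactivated */
	int shrunk;                       /* set once the shrink step has run */
	void (*deact_fn)(struct cache *); /* deferred work, run after a grace period */
	struct cache *next;               /* link in the pending-callback list */
};

/* Hypothetical model of the grace-period machinery. */
struct gp_model {
	struct cache *pending_head;
	struct cache *pending_tail;
	unsigned long grace_periods;      /* grace periods waited for so far */
};

/* Plays the role of kmemcg_cache_deact_after_rcu(): the shrink step. */
static void shrink_after_gp(struct cache *c)
{
	c->shrunk = 1;
}

/* Old shape: the caller waits out one grace period per cache, then shrinks. */
static void deactivate_sync(struct gp_model *m, struct cache *c)
{
	c->cpu_partial = 0;
	m->grace_periods++;               /* models a blocking synchronize_sched() */
	shrink_after_gp(c);
}

/* New shape: queue the shrink as a callback and return immediately. */
static void deactivate_deferred(struct gp_model *m, struct cache *c,
				void (*fn)(struct cache *))
{
	c->cpu_partial = 0;
	c->deact_fn = fn;
	c->next = NULL;
	if (m->pending_tail)
		m->pending_tail->next = c;
	else
		m->pending_head = c;
	m->pending_tail = c;
}

/* One grace period ends: run every queued callback, return how many ran. */
static size_t grace_period_elapsed(struct gp_model *m)
{
	struct cache *c = m->pending_head;
	size_t ran = 0;

	m->pending_head = m->pending_tail = NULL;
	m->grace_periods++;
	while (c) {
		struct cache *next = c->next;

		c->next = NULL;
		assert(c->cpu_partial == 0); /* the change was made before the wait */
		c->deact_fn(c);
		ran++;
		c = next;
	}
	return ran;
}
```

In this model, deactivating N caches with deactivate_sync() costs N grace-period waits in the caller, whereas N calls to deactivate_deferred() followed by one grace_period_elapsed() cost a single wait, and the caller never blocks. Real RCU batching is more involved, so treat the counts as an intuition aid rather than a measurement.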