op-kernel-dev - Development kernel branch for OpenPOWER systems

diff options

author	Tejun Heo <tj@kernel.org>	2013-11-22 17:14:39 -0500
committer	Tejun Heo <tj@kernel.org>	2013-11-22 17:14:39 -0500
commit	e5fca243abae1445afbfceebda5f08462ef869d3 (patch)
tree	4c5dc3301b6fe77fc70b4567c9f2c89c42a8d34c /init
parent	6ce4eac1f600b34f2f7f58f9cd8f0503d79e42ae (diff)
download	op-kernel-dev-e5fca243abae1445afbfceebda5f08462ef869d3.zip op-kernel-dev-e5fca243abae1445afbfceebda5f08462ef869d3.tar.gz

cgroup: use a dedicated workqueue for cgroup destruction

Since be44562613851 ("cgroup: remove synchronize_rcu() from cgroup_diput()"), cgroup destruction path makes use of workqueue. css freeing is performed from a work item from that point on and a later commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two steps"), moves css offlining to workqueue too. As cgroup destruction isn't depended upon for memory reclaim, the destruction work items were put on the system_wq; unfortunately, some controller may block in the destruction path for considerable duration while holding cgroup_mutex. As large part of destruction path is synchronized through cgroup_mutex, when combined with high rate of cgroup removals, this has potential to fill up system_wq's max_active of 256. Also, it turns out that memcg's css destruction path ends up queueing and waiting for work items on system_wq through work_on_cpu(). If such operation happens while system_wq is fully occupied by cgroup destruction work items, work_on_cpu() can't make forward progress because system_wq is full and other destruction work items on system_wq can't make forward progress because the work item waiting for work_on_cpu() is holding cgroup_mutex, leading to deadlock. This can be fixed by queueing destruction work items on a separate workqueue. This patch creates a dedicated workqueue - cgroup_destroy_wq - for this purpose. As these work items shouldn't have inter-dependencies and mostly serialized by cgroup_mutex anyway, giving high concurrency level doesn't buy anything and the workqueue's @max_active is set to 1 so that destruction work items are executed one by one on each CPU. Hugh Dickins: Because cgroup_init() is run before init_workqueues(), cgroup_destroy_wq can't be allocated from cgroup_init(). Do it from a separate core_initcall(). In the future, we probably want to reorder so that workqueue init happens before cgroup_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Hugh Dickins <hughd@google.com> Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com> Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils Cc: stable@vger.kernel.org # v3.9+

Diffstat (limited to 'init')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: