diff options
author | Johannes Weiner <hannes@cmpxchg.org> | 2016-03-17 14:20:28 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-03-17 15:09:34 -0700 |
commit | b6e6edcfa40561e9c8abe5eecf1c96f8e5fd9c6f (patch) | |
tree | 4827a7b163fc3b97c8ae86d31315f1e508b5753c /Documentation/cgroup-v2.txt | |
parent | 588083bb37a3cea8533c392370a554417c8f29cb (diff) | |
download | op-kernel-dev-b6e6edcfa40561e9c8abe5eecf1c96f8e5fd9c6f.zip op-kernel-dev-b6e6edcfa40561e9c8abe5eecf1c96f8e5fd9c6f.tar.gz |
mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage
Setting the original memory.limit_in_bytes hardlimit is subject to a
race condition when the desired value is below the current usage. The
code tries a few times to first reclaim and then see if the usage has
dropped to where we would like it to be, but there is no locking, and
the workload is free to continue making new charges up to the old limit.
Thus, attempting to shrink a workload relies on pure luck and hope that
the workload happens to cooperate.
To fix this in the cgroup2 memory.max knob, do it the other way round:
set the limit first, then try enforcement. And if reclaim is not able
to succeed, trigger OOM kills in the group. Keep going until the new
limit is met, we run out of OOM victims and there's only unreclaimable
memory left, or the task writing to memory.max is killed. This allows
users to shrink groups reliably, and the behavior is consistent with
what happens when new charges are attempted in excess of memory.max.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation/cgroup-v2.txt')
-rw-r--r-- | Documentation/cgroup-v2.txt | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index e2f4e79..8f1329a 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -1387,6 +1387,12 @@ system than killing the group. Otherwise, memory.max is there to limit this type of spillover and ultimately contain buggy or even malicious applications. +Setting the original memory.limit_in_bytes below the current usage was +subject to a race condition, where concurrent charges could cause the +limit setting to fail. memory.max on the other hand will first set the +limit to prevent new charges, and then reclaim and OOM kill until the +new limit is met - or the task writing to memory.max is killed. + The combined memory+swap accounting and limiting is replaced by real control over swap space. |