af_unix: charge buffers to kmemcg

Unix sockets can consume a significant amount of system memory, hence
they should be accounted to kmemcg.

Since unix socket buffers are always allocated from process context, all
we need to do to charge them to kmemcg is set __GFP_ACCOUNT in
sock->sk_allocation mask.

Eric asked:

> 1) What happens when a buffer, allocated from socket <A> lands in a
> different socket <B>, maybe owned by another user/process.
>
> Who owns it now, in term of kmemcg accounting ?

We never move memcg charges.  E.g. if two processes from different
cgroups are sharing a memory region, each page will be charged to the
process which touched it first.  Or if two processes are working with
the same directory tree, inodes and dentries will be charged to the
first user.  The same is fair for unix socket buffers - they will be
charged to the sender.

> 2) Has performance impact been evaluated ?

I ran netperf STREAM_STREAM with default options in a kmemcg on a 4 core
x2 HT box.  The results are below:

 # clients            bandwidth (10^6bits/sec)
                     base              patched
         1      67643 +-  725      64874 +-  353    - 4.0 %
         4     193585 +- 2516     186715 +- 1460    - 3.5 %
         8     194820 +-  377     187443 +- 1229    - 3.7 %

So the accounting doesn't come for free - it takes ~4% of performance.
I believe we could optimize it by using per cpu batching not only on
charge, but also on uncharge in memcg core, but that's beyond the scope
of this patch set - I'll take a look at this later.

Anyway, if performance impact is found to be unacceptable, it is always
possible to disable kmem accounting at boot time (cgroup.memory=nokmem)
or not use memory cgroups at runtime at all (thanks to jump labels
there'll be no overhead even if they are compiled in).

Link: http://lkml.kernel.org/r/fcfe6cae27a59fbc5e40145664b3cf085a560c68.1464079538.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
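
The answer to Eric's first question (a buffer is charged to the sender and the charge never moves) can be illustrated with a small userspace model. This is only a sketch: `toy_cgroup`, `toy_buffer`, `toy_alloc`, `toy_deliver` and `toy_free` are hypothetical names invented here, and none of this is kernel code or the memcg implementation.

```c
#include <stddef.h>

/* Toy model only: hypothetical types, not kernel structures. */
struct toy_cgroup {
    const char *name;
    size_t kmem_charged;        /* bytes currently charged to this group */
};

struct toy_buffer {
    size_t size;
    struct toy_cgroup *owner;   /* set once at allocation, never reassigned */
};

/* The buffer is allocated in the sender's context, so the sender's
 * cgroup takes the charge. */
struct toy_buffer toy_alloc(struct toy_cgroup *sender, size_t size)
{
    struct toy_buffer buf = { size, sender };

    sender->kmem_charged += size;
    return buf;
}

/* Queueing the buffer on a receiver in another cgroup does not move the
 * charge; the function reports who is still paying for it. */
const struct toy_cgroup *toy_deliver(const struct toy_buffer *buf,
                                     const struct toy_cgroup *receiver)
{
    (void)receiver;             /* deliberately not consulted */
    return buf->owner;
}

/* Freeing uncharges the original owner, regardless of who consumed it. */
void toy_free(struct toy_buffer *buf)
{
    buf->owner->kmem_charged -= buf->size;
    buf->owner = NULL;
    buf->size = 0;
}
```

In this model, allocating 4096 bytes on behalf of cgroup A and delivering the buffer to a receiver in cgroup B leaves A charged with 4096 bytes and B with none until the buffer is freed.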
author: Vladimir Davydov <vdavydov@virtuozzo.com> 2016-07-26 15:24:36 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2016-07-26 16:19:19 -0700
commit: 3aa9799e13645fda605e1c68831f2d4256a38537 (patch)
tree: 66777e68e0ed5b4140c70fe3d7fc07c38324a99a /net/unix
parent: d86133bd396f5e4a8d5629c1b853b574de4faf32 (diff)
1 files changed, 1 insertions, 0 deletions
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 735362c..f1dffe8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -769,6 +769,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
 	lockdep_set_class(&sk->sk_receive_queue.lock,
 				&af_unix_sk_receive_queue_lock_key);
 
+	sk->sk_allocation	= GFP_KERNEL_ACCOUNT;
 	sk->sk_write_space	= unix_write_space;
 	sk->sk_max_ack_backlog	= net->unx.sysctl_max_dgram_qlen;
 	sk->sk_destruct		= unix_sock_destructor;
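
The one-line change works because, in the kernel, `GFP_KERNEL_ACCOUNT` is `GFP_KERNEL` with the `__GFP_ACCOUNT` bit added, and the socket's buffer allocations take their flags from `sk_allocation`. A minimal userspace sketch of that flag composition follows. The `toy_` names and the bit values are assumptions made for illustration and do not match the kernel's real definitions.

```c
/* Illustrative bit values only: the real flags are defined in the
 * kernel's gfp headers and their values differ between versions. */
typedef unsigned int toy_gfp_t;

#define TOY_GFP_KERNEL          0x01u   /* stand-in for GFP_KERNEL */
#define TOY_GFP_ACCOUNT         0x02u   /* stand-in for __GFP_ACCOUNT */
#define TOY_GFP_KERNEL_ACCOUNT  (TOY_GFP_KERNEL | TOY_GFP_ACCOUNT)

/* Hypothetical stand-in for the one field of struct sock touched here. */
struct toy_sock {
    toy_gfp_t sk_allocation;
};

/* Mirrors the added line: every later allocation that uses this mask
 * carries the accounting bit. */
void toy_sock_init(struct toy_sock *sk)
{
    sk->sk_allocation = TOY_GFP_KERNEL_ACCOUNT;
}

/* Returns nonzero if an allocation made with this mask would be charged. */
int toy_is_accounted(toy_gfp_t mask)
{
    return (mask & TOY_GFP_ACCOUNT) != 0;
}
```

Setting the mask once at socket creation means no allocation call site has to change, which is why the patch is a single line.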