From 510fc4e11b772fd60f2c545c64d4c55abd07ce36 Mon Sep 17 00:00:00 2001
From: Glauber Costa
Date: Tue, 18 Dec 2012 14:21:47 -0800
Subject: memcg: kmem accounting basic infrastructure

Add the basic infrastructure for the accounting of kernel memory. To
control that, the following files are created:

 * memory.kmem.usage_in_bytes
 * memory.kmem.limit_in_bytes
 * memory.kmem.failcnt
 * memory.kmem.max_usage_in_bytes

They have the same meaning as their user memory counterparts. They
reflect the state of the "kmem" res_counter.

Per cgroup kmem memory accounting is not enabled until a limit is set for
the group. Once the limit is set the accounting cannot be disabled for
that group. This means that after the patch is applied, no behavioral
changes exist for whoever is still using memcg to control their memory
usage, until memory.kmem.limit_in_bytes is set for the first time.

We always account to both user and kernel resource_counters. This
effectively means that an independent kernel limit is in place when the
limit is set to a lower value than the user memory. An equal or higher
value means that the user limit will always hit first, meaning that kmem
is effectively unlimited.

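The dual accounting described above can be modeled in plain userspace C.
This is only a sketch under stated assumptions: struct counter, struct
group and the charge helpers are hypothetical names made up for
illustration, not the kernel's res_counter API, and the order in which the
two counters are charged is a choice of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a res_counter: usage, limit, failure count. */
struct counter {
	uint64_t usage;
	uint64_t limit;
	uint64_t failcnt;
};

/* One cgroup in this model: a "user" counter and a "kmem" counter. */
struct group {
	struct counter user;
	struct counter kmem;
};

/* Charge one counter; on failure bump failcnt and leave usage untouched. */
static bool counter_charge(struct counter *c, uint64_t bytes)
{
	/* Relies on the invariant usage <= limit. */
	if (bytes > c->limit - c->usage) {
		c->failcnt++;
		return false;
	}
	c->usage += bytes;
	return true;
}

/* User pages are charged to the user counter only. */
static bool charge_user(struct group *g, uint64_t bytes)
{
	return counter_charge(&g->user, bytes);
}

/*
 * Kernel memory is charged to BOTH counters. If the second charge
 * fails, the first one is rolled back so the counters stay consistent.
 */
static bool charge_kernel(struct group *g, uint64_t bytes)
{
	if (!counter_charge(&g->kmem, bytes))
		return false;
	if (!counter_charge(&g->user, bytes)) {
		g->kmem.usage -= bytes;
		return false;
	}
	return true;
}
```

In this model, a group with a user limit of 100 and a kmem limit of 40
accepts a kernel charge of 30 and then rejects a further kernel charge of
20 on the kmem counter, which is the "independent kernel limit" case. With
the kmem limit above the user limit, the user counter is the one that
rejects the charge.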
People who want to track kernel memory but not limit it can set this
limit to a very high number (like RESOURCE_MAX - 1page, which no one will
ever hit, or equal to the user memory).

[akpm@linux-foundation.org: MEMCG_KMEM only works with slab and slub]
Signed-off-by: Glauber Costa
Acked-by: Kamezawa Hiroyuki
Acked-by: Michal Hocko
Cc: Johannes Weiner
Cc: Tejun Heo
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Frederic Weisbecker
Cc: Greg Thelen
Cc: JoonSoo Kim
Cc: Mel Gorman
Cc: Pekka Enberg
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 init/Kconfig | 1 +
 1 file changed, 1 insertion(+)

(limited to 'init')

diff --git a/init/Kconfig b/init/Kconfig
index 675d8a2..19ccb33 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -882,6 +882,7 @@ config MEMCG_SWAP_ENABLED
 config MEMCG_KMEM
 	bool "Memory Resource Controller Kernel Memory accounting (EXPERIMENTAL)"
 	depends on MEMCG && EXPERIMENTAL
+	depends on SLUB || SLAB
 	default n
 	help
 	  The Kernel Memory extension for Memory Resource Controller can limit
-- 
cgit v1.1

From d7f25f8a2f81252d1ac134470ba1d0a287cf8fcd Mon Sep 17 00:00:00 2001
From: Glauber Costa
Date: Tue, 18 Dec 2012 14:22:40 -0800
Subject: memcg: infrastructure to match an allocation to the right cache

The page allocator is able to bind a page to a memcg when it is
allocated. But for the caches, we'd like to have as many objects as
possible in a page belonging to the same cache.

This is done in this patch by calling memcg_kmem_get_cache() at the
beginning of every allocation function. This function is patched out by
static branches when the kernel memory controller is not being used.

It assumes that the allocating task, which determines the memcg in the
page allocator, belongs to the same cgroup throughout the whole process.
Misaccounting can happen if the task calls memcg_kmem_get_cache() while
belonging to a cgroup, and later on changes. This is considered
acceptable, and should only happen upon task migration.

Before the cache is created by the memcg core, there is also a possible
imbalance: the task belongs to a memcg, but the cache being allocated from
is the global cache, since the child cache is not yet guaranteed to be
ready. This case is also fine, since in this case the GFP_KMEMCG will not
be passed and the page allocator will not attempt any cgroup accounting.

Signed-off-by: Glauber Costa
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Frederic Weisbecker
Cc: Greg Thelen
Cc: Johannes Weiner
Cc: JoonSoo Kim
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Pekka Enberg
Cc: Rik van Riel
Cc: Suleiman Souhlal
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 init/Kconfig | 1 -
 1 file changed, 1 deletion(-)

(limited to 'init')

diff --git a/init/Kconfig b/init/Kconfig
index 19ccb33..7d30240 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -883,7 +883,6 @@ config MEMCG_KMEM
 	bool "Memory Resource Controller Kernel Memory accounting (EXPERIMENTAL)"
 	depends on MEMCG && EXPERIMENTAL
 	depends on SLUB || SLAB
-	default n
 	help
 	  The Kernel Memory extension for Memory Resource Controller can limit
 	  the amount of memory used by kernel objects in the system. Those are
-- 
cgit v1.1

From ae903caae267154de7cf8576b130ff474630596b Mon Sep 17 00:00:00 2001
From: Al Viro
Date: Fri, 14 Dec 2012 12:44:11 -0500
Subject: Bury the conditionals from kernel_thread/kernel_execve series

All architectures have
	CONFIG_GENERIC_KERNEL_THREAD
	CONFIG_GENERIC_KERNEL_EXECVE
	__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().

Signed-off-by: Al Viro
---
 init/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'init')

diff --git a/init/main.c b/init/main.c
index e33e09d..155ac20 100644
--- a/init/main.c
+++ b/init/main.c
@@ -797,7 +797,9 @@ static void __init do_pre_smp_initcalls(void)
 static int run_init_process(const char *init_filename)
 {
 	argv_init[0] = init_filename;
-	return kernel_execve(init_filename, argv_init, envp_init);
+	return do_execve(init_filename,
+		(const char __user *const __user *)argv_init,
+		(const char __user *const __user *)envp_init);
 }
 
 static void __init kernel_init_freeable(void);
-- 
cgit v1.1

From f80b0c904da93b9ad7db2fd9823dd701932df779 Mon Sep 17 00:00:00 2001
From: Vineet Gupta
Date: Fri, 21 Dec 2012 12:25:44 +0530
Subject: Ensure that kernel_init_freeable() is not inlined into non __init code

Commit d6b2123802d "make sure that we always have a return path from
kernel_execve()" reshuffled kernel_init()/init_post() to ensure that
kernel_execve() has a caller to return to.

It removed __init annotation for kernel_init() and introduced/calls a new
routine kernel_init_freeable(). Latter however is inlined by any
reasonable compiler (ARC gcc 4.4 in this case), causing slight code bloat.

This patch forces kernel_init_freeable() to be noinline, reducing the
size of .text:

bloat-o-meter vmlinux vmlinux_new
add/remove: 1/0 grow/shrink: 0/1 up/down: 374/-334 (40)
function                                     old     new   delta
kernel_init_freeable                           -     374    +374 (.init.text)
kernel_init                                  628     294    -334 (.text)

Signed-off-by: Al Viro
---
 init/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'init')

diff --git a/init/main.c b/init/main.c
index 85d69df..92d728a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -802,7 +802,7 @@ static int run_init_process(const char *init_filename)
 		(const char __user *const __user *)envp_init);
 }
 
-static void __init kernel_init_freeable(void);
+static noinline void __init kernel_init_freeable(void);
 
 static int __ref kernel_init(void *unused)
 {
@@ -845,7 +845,7 @@ static int __ref kernel_init(void *unused)
 	      "See Linux Documentation/init.txt for guidance.");
 }
 
-static void __init kernel_init_freeable(void)
+static noinline void __init kernel_init_freeable(void)
 {
 	/*
 	 * Wait until kthreadd is all set-up.
-- 
cgit v1.1

From 3a55fb0d9fe8e2f4594329edd58c5fd6f35a99dd Mon Sep 17 00:00:00 2001
From: Kirill Smelkov
Date: Fri, 2 Nov 2012 15:41:01 +0400
Subject: Tell the world we gave up on pushing CC_OPTIMIZE_FOR_SIZE

In commit 281dc5c5ec0f ("Give up on pushing CC_OPTIMIZE_FOR_SIZE") we
already changed the actual default value, but the help-text still
suggested 'y'. Fix the help text too, for all the same reasons.

Sadly, -Os keeps on generating some very suboptimal code for certain
cases, to the point where any I$ miss upside is swamped by the downside.
The main ones are:

 - using "rep movsb" for memcpy, even on CPUs where that is horrendously
   bad for performance.

 - not honoring branch prediction information, so any I$ footprint you
   win from smaller code, you lose from less code density in the I$.

 - using divide instructions when that is very expensive.

Signed-off-by: Kirill Smelkov
Signed-off-by: Linus Torvalds
---
 init/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'init')

diff --git a/init/Kconfig b/init/Kconfig
index 7d30240..be8b7f5 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1182,7 +1182,7 @@ config CC_OPTIMIZE_FOR_SIZE
 	  Enabling this option will pass "-Os" instead of "-O2" to gcc
 	  resulting in a smaller kernel.
 
-	  If unsure, say Y.
+	  If unsure, say N.
 
 config SYSCTL
 	bool
-- 
cgit v1.1

From 43b16820249396aea7eb57c747106e211e54bed5 Mon Sep 17 00:00:00 2001
From: Al Viro
Date: Sat, 19 Jan 2013 13:29:54 -0500
Subject: make sure that /linuxrc has std{in,out,err}

Signed-off-by: Al Viro
---
 init/do_mounts_initrd.c | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'init')

diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
index 5e4ded5..f9acf71 100644
--- a/init/do_mounts_initrd.c
+++ b/init/do_mounts_initrd.c
@@ -36,6 +36,10 @@ __setup("noinitrd", no_initrd);
 static int init_linuxrc(struct subprocess_info *info, struct cred *new)
 {
 	sys_unshare(CLONE_FS | CLONE_FILES);
+	/* stdin/stdout/stderr for /linuxrc */
+	sys_open("/dev/console", O_RDWR, 0);
+	sys_dup(0);
+	sys_dup(0);
 	/* move initrd over / and chdir/chroot in initrd root */
 	sys_chdir("/root");
 	sys_mount(".", "/", NULL, MS_MOVE, NULL);
-- 
cgit v1.1
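
The open-then-dup-twice idiom in the last patch has a direct userspace
analogue, sketched below. The helper name open_std_fds() and its path
parameter are hypothetical, introduced only for illustration; the patch
itself hard-codes "/dev/console" and calls sys_dup(0), which presumes the
open returned descriptor 0.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path` read-write and duplicate it twice, storing the three
 * descriptors in fds[]. open() and dup() each return the lowest unused
 * descriptor, so in a process with no open files the results would be
 * 0, 1 and 2, i.e. stdin, stdout and stderr all on the same file.
 *
 * Returns 0 on success, -1 on failure (nothing is left open).
 */
static int open_std_fds(const char *path, int fds[3])
{
	fds[0] = open(path, O_RDWR);
	if (fds[0] < 0)
		return -1;

	/* The patch dups fd 0 directly; here we dup whatever open() gave us. */
	fds[1] = dup(fds[0]);
	fds[2] = dup(fds[0]);
	if (fds[1] < 0 || fds[2] < 0) {
		if (fds[1] >= 0)
			close(fds[1]);
		if (fds[2] >= 0)
			close(fds[2]);
		close(fds[0]);
		return -1;
	}
	return 0;
}
```

Called as open_std_fds("/dev/null", fds) from an ordinary process, this
yields three distinct descriptors that share one open file description;
they are not 0, 1 and 2 there because those numbers are already taken.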