From ab936cbcd02072a34b60d268f94440fd5cf1970b Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki Date: Thu, 12 Jan 2012 17:17:44 -0800 Subject: memcg: add mem_cgroup_replace_page_cache() to fix LRU issue Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a function replace_page_cache_page(). This function replaces a page in the radix-tree with a new page. WHen doing this, memory cgroup needs to fix up the accounting information. memcg need to check PCG_USED bit etc. In some(many?) cases, 'newpage' is on LRU before calling replace_page_cache(). So, memcg's LRU accounting information should be fixed, too. This patch adds mem_cgroup_replace_page_cache() and removes the old hooks. In that function, old pages will be unaccounted without touching res_counter and new page will be accounted to the memcg (of old page). WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid races with LRU handling. Background: replace_page_cache_page() is called by FUSE code in its splice() handling. Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated page and may be on LRU. LRU mis-accounting will be critical for memory cgroup because rmdir() checks the whole LRU is empty and there is no account leak. If a page is on the other LRU than it should be, rmdir() will fail. This bug was added in March 2011, but no bug report yet. I guess there are not many people who use memcg and FUSE at the same time with upstream kernels. The result of this bug is that admin cannot destroy a memcg because of account leak. So, no panic, no deadlock. And, even if an active cgroup exist, umount can succseed. So no problem at shutdown. Signed-off-by: KAMEZAWA Hiroyuki Acked-by: Johannes Weiner Acked-by: Michal Hocko Cc: Miklos Szeredi Cc: Hugh Dickins Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux/memcontrol.h') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index f944591..3558a5e 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -122,6 +122,8 @@ struct zone_reclaim_stat* mem_cgroup_get_reclaim_stat_from_page(struct page *page); extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p); +extern void mem_cgroup_replace_page_cache(struct page *oldpage, + struct page *newpage); #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP extern int do_swap_account; @@ -369,6 +371,10 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx) { } +static inline void mem_cgroup_replace_page_cache(struct page *oldpage, + struct page *newpage) +{ +} #endif /* CONFIG_CGROUP_MEM_CONT */ #if !defined(CONFIG_CGROUP_MEM_RES_CTLR) || !defined(CONFIG_DEBUG_VM) -- cgit v1.1