From 4946d54cb55e86a156216fcfeed5568514b0830f Mon Sep 17 00:00:00 2001 From: Rik van Riel Date: Mon, 5 Apr 2010 12:13:33 -0400 Subject: rmap: fix anon_vma_fork() memory leak Fix a memory leak in anon_vma_fork(), where we fail to tear down the anon_vmas attached to the new VMA in case setting up the new anon_vma fails. This bug also has the potential to leave behind anon_vma_chain structs with pointers to invalid memory. Reported-by: Minchan Kim Signed-off-by: Rik van Riel Signed-off-by: Linus Torvalds --- mm/rmap.c | 1 + 1 file changed, 1 insertion(+) (limited to 'mm/rmap.c') diff --git a/mm/rmap.c b/mm/rmap.c index fcd593c9..eaa7a09 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -232,6 +232,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma) out_error_free_anon_vma: anon_vma_free(anon_vma); out_error: + unlink_anon_vmas(vma); return -ENOMEM; } -- cgit v1.1 From 646d87b481dab4ba8301716600dfd276605b0ab0 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 11 Apr 2010 17:15:03 -0700 Subject: anon_vma: clone the anon_vma chain in the right order We want to walk the chain in reverse order when cloning it, so that the order of the result chain will be the same as the order in the source chain. When we add entries to the chain, they go at the head of the chain, so we want to add the source head last. Reviewed-by: Rik van Riel Acked-by: Johannes Weiner Tested-by: Borislav Petkov [ "No, it still oopses" ] Signed-off-by: Linus Torvalds --- mm/rmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm/rmap.c') diff --git a/mm/rmap.c b/mm/rmap.c index eaa7a09..ee97d38 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -182,7 +182,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src) { struct anon_vma_chain *avc, *pavc; - list_for_each_entry(pavc, &src->anon_vma_chain, same_vma) { + list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) { avc = anon_vma_chain_alloc(); if (!avc) goto enomem_failure; -- cgit v1.1 From ea90002b0fa7bdee86ec22eba1d951f30bf043a6 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 12 Apr 2010 12:44:29 -0700 Subject: anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma Otherwise we might be mapping in a page in a new mapping, but that page (through the swapcache) would later be mapped into an old mapping too. The page->mapping must be the case that works for everybody, not just the mapping that happened to page it in first. Here's the scenario: - page gets allocated/mapped by process A. Let's call the anon_vma we associate the page with 'A' to keep it easy to track. - Process A forks, creating process B. The anon_vma in B is 'B', and has a chain that looks like 'B' -> 'A'. Everything is fine. - Swapping happens. The page (with mapping pointing to 'A') gets swapped out (perhaps not to disk - it's enough to assume that it's just not mapped any more, and lives entirely in the swap-cache) - Process B pages it in, which goes like this: do_swap_page -> page = lookup_swap_cache(entry); ... set_pte_at(mm, address, page_table, pte); page_add_anon_rmap(page, vma, address); And think about what happens here! In particular, what happens is that this will now be the "first" mapping of that page, so page_add_anon_rmap() used to do if (first) __page_set_anon_rmap(page, vma, address); and notice what anon_vma it will use? It will use the anon_vma for process B! What happens then? Trivial: process 'A' also pages it in (nothing happens, it's not the first mapping), and then process 'B' execve's or exits or unmaps, making anon_vma B go away. End result: process A has a page that points to anon_vma B, but anon_vma B does not exist any more. This can go on forever. Forget about RCU grace periods, forget about locking, forget anything like that. The bug is simply that page->mapping points to an anon_vma that was correct at one point, but was _not_ the one that was shared by all users of that possible mapping. Changing it to always use the deepest anon_vma in the anonvma chain gets us to the safest model. This can be improved in certain cases: if we know the page is private to just this particular mapping (for example, it's a new page, or it is the only swapcache entry), we could pick the top (most specific) anon_vma. But that's a future optimization. Make it _work_ reliably first. Reviewed-by: Rik van Riel Acked-by: Johannes Weiner Tested-by: Borislav Petkov [ "What do you know, I think you fixed it!" ] Signed-off-by: Linus Torvalds --- mm/rmap.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) (limited to 'mm/rmap.c') diff --git a/mm/rmap.c b/mm/rmap.c index ee97d38..4bad326 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -734,9 +734,20 @@ void page_move_anon_rmap(struct page *page, static void __page_set_anon_rmap(struct page *page, struct vm_area_struct *vma, unsigned long address) { - struct anon_vma *anon_vma = vma->anon_vma; + struct anon_vma_chain *avc; + struct anon_vma *anon_vma; + + BUG_ON(!vma->anon_vma); + + /* + * We must use the _oldest_ possible anon_vma for the page mapping! + * + * So take the last AVC chain entry in the vma, which is the deepest + * ancestor, and use the anon_vma from that. + */ + avc = list_entry(vma->anon_vma_chain.prev, struct anon_vma_chain, same_vma); + anon_vma = avc->anon_vma; - BUG_ON(!anon_vma); anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON; page->mapping = (struct address_space *) anon_vma; page->index = linear_page_index(vma, address); -- cgit v1.1