From 5b56d49fc31dbb0487e14ead790fc81ca9fb2c99 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes Date: Wed, 14 Dec 2016 15:06:52 -0800 Subject: mm: add locked parameter to get_user_pages_remote() Patch series "mm: unexport __get_user_pages_unlocked()". This patch series continues the cleanup of get_user_pages*() functions taking advantage of the fact we can now pass gup_flags as we please. It firstly adds an additional 'locked' parameter to get_user_pages_remote() to allow for its callers to utilise VM_FAULT_RETRY functionality. This is necessary as the invocation of __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of this and no other existing higher level function would allow it to do so. Secondly existing callers of __get_user_pages_unlocked() are replaced with the appropriate higher-level replacement - get_user_pages_unlocked() if the current task and memory descriptor are referenced, or get_user_pages_remote() if other task/memory descriptors are referenced (having acquiring mmap_sem.) This patch (of 2): Add a int *locked parameter to get_user_pages_remote() to allow VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked(). Taking into account the previous adjustments to get_user_pages*() functions allowing for the passing of gup_flags, we are now in a position where __get_user_pages_unlocked() need only be exported for his ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to subsequently unexport __get_user_pages_unlocked() as well as allowing for future flexibility in the use of get_user_pages_remote(). [sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change] Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes Acked-by: Michal Hocko Cc: Jan Kara Cc: Hugh Dickins Cc: Dave Hansen Cc: Rik van Riel Cc: Mel Gorman Cc: Paolo Bonzini Cc: Radim Krcmar Signed-off-by: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index a92c8d7..cc15445 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1274,7 +1274,7 @@ extern int access_remote_vm(struct mm_struct *mm, unsigned long addr, long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas); + struct vm_area_struct **vmas, int *locked); long get_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas); -- cgit v1.1 From 8b7457ef9a9eb46cd1675d40d8e1fd3c47a38395 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes Date: Wed, 14 Dec 2016 15:06:55 -0800 Subject: mm: unexport __get_user_pages_unlocked() Unexport the low-level __get_user_pages_unlocked() function and replaces invocations with calls to more appropriate higher-level functions. In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with get_user_pages_unlocked() since we can now pass gup_flags. In async_pf_execute() and process_vm_rw_single_vec() we need to pass different tsk, mm arguments so get_user_pages_remote() is the sane replacement in these cases (having added manual acquisition and release of mmap_sem.) Additionally get_user_pages_remote() reintroduces use of the FOLL_TOUCH flag. However, this flag was originally silently dropped by commit 1e9877902dc7 ("mm/gup: Introduce get_user_pages_remote()"), so this appears to have been unintentional and reintroducing it is therefore not an issue. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20161027095141.2569-3-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes Acked-by: Michal Hocko Cc: Jan Kara Cc: Hugh Dickins Cc: Dave Hansen Cc: Rik van Riel Cc: Mel Gorman Cc: Paolo Bonzini Cc: Radim Krcmar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index cc15445..7b2d14e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1280,9 +1280,6 @@ long get_user_pages(unsigned long start, unsigned long nr_pages, struct vm_area_struct **vmas); long get_user_pages_locked(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, int *locked); -long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm, - unsigned long start, unsigned long nr_pages, - struct page **pages, unsigned int gup_flags); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); int get_user_pages_fast(unsigned long start, int nr_pages, int write, -- cgit v1.1 From 82b0f8c39a3869b6fd2a10e180a862248736ec6f Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 14 Dec 2016 15:06:58 -0800 Subject: mm: join struct fault_env and vm_fault Currently we have two different structures for passing fault information around - struct vm_fault and struct fault_env. DAX will need more information in struct vm_fault to handle its faults so the content of that structure would become event closer to fault_env. Furthermore it would need to generate struct fault_env to be able to call some of the generic functions. So at this point I don't think there's much use in keeping these two structures separate. Just embed into struct vm_fault all that is needed to use it for both purposes. Link: http://lkml.kernel.org/r/1479460644-25076-2-git-send-email-jack@suse.cz Signed-off-by: Jan Kara Acked-by: Kirill A. Shutemov Cc: Ross Zwisler Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index 7b2d14e..de5bcea 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -292,10 +292,16 @@ extern pgprot_t protection_map[16]; * pgoff should be used in favour of virtual_address, if possible. */ struct vm_fault { + struct vm_area_struct *vma; /* Target VMA */ unsigned int flags; /* FAULT_FLAG_xxx flags */ gfp_t gfp_mask; /* gfp mask to be used for allocations */ pgoff_t pgoff; /* Logical page offset based on vma */ - void __user *virtual_address; /* Faulting virtual address */ + unsigned long address; /* Faulting virtual address */ + void __user *virtual_address; /* Faulting virtual address masked by + * PAGE_MASK */ + pmd_t *pmd; /* Pointer to pmd entry matching + * the 'address' + */ struct page *cow_page; /* Handler may choose to COW */ struct page *page; /* ->fault handlers should return a @@ -309,19 +315,7 @@ struct vm_fault { * VM_FAULT_DAX_LOCKED and fill in * entry here. */ -}; - -/* - * Page fault context: passes though page fault handler instead of endless list - * of function arguments. - */ -struct fault_env { - struct vm_area_struct *vma; /* Target VMA */ - unsigned long address; /* Faulting virtual address */ - unsigned int flags; /* FAULT_FLAG_xxx flags */ - pmd_t *pmd; /* Pointer to pmd entry matching - * the 'address' - */ + /* These three entries are valid only while holding ptl lock */ pte_t *pte; /* Pointer to pte entry matching * the 'address'. NULL if the page * table hasn't been allocated. @@ -351,7 +345,7 @@ struct vm_operations_struct { int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf); int (*pmd_fault)(struct vm_area_struct *, unsigned long address, pmd_t *, unsigned int flags); - void (*map_pages)(struct fault_env *fe, + void (*map_pages)(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff); /* notification that a previously read-only page is about to become @@ -625,7 +619,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) return pte; } -int alloc_set_pte(struct fault_env *fe, struct mem_cgroup *memcg, +int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg, struct page *page); #endif @@ -2094,7 +2088,7 @@ extern void truncate_inode_pages_final(struct address_space *); /* generic vm_area_ops exported for stackable file systems */ extern int filemap_fault(struct vm_area_struct *, struct vm_fault *); -extern void filemap_map_pages(struct fault_env *fe, +extern void filemap_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff); extern int filemap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf); -- cgit v1.1 From 1a29d85eb0f19b7d8271923d8917d7b4f5540b3e Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 14 Dec 2016 15:07:01 -0800 Subject: mm: use vmf->address instead of of vmf->virtual_address Every single user of vmf->virtual_address typed that entry to unsigned long before doing anything with it so the type of virtual_address does not really provide us any additional safety. Just use masked vmf->address which already has the appropriate type. Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz Signed-off-by: Jan Kara Acked-by: Kirill A. Shutemov Cc: Dan Williams Cc: Ross Zwisler Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index de5bcea..75fda0d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -297,8 +297,6 @@ struct vm_fault { gfp_t gfp_mask; /* gfp mask to be used for allocations */ pgoff_t pgoff; /* Logical page offset based on vma */ unsigned long address; /* Faulting virtual address */ - void __user *virtual_address; /* Faulting virtual address masked by - * PAGE_MASK */ pmd_t *pmd; /* Pointer to pmd entry matching * the 'address' */ -- cgit v1.1 From 2994302bc8a17180788fac66a47102d338d5d0ec Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 14 Dec 2016 15:07:16 -0800 Subject: mm: add orig_pte field into vm_fault Add orig_pte field to vm_fault structure to allow ->page_mkwrite handlers to fully handle the fault. This also allows us to save some passing of extra arguments around. Link: http://lkml.kernel.org/r/1479460644-25076-8-git-send-email-jack@suse.cz Signed-off-by: Jan Kara Reviewed-by: Ross Zwisler Acked-by: Kirill A. Shutemov Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index 75fda0d..39c17a2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -298,8 +298,8 @@ struct vm_fault { pgoff_t pgoff; /* Logical page offset based on vma */ unsigned long address; /* Faulting virtual address */ pmd_t *pmd; /* Pointer to pmd entry matching - * the 'address' - */ + * the 'address' */ + pte_t orig_pte; /* Value of PTE at the time of fault */ struct page *cow_page; /* Handler may choose to COW */ struct page *page; /* ->fault handlers should return a -- cgit v1.1 From 3917048d4572b9cabf6f8f5ad395eb693717367c Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 14 Dec 2016 15:07:18 -0800 Subject: mm: allow full handling of COW faults in ->fault handlers Patch series "dax: Clear dirty bits after flushing caches", v5. Patchset to clear dirty bits from radix tree of DAX inodes when caches for corresponding pfns have been flushed. In principle, these patches enable handlers to easily update PTEs and do other work necessary to finish the fault without duplicating the functionality present in the generic code. I'd like to thank Kirill and Ross for reviews of the series! This patch (of 20): To allow full handling of COW faults add memcg field to struct vm_fault and a return value of ->fault() handler meaning that COW fault is fully handled and memcg charge must not be canceled. This will allow us to remove knowledge about special DAX locking from the generic fault code. Link: http://lkml.kernel.org/r/1479460644-25076-9-git-send-email-jack@suse.cz Signed-off-by: Jan Kara Reviewed-by: Ross Zwisler Acked-by: Kirill A. Shutemov Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index 39c17a2..6e25f49 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -301,7 +301,8 @@ struct vm_fault { * the 'address' */ pte_t orig_pte; /* Value of PTE at the time of fault */ - struct page *cow_page; /* Handler may choose to COW */ + struct page *cow_page; /* Page handler may use for COW fault */ + struct mem_cgroup *memcg; /* Cgroup cow_page belongs to */ struct page *page; /* ->fault handlers should return a * page here, unless VM_FAULT_NOPAGE * is set (which is also implied by @@ -1103,6 +1104,7 @@ static inline void clear_page_pfmemalloc(struct page *page) #define VM_FAULT_RETRY 0x0400 /* ->fault blocked, must retry */ #define VM_FAULT_FALLBACK 0x0800 /* huge page fault failed, fall back to small */ #define VM_FAULT_DAX_LOCKED 0x1000 /* ->fault has locked DAX entry */ +#define VM_FAULT_DONE_COW 0x2000 /* ->fault has fully handled COW */ #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */ -- cgit v1.1 From 9118c0cbd44262d0015568266f314e645ed6b9ce Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 14 Dec 2016 15:07:21 -0800 Subject: mm: factor out functionality to finish page faults Introduce finish_fault() as a helper function for finishing page faults. It is rather thin wrapper around alloc_set_pte() but since we'd want to call this from DAX code or filesystems, it is still useful to avoid some boilerplate code. Link: http://lkml.kernel.org/r/1479460644-25076-10-git-send-email-jack@suse.cz Signed-off-by: Jan Kara Reviewed-by: Ross Zwisler Acked-by: Kirill A. Shutemov Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index 6e25f49..60a230e6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -620,6 +620,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg, struct page *page); +int finish_fault(struct vm_fault *vmf); #endif /* -- cgit v1.1 From b1aa812b21084285e9f6098639be9cd5bf9e05d7 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 14 Dec 2016 15:07:24 -0800 Subject: mm: move handling of COW faults into DAX code Move final handling of COW faults from generic code into DAX fault handler. That way generic code doesn't have to be aware of peculiarities of DAX locking so remove that knowledge and make locking functions private to fs/dax.c. Link: http://lkml.kernel.org/r/1479460644-25076-11-git-send-email-jack@suse.cz Signed-off-by: Jan Kara Acked-by: Kirill A. Shutemov Reviewed-by: Ross Zwisler Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index 60a230e6..59a4da1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -308,12 +308,6 @@ struct vm_fault { * is set (which is also implied by * VM_FAULT_ERROR). */ - void *entry; /* ->fault handler can alternatively - * return locked DAX entry. In that - * case handler should return - * VM_FAULT_DAX_LOCKED and fill in - * entry here. - */ /* These three entries are valid only while holding ptl lock */ pte_t *pte; /* Pointer to pte entry matching * the 'address'. NULL if the page @@ -1104,8 +1098,7 @@ static inline void clear_page_pfmemalloc(struct page *page) #define VM_FAULT_LOCKED 0x0200 /* ->fault locked the returned page */ #define VM_FAULT_RETRY 0x0400 /* ->fault blocked, must retry */ #define VM_FAULT_FALLBACK 0x0800 /* huge page fault failed, fall back to small */ -#define VM_FAULT_DAX_LOCKED 0x1000 /* ->fault has locked DAX entry */ -#define VM_FAULT_DONE_COW 0x2000 /* ->fault has fully handled COW */ +#define VM_FAULT_DONE_COW 0x1000 /* ->fault has fully handled COW */ #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */ -- cgit v1.1 From 66a6197c118540d454913eef24d68d7491ab5d5f Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 14 Dec 2016 15:07:39 -0800 Subject: mm: provide helper for finishing mkwrite faults Provide a helper function for finishing write faults due to PTE being read-only. The helper will be used by DAX to avoid the need of complicating generic MM code with DAX locking specifics. Link: http://lkml.kernel.org/r/1479460644-25076-16-git-send-email-jack@suse.cz Signed-off-by: Jan Kara Reviewed-by: Ross Zwisler Acked-by: Kirill A. Shutemov Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index 59a4da1..cec967e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -615,6 +615,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg, struct page *page); int finish_fault(struct vm_fault *vmf); +int finish_mkwrite_fault(struct vm_fault *vmf); #endif /* -- cgit v1.1 From cae1240257d9ba4b40eb240124c530de8ee349bc Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 14 Dec 2016 15:07:45 -0800 Subject: mm: export follow_pte() DAX will need to implement its own version of page_check_address(). To avoid duplicating page table walking code, export follow_pte() which does what we need. Link: http://lkml.kernel.org/r/1479460644-25076-18-git-send-email-jack@suse.cz Signed-off-by: Jan Kara Reviewed-by: Ross Zwisler Cc: Kirill A. Shutemov Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux/mm.h') diff --git a/include/linux/mm.h b/include/linux/mm.h index cec967e..6392649 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1210,6 +1210,8 @@ int copy_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *vma); void unmap_mapping_range(struct address_space *mapping, loff_t const holebegin, loff_t const holelen, int even_cows); +int follow_pte(struct mm_struct *mm, unsigned long address, pte_t **ptepp, + spinlock_t **ptlp); int follow_pfn(struct vm_area_struct *vma, unsigned long address, unsigned long *pfn); int follow_phys(struct vm_area_struct *vma, unsigned long address, -- cgit v1.1