From 63e95b5c4f16e156b98adcf2f7d820ba941c82a3 Mon Sep 17 00:00:00 2001
From: Ross Zwisler
Date: Tue, 8 Nov 2016 11:32:20 +1100
Subject: dax: coordinate locking for offsets in PMD range

DAX radix tree locking currently locks entries based on the unique
combination of the 'mapping' pointer and the pgoff_t 'index' for the
entry. This works for PTEs, but as we move to PMDs we will need to have
all the offsets within the range covered by the PMD to map to the same
bit lock. To accomplish this, for ranges covered by a PMD entry we will
instead lock based on the page offset of the beginning of the PMD entry.
The 'mapping' pointer is still used in the same way.

Signed-off-by: Ross Zwisler
Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Dave Chinner
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'mm')

diff --git a/mm/filemap.c b/mm/filemap.c
index 849f459..1ffb7dc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -143,7 +143,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
 			if (node)
 				workingset_node_pages_dec(node);
 			/* Wakeup waiters for exceptional entry lock */
-			dax_wake_mapping_entry_waiter(mapping, page->index,
+			dax_wake_mapping_entry_waiter(mapping, page->index, p,
 						      false);
 		}
 	}
-- 
cgit v1.1

From 642261ac995e01d7837db1f4b90181496f7e6835 Mon Sep 17 00:00:00 2001
From: Ross Zwisler
Date: Tue, 8 Nov 2016 11:34:45 +1100
Subject: dax: add struct iomap based DAX PMD support

DAX PMDs have been disabled since Jan Kara introduced DAX radix tree
based locking. This patch allows DAX PMDs to participate in the DAX
radix tree based locking scheme so that they can be re-enabled using the
new struct iomap based fault handlers.

There are currently three types of DAX 4k entries: 4k zero pages, 4k DAX
mappings that have an associated block allocation, and 4k DAX empty
entries. The empty entries exist to provide locking for the duration of
a given page fault.

This patch adds three equivalent 2MiB DAX entries: Huge Zero Page (HZP)
entries, PMD DAX entries that have associated block allocations, and
2MiB DAX empty entries.

Unlike the 4k case where we insert a struct page * into the radix tree
for 4k zero pages, for HZP we insert a DAX exceptional entry with the
new RADIX_DAX_HZP flag set. This is because we use a single 2MiB zero
page in every 2MiB hole mapping, and it doesn't make sense to have that
same struct page * with multiple entries in multiple trees. This would
cause contention on the single page lock for the one Huge Zero Page, and
it would break the page->index and page->mapping associations that are
assumed to be valid in many other places in the kernel.

One difficult use case is when one thread is trying to use 4k entries in
the radix tree for a given offset, and another thread is using 2MiB
entries for that same offset. The current code handles this by making
the 2MiB user fall back to 4k entries for most cases. This was done
because it is the simplest solution, and because the use of 2MiB pages
is already opportunistic.

If we were to try to upgrade from 4k pages to 2MiB pages for a given
range, we would run into the problem of how to lock out 4k page faults
for the entire 2MiB range while we clean out the radix tree so we can
insert the 2MiB entry. We can solve this problem if we need to, but I
think that the cases where both 2MiB entries and 4k entries are being
used for the same range will be rare enough and the gain small enough
that it probably won't be worth the complexity.

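As a reading aid, the locking rule that the 2MiB entries depend on (lock
on the page offset of the beginning of the PMD entry, as described in the
previous patch) can be sketched roughly as below. This is an illustrative
user-space fragment, not the code in fs/dax.c: the helper name
lock_index() is hypothetical, and the shift values assume 4k pages with
2MiB PMDs, as on x86-64.

```c
/*
 * Illustrative sketch only, not the kernel implementation.
 * Assumption: 4k pages and 2MiB PMDs (PAGE_SHIFT 12, PMD_SHIFT 21).
 * lock_index() is a hypothetical helper name used for this example.
 */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21

/* Number of 4k page offsets covered by one PMD entry (512 here). */
#define PAGES_PER_PMD	(1UL << (PMD_SHIFT - PAGE_SHIFT))

/*
 * Return the page offset used to select the lock for an entry.
 * For a PMD entry, every offset in the covered range is rounded down
 * to the first offset of that range, so all of them select the same
 * lock. For a PTE entry the offset is used unchanged.
 */
static unsigned long lock_index(unsigned long index, int is_pmd_entry)
{
	if (is_pmd_entry)
		return index & ~(PAGES_PER_PMD - 1);
	return index;
}
```

Under these assumptions, offsets 512 through 1023 all map to 512 when the
entry is a PMD entry, while a PTE entry at offset 700 keeps 700. The
'mapping' pointer would still be combined with this offset in the same
way as before.
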
Signed-off-by: Ross Zwisler
Reviewed-by: Jan Kara
Signed-off-by: Dave Chinner
---
 mm/filemap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

(limited to 'mm')

diff --git a/mm/filemap.c b/mm/filemap.c
index 1ffb7dc..00ab94a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -137,8 +137,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
 		} else {
 			/* DAX can replace empty locked entry with a hole */
 			WARN_ON_ONCE(p !=
-				(void *)(RADIX_TREE_EXCEPTIONAL_ENTRY |
-					 RADIX_DAX_ENTRY_LOCK));
+				dax_radix_locked_entry(0, RADIX_DAX_EMPTY));
 			/* DAX accounts exceptional entries as normal pages */
 			if (node)
 				workingset_node_pages_dec(node);
-- 
cgit v1.1