summaryrefslogtreecommitdiffstats
path: root/fs/nfs
Commit message (Collapse)AuthorAgeFilesLines
* NFS: Don't reset pg_moreio in __nfs_pageio_add_requestTrond Myklebust2014-07-131-1/+1
| | | | | | | | | | | Once we've started sending unstable NFS writes, we do not want to clear pg_moreio, or we may end up sending the very last request as a stable write if the commit lists are still empty. Do, however, reset pg_moreio in the case where we end up having to recoalesce the write if an attempt to use pNFS failed. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Remove 2 unused variablesTrond Myklebust2014-07-122-4/+0
| | | | | Cc: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: handle multiple reqs in nfs_wb_page_cancelWeston Andros Adamson2014-07-121-20/+21
| | | | | | | | Use nfs_lock_and_join_requests to merge all subrequests into the head request - this cancels and dereferences all subrequests. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: handle multiple reqs in nfs_page_async_flushWeston Andros Adamson2014-07-123-25/+235
| | | | | | | | | | | | | | | | Change nfs_find_and_lock_request so nfs_page_async_flush can handle multiple requests in a page. There is only one request for a page the first time nfs_page_async_flush is called, but if a write or commit fails, async_flush is called again and there may be multiple requests associated with the page. The solution is to merge all the requests in a page group into a single request before calling nfs_pageio_add_request. Rename nfs_find_and_lock_request to nfs_lock_and_join_requests and change it to first lock all requests for the page, then cancel and merge all subrequests into the head request. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: change find_request to find_head_requestWeston Andros Adamson2014-07-121-9/+24
| | | | | | | | | nfs_page_find_request_locked* should find the head request for that page. Rename the functions and add comments to make this clear, and fix a bug that could return a subrequest when page_private isn't set on the page. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: nfs_page should take a ref on the head reqWeston Andros Adamson2014-07-121-0/+10
| | | | | | | | | | | | nfs_pages that aren't the the head of a group must take a reference on the head as long as ->wb_head is set to it. This stops the head from hitting a refcount of 0 while there is still an active nfs_page for the page group. This avoids kref warnings in the writeback code when the page group head is found and referenced. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: mark nfs_page reqs with flag for extra refWeston Andros Adamson2014-07-122-3/+9
| | | | | | | | Change the use of PG_INODE_REF - set it when taking extra reference on subrequests and take care to only release once for each request. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: only show Posix ACLs in listxattr if actually presentChristoph Hellwig2014-07-082-2/+45
| | | | | | | | | | | | | The big ACL switched nfs to use generic_listxattr, which calls all existing ->list handlers. Add a custom .listxattr implementation that only lists the ACLs if they actually are present on the given inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Philippe Troin <phil@fifi.org> Tested-by: Philippe Troin <phil@fifi.org> Fixes: 013cdf1088d7 (nfs: use generic posix ACL infrastructure ...) Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for supportAndy Adamson2014-06-243-45/+57
| | | | | | | | | | | | | | Fix nfs4_negotiate_security to create an rpc_clnt used to test each SECINFO returned pseudoflavor. Check credential creation (and gss_context creation) which is important for RPC_AUTH_GSS pseudoflavors which can fail for multiple reasons including mis-configuration. Don't call nfs4_negotiate in nfs4_submount as it was just called by nfs4_proc_lookup_mountpoint (nfs4_proc_lookup_common) Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix corrupt return value from nfs_find_best_sec()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS Return -EPERM if no supported or matching SECINFO flavorAndy Adamson2014-06-241-7/+4
| | | | | | | | | | | | | | | Do not return RPC_AUTH_UNIX if SEINFO reply tests fail. This prevents an infinite loop of NFS4ERR_WRONGSEC for non RPC_AUTH_UNIX mounts. Without this patch, a mount with no sec= option to a server that does not include RPC_AUTH_UNIX in the SECINFO return can be presented with an attemtp to use RPC_AUTH_UNIX which will result in an NFS4ERR_WRONG_SEC which will prompt the SECINFO call which will again try RPC_AUTH_UNIX.... Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS check the return of nfs4_negotiate_security in nfs4_submountAndy Adamson2014-06-241-2/+5
| | | | | | Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Don't mark the data cache as invalid if it has been flushedTrond Myklebust2014-06-241-35/+40
| | | | | | | | | | | | Now that we have functions such as nfs_write_pageuptodate() that use the cache_validity flags to check if the data cache is valid or not, it is a little more important to keep the flags in sync with the state of the data cache. In particular, we'd like to ensure that if the data cache is empty, we don't start marking it as needing revalidation. Reported-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file sizeTrond Myklebust2014-06-241-0/+1
| | | | | | | | | | | In nfs_update_inode(), if the change attribute is seen to change on the server, then we set NFS_INO_REVAL_PAGECACHE in order to make sure that we check the file size. However, if we also update the file size in the same function, we don't need to check it again. So make sure that we clear the NFS_INO_REVAL_PAGECACHE that was set earlier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: Fix cache_validity check in nfs_write_pageuptodate()Scott Mayhew2014-06-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS_INO_INVALID_DATA cannot be ignored, even if we have a delegation. We're still having some problems with data corruption when multiple clients are appending to a file and those clients are being granted write delegations on open. To reproduce: Client A: vi /mnt/`hostname -s` while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done Client B: vi /mnt/`hostname -s` while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done What's happening is that in nfs_update_inode() we're recognizing that the file size has changed and we're setting NFS_INO_INVALID_DATA accordingly, but then we ignore the cache_validity flags in nfs_write_pageuptodate() because we have a delegation. As a result, in nfs_updatepage() we're extending the write to cover the full page even though we've not read in the data to begin with. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: <stable@vger.kernel.org> # v3.11+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2014-06-124-281/+126
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...
| * nfs: switch to iter_splice_write_file()Al Viro2014-06-123-34/+2
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * nfs: switch to ->write_iter()Al Viro2014-05-063-12/+10
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * nfs: switch to ->read_iter()Al Viro2014-05-063-14/+9
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * new helper: iov_iter_get_pages_alloc()Al Viro2014-05-061-202/+88
| | | | | | | | | | | | | | | | same as iov_iter_get_pages(), except that pages array is allocated (kmalloc if possible, vmalloc if that fails) and left for caller to free. Lustre and NFS ->direct_IO() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * start adding the tag to iov_iterAl Viro2014-05-061-2/+2
| | | | | | | | | | | | | | | | | | For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * new helper: generic_file_read_iter()Al Viro2014-05-061-1/+1
| | | | | | | | | | | | | | | | | | iov_iter-using variant of generic_file_aio_read(). Some callers converted. Note that it's still not quite there for use as ->read_iter() - we depend on having zero iter->iov_offset in O_DIRECT case. Fortunately, that's true for all converted callers (and for generic_file_aio_read() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * get rid of pointless iov_length() in ->direct_IO()Al Viro2014-05-061-7/+3
| | | | | | | | | | | | all callers have iov_length(iter->iov, iter->nr_segs) == iov_iter_count(iter) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * convert the guts of nfs_direct_IO() to iov_iterAl Viro2014-05-062-31/+33
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * pass iov_iter to ->direct_IO()Al Viro2014-05-061-4/+4
| | | | | | | | | | | | unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2014-06-1029-1203/+1316
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: - massive cleanup of the NFS read/write code by Anna and Dros - support multiple NFS read/write requests per page in order to deal with non-page aligned pNFS striping. Also cleans up the r/wsize < page size code nicely. - stable fix for ensuring inode is declared uptodate only after all the attributes have been checked. - stable fix for a kernel Oops when remounting - NFS over RDMA client fixes - move the pNFS files layout driver into its own subdirectory" * tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits) NFS: populate ->net in mount data when remounting pnfs: fix lockup caused by pnfs_generic_pg_test NFSv4.1: Fix typo in dprintk NFSv4.1: Comment is now wrong and redundant to code NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state xprtrdma: Disconnect on registration failure xprtrdma: Remove BUG_ON() call sites xprtrdma: Avoid deadlock when credit window is reset SUNRPC: Move congestion window constants to header file xprtrdma: Reset connection timeout after successful reconnect xprtrdma: Use macros for reconnection timeout constants xprtrdma: Allocate missing pagelist xprtrdma: Remove Tavor MTU setting xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting xprtrdma: Reduce the number of hardway buffer allocations xprtrdma: Limit work done by completion handler xprtrmda: Reduce calls to ib_poll_cq() in completion handlers xprtrmda: Reduce lock contention in completion handlers xprtrdma: Split the completion queue xprtrdma: Make rpcrdma_ep_destroy() return void ...
| * | NFS: populate ->net in mount data when remountingMateusz Guzik2014-06-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise the kernel oopses when remounting with IPv6 server because net is dereferenced in dev_get_by_name. Use net ns of current thread so that dev_get_by_name does not operate on foreign ns. Changing the address is prohibited anyway so this should not affect anything. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Cc: linux-nfs@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pnfs: fix lockup caused by pnfs_generic_pg_testWeston Andros Adamson2014-06-101-14/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | end_offset and req_offset both return u64 - avoid casting to u32 until it's needed, when it's less than the (u32) size returned by nfs_generic_pg_test. Also, fix the comments in pnfs_generic_pg_test. Running the cthon04 special tests caused this lockup in the "write/read at 2GB, 4GB edges" test when running against a file layout server: BUG: soft lockup - CPU#0 stuck for 22s! [bigfile2:823] Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw e1000 shpchp i2c_piix4 i2c_core parport_pc parport nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc btrfs xor zlib_deflate raid6_pq mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy autofs4 irq event stamp: 205958 hardirqs last enabled at (205957): [<ffffffff814a62dc>] restore_args+0x0/0x30 hardirqs last disabled at (205958): [<ffffffff814ad96a>] apic_timer_interrupt+0x6a/0x80 softirqs last enabled at (205956): [<ffffffff8103ffb2>] __do_softirq+0x1ea/0x2ab softirqs last disabled at (205951): [<ffffffff8104026d>] irq_exit+0x44/0x9a CPU: 0 PID: 823 Comm: bigfile2 Not tainted 3.15.0-rc1-branch-pgio_plus+ #3 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 task: ffff8800792ec480 ti: ffff880078c4e000 task.ti: ffff880078c4e000 RIP: 0010:[<ffffffffa02ce51f>] [<ffffffffa02ce51f>] nfs_page_group_unlock+0x3e/0x4b [nfs] RSP: 0018:ffff880078c4fab0 EFLAGS: 00000202 RAX: 0000000000000fff RBX: ffff88006bf83300 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88006bf83300 RBP: ffff880078c4fab8 R08: 0000000000000001 R09: 0000000000000000 R10: ffffffff8249840c R11: 0000000000000000 R12: 0000000000000035 R13: ffff88007ffc72d8 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f45f11b7740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3a8cb632d0 CR3: 000000007931c000 CR4: 00000000001407f0 Stack: ffff88006bf832c0 ffff880078c4fb00 ffffffffa02cec22 ffff880078c4fad8 00000fff810f9d99 ffff880078c4fca0 ffff88006bf832c0 ffff88006bf832c0 ffff880078c4fca0 ffff880078c4fd60 ffff880078c4fb28 ffffffffa02cee34 Call Trace: [<ffffffffa02cec22>] __nfs_pageio_add_request+0x298/0x34f [nfs] [<ffffffffa02cee34>] nfs_pageio_add_request+0x1f/0x42 [nfs] [<ffffffffa02d1722>] nfs_do_writepage+0x1b5/0x1e4 [nfs] [<ffffffffa02d1764>] nfs_writepages_callback+0x13/0x25 [nfs] [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs] [<ffffffff810eb32d>] write_cache_pages+0x254/0x37f [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs] [<ffffffff8149cf9e>] ? printk+0x54/0x56 [<ffffffff810eacca>] ? __set_page_dirty_nobuffers+0x22/0xe9 [<ffffffffa016d864>] ? put_rpccred+0x38/0x101 [sunrpc] [<ffffffffa02d1ae1>] nfs_writepages+0xb4/0xf8 [nfs] [<ffffffff810ec59c>] do_writepages+0x21/0x2f [<ffffffff810e36e8>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810e374a>] filemap_write_and_wait_range+0x2d/0x5b [<ffffffffa030ba0a>] nfs4_file_fsync+0x3a/0x98 [nfsv4] [<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20 [<ffffffff810e40c2>] generic_file_aio_write+0xa7/0xbd [<ffffffffa02c5c6b>] nfs_file_write+0xf0/0x170 [nfs] [<ffffffff81129215>] do_sync_write+0x59/0x78 [<ffffffff8112956c>] vfs_write+0xab/0x107 [<ffffffff81129c8b>] SyS_write+0x49/0x7f [<ffffffff814acd12>] system_call_fastpath+0x16/0x1b Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4.1: Fix typo in dprintkTom Haynes2014-06-091-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4.1: Comment is now wrong and redundant to codeTom Haynes2014-06-091-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The save of the write offset was removed some time ago, so that part of the comment is bogus. The remainder is pretty self-evident. So off with it! Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_stateTrond Myklebust2014-06-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The addition of lockdep code to write_seqcount_begin/end has lead to a bunch of false positive claims of ABBA deadlocks with the so_lock spinlock. Audits show that this simply cannot happen because the read side code does not spin while holding so_lock. Cc: <stable@vger.kernel.org> # 3.13.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | Push the file layout driver into a subdirectoryTom Haynes2014-05-295-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The object and block layouts already exist in their own subdirectories. This patch completes the set! Note that as a layout denotes nfs4 already, I stripped that prefix out of the file names. Signed-off-by: Tom Haynes <Thomas.Haynes@primarydata.com> Acked-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Handle allocation errors correctly in objlayout_alloc_layout_hdr()Trond Myklebust2014-05-291-4/+4
| | | | | | | | | | | | | | | | | | | | | Return the NULL pointer when the allocation fails. Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pNFS: Handle allocation errors correctly in filelayout_alloc_layout_hdr()Trond Myklebust2014-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Return the NULL pointer when the allocation fails. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: <stable@vger.kernel.org> # 3.5.x Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: Apply NFS_MOUNT_CMP_FLAGMASK to nfs_compare_remount_data()Scott Mayhew2014-05-291-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | Those flags are obsolete and checking them can incorrectly cause remount operations to fail. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4: Use error handler on failed GETATTR with successful OPENAndy Adamson2014-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Place the call to resend the failed GETATTR under the error handler so that when appropriate, the GETATTR is retried more than once. The server can fail the GETATTR op in the OPEN compound with a recoverable error such as NFS4ERR_DELAY. In the case of an O_EXCL open, the server has created the file, so a retrans of the OPEN call will fail with NFS4ERR_EXIST. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Fix a potential busy wait in nfs_page_group_lockTrond Myklebust2014-05-291-10/+9
| | | | | | | | | | | | | | | | | | | | | We cannot allow nfs_page_group_lock to use TASK_KILLABLE here, since the loop would cause a busy wait if somebody kills the task. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFS: Fix error handling in __nfs_pageio_add_requestTrond Myklebust2014-05-291-0/+6
| | | | | | | | | | | | | | | | | | | | | Handle the case where nfs_create_request() returns an error. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: support page groups in nfs_read_completionWeston Andros Adamson2014-05-291-7/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfs_read_completion relied on the fact that there was a 1:1 mapping of page to nfs_request, but this has now changed. Regions not covered by a request have already been zeroed elsewhere. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pnfs: filelayout: support non page aligned layoutsWeston Andros Adamson2014-05-291-32/+20
| | | | | | | | | | | | | | | | | | | | | | | | Use the new pg_test interface to adjust requests to fit in the current stripe / segment. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pnfs: allow non page aligned pnfs layout segmentsWeston Andros Adamson2014-05-291-15/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Remove alignment checks that would revert to MDS and change pg_test to return the max ammount left in the segment (or other pg_test call) up to size of passed request, or 0 if no space is left. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pnfs: support multiple verfs per direct reqWeston Andros Adamson2014-05-292-5/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | Support direct requests that span multiple pnfs data servers by comparing nfs_pgio_header->verf to a cached verf in pnfs_commit_bucket. Continue to use dreq->verf if the MDS is used / non-pNFS. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: remove data list from pgio headerWeston Andros Adamson2014-05-292-59/+21
| | | | | | | | | | | | | | | | | | | | | | | | Since the ability to split pages into subpage requests has been added, nfs_pgio_header->rpc_list only ever has one pgio data. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: use > 1 request to handle bsize < PAGE_SIZEWeston Andros Adamson2014-05-291-69/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the newly added support for multiple requests per page for rsize/wsize < PAGE_SIZE, instead of having multiple read / write data structures per pageio header. This allows us to get rid of nfs_pgio_multi. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: chain calls to pg_testWeston Andros Adamson2014-05-293-21/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that pg_test can change the size of the request (by returning a non-zero size smaller than the request), pg_test functions that call other pg_test functions must return the minimum of the result - or 0 if any fail. Also clean up the logic of some pg_test functions so that all checks are for contitions where coalescing is not possible. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: allow coalescing of subpage requestsWeston Andros Adamson2014-05-291-4/+0
| | | | | | | | | | | | | | | | | | | | | Remove check that the request covers a whole page. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | pnfs: clean up filelayout_alloc_commit_infoWeston Andros Adamson2014-05-291-20/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | Remove unneeded else statement and clean up how commit info dataserver buckets are replaced. Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: page group support in nfs_mark_uptodateWeston Andros Adamson2014-05-291-7/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change how nfs_mark_uptodate checks to see if writes cover a whole page. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: page group syncing in write pathWeston Andros Adamson2014-05-292-12/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Operations that modify state for a whole page must be syncronized across all requests within a page group. In the write path, this is calling end_page_writeback and removing the head request from an inode. Both of these operations should not be called until all requests in a page group have reached the point where they would call them. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: page group syncing in read pathWeston Andros Adamson2014-05-292-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Operations that modify state for a whole page must be syncronized across all requests within a page group. In the read path, this is calling unlock_page and SetPageUptodate. Both of these functions should not be called until all requests in a page group have reached the point where they would call them. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: add support for multiple nfs reqs per pageWeston Andros Adamson2014-05-294-19/+225
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add "page groups" - a circular list of nfs requests (struct nfs_page) that all reference the same page. This gives nfs read and write paths the ability to account for sub-page regions independently. This somewhat follows the design of struct buffer_head's sub-page accounting. Only "head" requests are ever added/removed from the inode list in the buffered write path. "head" and "sub" requests are treated the same through the read path and the rest of the write/commit path. Requests are given an extra reference across the life of the list. Page groups are never rejoined after being split. If the read/write request fails and the client falls back to another path (ie revert to MDS in PNFS case), the already split requests are pushed through the recoalescing code again, which may split them further and then coalesce them into properly sized requests on the wire. Fragmentation shouldn't be a problem with the current design, because we flush all requests in page group when a non-contiguous request is added, so the only time resplitting should occur is on a resend of a read or write. This patch lays the groundwork for sub-page splitting, but does not actually do any splitting. For now all page groups have one request as pg_test functions don't yet split pages. There are several related patches that are needed support multiple requests per page group. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
OpenPOWER on IntegriCloud