path: root/fs/nfs
Commit message  Author  Age  Files  Lines
* Merge branch 'acl_fixes' into linux-nextTrond Myklebust2014-02-031-22/+12
|\
| * NFSv3: Fix return value of nfs3_proc_setaclsTrond Myklebust2014-02-031-2/+11
    nfs3_proc_setacls is used internally by the NFSv3 create operations to set the acl after the file has been created. If the operation fails because the server doesn't support acls, then it must return '0', not -EOPNOTSUPP.

    Reported-by: Russell King <linux@arm.linux.org.uk>
    Link: http://lkml.kernel.org/r/20140201010328.GI15937@n2100.arm.linux.org.uk
    Cc: Christoph Hellwig <hch@lst.de>
    Tested-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
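The return-value rule described above can be sketched in plain user-space C. This is only an illustration, not the kernel code: `setacls_for_create`, `setacl_rpc_fn`, `rpc_unsupported` and `rpc_io_error` are hypothetical names introduced here, and the real function issues an RPC rather than calling a function pointer.

```c
#include <errno.h>

/* Hypothetical stand-in for the RPC that sets the ACLs.
 * Returns 0 on success or a negative errno value. */
typedef int (*setacl_rpc_fn)(void);

/* When called as part of file creation, "server has no ACL support"
 * is not a failure of the create itself, so it is reported as 0.
 * Every other error is passed through unchanged. */
int setacls_for_create(setacl_rpc_fn rpc)
{
    int status = rpc();

    if (status == -EOPNOTSUPP)
        return 0;
    return status;
}

/* Two toy "servers" used to exercise the mapping. */
int rpc_unsupported(void) { return -EOPNOTSUPP; }
int rpc_io_error(void)    { return -EIO; }
```

With these helpers, `setacls_for_create(rpc_unsupported)` yields 0, while a genuine error such as `-EIO` still reaches the caller.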
| * NFSv3: Remove unused function nfs3_proc_set_default_aclTrond Myklebust2014-02-031-19/+0
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * nfs: fix setting of ACLs on file creation.Noah Massey2014-01-311-1/+1
    nfs3_get_acl() tries to skip posix equivalent ACLs, but misinterprets the return value of posix_acl_equiv_mode(). Fix it.

    This is a regression introduced by "nfs: use generic posix ACL infrastructure for v3 Posix ACLs"

    CC: Christoph Hellwig <hch@infradead.org>
    CC: linux-nfs@vger.kernel.org
    CC: linux-fsdevel@vger.kernel.org
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
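A rough sketch of the convention that is easy to get backwards: as far as I know, posix_acl_equiv_mode() returns 0 when the ACL can be expressed by the mode bits alone and 1 when it cannot (plus negative values on error, omitted here). The toy below models only that convention; `toy_acl`, `toy_acl_equiv_mode` and `should_keep_acl` are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>

/* Toy ACL: the only property modelled is whether it has entries
 * beyond what the owner/group/other mode bits can express. */
struct toy_acl {
    bool has_extended_entries;
};

/* Mirrors the assumed return convention:
 * 0 -> the ACL is equivalent to plain mode bits,
 * 1 -> the ACL carries extra information. */
int toy_acl_equiv_mode(const struct toy_acl *acl)
{
    return acl->has_extended_entries ? 1 : 0;
}

/* An ACL is only worth keeping when it is NOT mode-equivalent.
 * Testing the result as if non-zero meant "equivalent" inverts this. */
bool should_keep_acl(const struct toy_acl *acl)
{
    return toy_acl_equiv_mode(acl) != 0;
}
```

The point of the sketch is that a zero return means "skip this ACL", which is the opposite of the usual "zero means nothing to report" reading.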
* | NFSv4.1: nfs4_destroy_session must call rpc_destroy_waitqueueTrond Myklebust2014-02-013-7/+22
    There may still be timers active on the session waitqueues. Make sure that we kill them before freeing the memory.

    Cc: stable@vger.kernel.org # 3.12+
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | NFSv4: Fix memory corruption in nfs4_proc_open_confirmTrond Myklebust2014-02-011-4/+4
|/
    nfs41_wake_and_assign_slot() relies on the task->tk_msg.rpc_argp and task->tk_msg.rpc_resp always pointing to the session sequence arguments. nfs4_proc_open_confirm tries to pull a fast one by reusing the open sequence structure, thus causing corruption of the NFSv4 slot table.

    Cc: stable@vger.kernel.org # 3.12+
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2014-01-319-35/+84
|\
    Pull NFS client bugfixes from Trond Myklebust:
     "Highlights:

       - Fix several races in nfs_revalidate_mapping
       - NFSv4.1 slot leakage in the pNFS files driver
       - Stable fix for a slot leak in nfs40_sequence_done
       - Don't reject NFSv4 servers that support ACLs with only ALLOW aces"

    * tag 'nfs-for-3.14-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
      nfs: initialize the ACL support bits to zero.
      NFSv4.1: Cleanup
      NFSv4.1: Clean up nfs41_sequence_done
      NFSv4: Fix a slot leak in nfs40_sequence_done
      NFSv4.1 free slot before resending I/O to MDS
      nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATING
      NFS: Fix races in nfs_revalidate_mapping
      sunrpc: turn warn_gssd() log message into a dprintk()
      NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping
      nfs: handle servers that support only ALLOW ACE type.
| * nfs: initialize the ACL support bits to zero.Malahal Naineni2014-01-311-1/+1
    Avoid returning incorrect acl mask attributes when the server doesn't support ACLs.

    Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1: CleanupTrond Myklebust2014-01-291-4/+2
    It is now completely safe to call nfs41_sequence_free_slot with a NULL slot.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1: Clean up nfs41_sequence_doneTrond Myklebust2014-01-291-11/+8
    Move the test for res->sr_slot == NULL out of the nfs41_sequence_free_slot helper and into the main function for efficiency.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4: Fix a slot leak in nfs40_sequence_doneTrond Myklebust2014-01-291-1/+1
    The check for whether or not we sent an RPC call in nfs40_sequence_done is insufficient to decide whether or not we are holding a session slot, and thus should not be used to decide when to free that slot.

    This patch replaces the RPC_WAS_SENT() test with the correct test for whether or not slot == NULL.

    Cc: Chuck Lever <chuck.lever@oracle.com>
    Cc: stable@vger.kernel.org # 3.12+
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1 free slot before resending I/O to MDSAndy Adamson2014-01-293-3/+11
    Fix a dynamic session slot leak where a slot is preallocated and I/O is resent through the MDS.

    Signed-off-by: Andy Adamson <andros@netapp.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * nfs: add memory barriers around NFS_INO_INVALID_DATA and NFS_INO_INVALIDATINGJeff Layton2014-01-283-3/+13
    If the setting of NFS_INO_INVALIDATING gets reordered to before the clearing of NFS_INO_INVALID_DATA, then another task may hit a race window where both appear to be clear, even though the inode's pages are still in need of invalidation. Fix this by adding the appropriate memory barriers.

    Signed-off-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS: Fix races in nfs_revalidate_mappingTrond Myklebust2014-01-281-14/+14
    Commit d529ef83c355f97027ff85298a9709fe06216a66 (NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping) introduces a potential race, since it doesn't test the value of nfsi->cache_validity and set the bitlock in nfsi->flags atomically.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: Jeff Layton <jlayton@redhat.com>
| * sunrpc: turn warn_gssd() log message into a dprintk()Jeff Layton2014-01-271-4/+1
    The original printk() made sense when the GSSAPI codepaths were called only when sec=krb5* was explicitly requested. Now however, in many cases the nfs client will try to acquire GSSAPI credentials by default, even when it's not requested.

    Since we don't have a great mechanism to distinguish between the two cases, just turn the pr_warn into a dprintk instead. With this change we can also get rid of the ratelimiting.

    We do need to keep the EXPORT_SYMBOL(gssd_running) in place since auth_gss.ko needs it and sunrpc.ko provides it. We can however, eliminate the gssd_running call in the nfs code since that's a bit of a layering violation.

    Signed-off-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mappingJeff Layton2014-01-274-6/+46
    There is a possible race in how the nfs_invalidate_mapping function is handled. Currently, we go and invalidate the pages in the file and then clear NFS_INO_INVALID_DATA.

    The problem is that it's possible for a stale page to creep into the mapping after the page was invalidated (i.e., via readahead). If another writer comes along and sets the flag after that happens but before invalidate_inode_pages2 returns then we could clear the flag without the cache having been properly invalidated.

    So, we must clear the flag first and then invalidate the pages. Doing this however, opens another race:

    It's possible to have two concurrent read() calls that end up in nfs_revalidate_mapping at the same time. The first one clears the NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping.

    Just before calling that though, the other task races in, checks the flag and finds it cleared. At that point, it trusts that the mapping is good and gets the lock on the page, allowing the read() to be satisfied from the cache even though the data is no longer valid.

    These effects are easily manifested by running diotest3 from the LTP test suite on NFS. That program does a series of DIO writes and buffered reads. The operations are serialized and page-aligned but the existing code fails the test since it occasionally allows a read to come out of the cache incorrectly. While mixing direct and buffered I/O isn't recommended, I believe it's possible to hit this in other ways that just use buffered I/O, though that situation is much harder to reproduce.

    The problem is that the checking/clearing of that flag and the invalidation of the mapping really need to be atomic. Fix this by serializing concurrent invalidations with a bitlock.

    At the same time, we also need to allow other places that check NFS_INO_INVALID_DATA to check whether we might be in the middle of invalidating the file, so fix up a couple of places that do that to look for the new NFS_INO_INVALIDATING flag.

    Doing this requires us to be careful not to set the bitlock unnecessarily, so this code only does that if it believes it will be doing an invalidation.

    Signed-off-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
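The idea of making "check and clear the invalid flag" and "take the invalidation lock" one atomic step can be sketched with C11 atomics. This is a simplified user-space model, not the kernel implementation: the real code keeps the two flags in separate fields and sleeps instead of spinning, and every name below (`toy_inode`, `TOY_INVALID_DATA`, `TOY_INVALIDATING`, `toy_revalidate`) is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_INVALID_DATA (1u << 0)  /* cached pages are stale */
#define TOY_INVALIDATING (1u << 1)  /* an invalidation is in progress */

struct toy_inode {
    atomic_uint flags;
    int invalidations;  /* counts how often the cache was dropped */
};

/* Returns true if this caller performed the invalidation. */
bool toy_revalidate(struct toy_inode *ino)
{
    for (;;) {
        unsigned int cur = atomic_load(&ino->flags);

        /* Someone else is invalidating: wait rather than trust the
         * cache. (Busy-waits here; a kernel would sleep.) */
        if (cur & TOY_INVALIDATING)
            continue;

        /* Nothing to do and nobody invalidating: cache is usable. */
        if (!(cur & TOY_INVALID_DATA))
            return false;

        /* Clear "invalid" and set "invalidating" in one atomic step,
         * so no other task can see both bits clear in between. */
        unsigned int next = (cur & ~TOY_INVALID_DATA) | TOY_INVALIDATING;
        if (atomic_compare_exchange_weak(&ino->flags, &cur, next))
            break;
    }

    ino->invalidations++;  /* stand-in for dropping the cached pages */

    /* Release the bit lock once the pages are gone. */
    atomic_fetch_and(&ino->flags, ~TOY_INVALIDATING);
    return true;
}
```

Note that the lock bit is only taken when an invalidation is actually needed, which matches the last paragraph of the commit message.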
| * nfs: handle servers that support only ALLOW ACE type.Malahal Naineni2014-01-271-4/+3
    Currently we support ACLs if the NFS server file system supports both ALLOW and DENY ACE types. This patch makes the Linux client work with ACLs even if the server supports only 'ALLOW' ACE type.

    Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-blockLinus Torvalds2014-01-301-25/+18
|\ \
    Pull core block IO changes from Jens Axboe:
     "The major piece in here is the immutable bio_vec series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains:

       - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups.
       - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer.
       - Header export fix from CaiZhiyong.
       - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs:
          - dm-cache immutable fixup from Mike Snitzer.
          - btrfs immutable fixup from Muthu Kumar.
       - bio-integrity fix from Nic Bellinger, which is also going to stable"

    * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
      xtensa: fixup simdisk driver to work with immutable bio_vecs
      block/blk-mq-cpu.c: use hotcpu_notifier()
      blk-mq: for_each_* macro correctness
      block: Fix memory leak in rw_copy_check_uvector() handling
      bio-integrity: Fix bio_integrity_verify segment start bug
      block: remove unrelated header files and export symbol
      blk-mq: uses page->list incorrectly
      blk-mq: use __smp_call_function_single directly
      btrfs: fix missing increment of bi_remaining
      Revert "block: Warn and free bio if bi_end_io is not set"
      block: Warn and free bio if bi_end_io is not set
      blk-mq: fix initializing request's start time
      block: blk-mq: don't export blk_mq_free_queue()
      block: blk-mq: make blk_sync_queue support mq
      block: blk-mq: support draining mq queue
      dm cache: increment bi_remaining when bi_end_io is restored
      block: fixup for generic bio chaining
      block: Really silence spurious compiler warnings
      block: Silence spurious compiler warnings
      block: Kill bio_pair_split()
      ...
| * \ Merge tag 'v3.13-rc6' into for-3.14/coreJens Axboe2013-12-317-9/+51
| |\ \
    Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established.

    Fixup merge issues related to the immutable biovec changes.

    Signed-off-by: Jens Axboe <axboe@kernel.dk>

    Conflicts:
      block/blk-flush.c
      fs/btrfs/check-integrity.c
      fs/btrfs/extent_io.c
      fs/btrfs/scrub.c
      fs/logfs/dev_bdev.c
| * | | block: Abstract out bvec iteratorKent Overstreet2013-11-231-4/+5
    Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things.

    Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: "Ed L. Cashin" <ecashin@coraid.com>
    Cc: Nick Piggin <npiggin@kernel.dk>
    Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
    Cc: Jiri Kosina <jkosina@suse.cz>
    Cc: Matthew Wilcox <willy@linux.intel.com>
    Cc: Geoff Levand <geoff@infradead.org>
    Cc: Yehuda Sadeh <yehuda@inktank.com>
    Cc: Sage Weil <sage@inktank.com>
    Cc: Alex Elder <elder@inktank.com>
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris <josh.h.morris@us.ibm.com>
    Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
    Cc: Rusty Russell <rusty@rustcorp.com.au>
    Cc: "Michael S. Tsirkin" <mst@redhat.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Jeremy Fitzhardinge <jeremy@goop.org>
    Cc: Neil Brown <neilb@suse.de>
    Cc: Alasdair Kergon <agk@redhat.com>
    Cc: Mike Snitzer <snitzer@redhat.com>
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh <bharrosh@panasas.com>
    Cc: Benny Halevy <bhalevy@tonian.com>
    Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Chris Mason <chris.mason@fusionio.com>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    Cc: Steven Whitehouse <swhiteho@redhat.com>
    Cc: Dave Kleikamp <shaggy@kernel.org>
    Cc: Joern Engel <joern@logfs.org>
    Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
    Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
    Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
    Cc: Mark Fasheh <mfasheh@suse.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Ben Myers <bpm@sgi.com>
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Len Brown <len.brown@intel.com>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
    Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
    Cc: Ben Hutchings <ben@decadent.org.uk>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Guo Chao <yan@linux.vnet.ibm.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Asai Thambi S P <asamymuthupa@micron.com>
    Cc: Selvan Mani <smani@micron.com>
    Cc: Sam Bradshaw <sbradshaw@micron.com>
    Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
    Cc: "Roger Pau Monné" <roger.pau@citrix.com>
    Cc: Jan Beulich <jbeulich@suse.com>
    Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Cc: Ian Campbell <Ian.Campbell@citrix.com>
    Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Jiang Liu <jiang.liu@huawei.com>
    Cc: Nitin Gupta <ngupta@vflare.org>
    Cc: Jerome Marchand <jmarchand@redhat.com>
    Cc: Joe Perches <joe@perches.com>
    Cc: Peng Tao <tao.peng@emc.com>
    Cc: Andy Adamson <andros@netapp.com>
    Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
    Cc: Jie Liu <jeff.liu@oracle.com>
    Cc: Sunil Mushran <sunil.mushran@gmail.com>
    Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
    Cc: Namjae Jeon <namjae.jeon@samsung.com>
    Cc: Pankaj Kumar <pankaj.km@samsung.com>
    Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
    Cc: Mel Gorman <mgorman@suse.de>
| * | | block: Convert various code to bio_for_each_segment()Kent Overstreet2013-11-231-21/+13
    With immutable biovecs we don't want code accessing bi_io_vec directly - the uses this patch changes weren't incorrect since they all own the bio, but it makes the code harder to audit for no good reason - also, this will help with multipage bvecs later.

    Signed-off-by: Kent Overstreet <kmo@daterainc.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Chris Mason <chris.mason@fusionio.com>
    Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
    Cc: Joern Engel <joern@logfs.org>
    Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
    Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | nfs: fix xattr inode op pointers when disabledChristoph Hellwig2014-01-301-2/+2
    Chris Mason reported a NULL pointer dereference in generic_getxattr() that was due to sb->s_xattr being NULL.

    The reason is that the nfs #ifdef's for ACL support were misplaced, and the nfs3 inode operations had the xattr operation pointers set up, even though xattrs were not actually supported. As a result, the xattr code was being called without the infrastructure having been set up.

    Move the #ifdef's appropriately.

    Reported-and-tested-by: Chris Mason <clm@fb.com>
    Acked-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2014-01-2815-257/+359
|\ \ \ \
| | |_|/
| |/| |
    Pull NFS client updates from Trond Myklebust:
     "Highlights include:

       - stable fix for an infinite loop in RPC state machine
       - stable fix for a use after free situation in the NFSv4 trunking discovery
       - stable fix for error handling in the NFSv4 trunking discovery
       - stable fix for the page write update code
       - stable fix for the NFSv4.1 mount time security negotiation
       - stable fix for the NFSv4 open code.
       - O_DIRECT locking fixes
       - fix an Oops in the pnfs file commit code
       - RPC layer needs finer grained handling of connection errors
       - more RPC GSS upcall fixes"

    * tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (30 commits)
      pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done
      pnfs: fix BUG in filelayout_recover_commit_reqs
      nfs4: fix discover_server_trunking use after free
      NFSv4.1: Handle errors correctly in nfs41_walk_client_list
      nfs: always make sure page is up-to-date before extending a write to cover the entire page
      nfs: page cache invalidation for dio
      nfs: take i_mutex during direct I/O reads
      nfs: merge nfs_direct_write into nfs_file_direct_write
      nfs: merge nfs_direct_read into nfs_file_direct_read
      nfs: increment i_dio_count for reads, too
      nfs: defer inode_dio_done call until size update is done
      nfs: fix size updates for aio writes
      nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME
      NFSv4.1: Fix a race in nfs4_write_inode
      NFSv4.1: Don't trust attributes if a pNFS LAYOUTCOMMIT is outstanding
      point to the right include file in a comment (left over from a9004abc3)
      NFS: dprintk() should not print negative fileids and inode numbers
      nfs: fix dead code of ipv6_addr_scope
      sunrpc: Fix infinite loop in RPC state machine
      SUNRPC: Add tracepoint for socket errors
      ...
| * | | pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_doneBoaz Harrosh2014-01-221-4/+30
    An NFS4ERR_RECALLCONFLICT is returned by the server from a GET_LAYOUT only when the server sent a RECALL due to that GET_LAYOUT, or the RECALL and GET_LAYOUT crossed on the wire. Either way, this means we want to wait at most until in-flight IO is finished and the RECALL can be satisfied.

    So a proper wait here is more like 1/10 of a second, not 15 seconds like we have now. In case of a server bug we delay exponentially longer on each retry.

    The current code severely degrades performance on very large files with most pnfs-objects layouts, because of how the map changes when the file has grown into the next raid group.

    [Stable: This will patch back to 3.9. If there are earlier still maintained trees, please tell me and I'll send a patch]

    CC: Stable Tree <stable@vger.kernel.org>
    Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
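The retry policy described above (start at roughly a tenth of a second, grow exponentially on repeated conflicts) might look like the following sketch. The doubling factor and the 15-second cap are assumptions made for illustration; the commit message only gives the starting delay and says the delay grows exponentially. `toy_recall_delay_ms` is a hypothetical name.

```c
/* Delay before retrying after a recall conflict, in milliseconds.
 * Assumed policy: 100 ms on the first attempt, doubling on every
 * further retry, never exceeding the old fixed 15 s delay. */
unsigned long toy_recall_delay_ms(unsigned int retries)
{
    const unsigned long base_ms = 100;
    const unsigned long cap_ms = 15000;
    unsigned long delay = base_ms;

    /* Double once per retry; stop early once the cap is passed so
     * the value cannot overflow for large retry counts. */
    while (retries-- > 0 && delay < cap_ms)
        delay *= 2;

    return delay < cap_ms ? delay : cap_ms;
}
```

In the common case a single short wait is enough, and only a misbehaving server drives the delay up towards the cap.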
| * | | pnfs: fix BUG in filelayout_recover_commit_reqsWeston Andros Adamson2014-01-211-4/+4
    cond_resched_lock(cinfo->lock) is called everywhere else while holding the cinfo->lock spinlock. Not holding this lock while calling transfer_commit_list in filelayout_recover_commit_reqs causes the BUG below.

    It's true that we can't hold this lock while calling pnfs_put_lseg, because that might try to lock the inode lock - which might be the same lock as cinfo->lock.

    To reproduce, mount a 2 DS pynfs server and run an O_DIRECT command that crosses a stripe boundary and is not page aligned, such as:

     dd if=/dev/zero of=/mnt/f bs=17000 count=1 oflag=direct

    BUG: sleeping function called from invalid context at linux/fs/nfs/nfs4filelayout.c:1161
    in_atomic(): 0, irqs_disabled(): 0, pid: 27, name: kworker/0:1
    2 locks held by kworker/0:1/27:
     #0: (events){.+.+.+}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
     #1: ((&dreq->work)){+.+...}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5
    CPU: 0 PID: 27 Comm: kworker/0:1 Not tainted 3.13.0-rc3-branch-dros_testing+ #21
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
    Workqueue: events nfs_direct_write_schedule_work [nfs]
     0000000000000000 ffff88007a39bbb8 ffffffff81491256 ffff88007b87a130
     ffff88007a39bbd8 ffffffff8105f103 ffff880079614000 ffff880079617d40
     ffff88007a39bc20 ffffffffa011603e ffff880078988b98 0000000000000000
    Call Trace:
     [<ffffffff81491256>] dump_stack+0x4d/0x66
     [<ffffffff8105f103>] __might_sleep+0x100/0x105
     [<ffffffffa011603e>] transfer_commit_list+0x94/0xf1 [nfs_layout_nfsv41_files]
     [<ffffffffa01160d6>] filelayout_recover_commit_reqs+0x3b/0x68 [nfs_layout_nfsv41_files]
     [<ffffffffa00ba53a>] nfs_direct_write_reschedule+0x9f/0x1d6 [nfs]
     [<ffffffff810705df>] ? mark_lock+0x1df/0x224
     [<ffffffff8106e617>] ? trace_hardirqs_off_caller+0x37/0xa4
     [<ffffffff8106e691>] ? trace_hardirqs_off+0xd/0xf
     [<ffffffffa00ba8f8>] nfs_direct_write_schedule_work+0x9d/0xb7 [nfs]
     [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
     [<ffffffff81050258>] process_one_work+0x1f6/0x3a5
     [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5
     [<ffffffff8105187e>] worker_thread+0x149/0x1f5
     [<ffffffff81051735>] ? rescuer_thread+0x28d/0x28d
     [<ffffffff81056d74>] kthread+0xd2/0xda
     [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61
     [<ffffffff8149e66c>] ret_from_fork+0x7c/0xb0
     [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61

    Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs4: fix discover_server_trunking use after freeWeston Andros Adamson2014-01-201-6/+4
    If clp is new (cl_count = 1) and it matches another client in nfs4_discover_server_trunking, the nfs_put_client will free clp before ->cl_preserve_clid is set.

    Cc: stable@vger.kernel.org # 3.7+
    Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | NFSv4.1: Handle errors correctly in nfs41_walk_client_listTrond Myklebust2014-01-191-3/+5
    Both nfs41_walk_client_list and nfs40_walk_client_list expect the 'status' variable to be set to the value -NFS4ERR_STALE_CLIENTID if the loop fails to find a match. The problem is that the 'pos->cl_cons_state > NFS_CS_READY' changes the value of 'status', and sets it either to the value '0' (which indicates success), or to the value EINTR.

    Cc: stable@vger.kernel.org # 3.7.x: 7b1f1fd1842e6: NFSv4/4.1: Fix bugs in
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs: always make sure page is up-to-date before extending a write to cover the entire pageScott Mayhew2014-01-171-5/+6
    We should always make sure the cached page is up-to-date when we're determining whether we can extend a write to cover the full page -- even if we've received a write delegation from the server.

    Commit c7559663 added logic to skip this check if we have a write delegation, which can lead to data corruption such as the following scenario if client B receives a write delegation from the NFS server:

    Client A:
    # echo 123456789 > /mnt/file

    Client B:
    # echo abcdefghi >> /mnt/file
    # cat /mnt/file
    0�D0�abcdefghi

    Just because we hold a write delegation doesn't mean that we've read in the entire page contents.

    Cc: <stable@vger.kernel.org> # v3.11+
    Signed-off-by: Scott Mayhew <smayhew@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs: page cache invalidation for dioChristoph Hellwig2014-01-131-2/+27
    Make sure to properly invalidate the pagecache before performing direct I/O, so that no stale pages are left around. This matches what the generic direct I/O code does. Also take the i_mutex over the direct write submission to avoid the livelock vs truncate waiting for i_dio_count to decrease, and to avoid having the pagecache easily repopulated while direct I/O is in progress. Again matching the generic direct I/O code.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs: take i_mutex during direct I/O readsChristoph Hellwig2014-01-131-2/+12
    We'll need the i_mutex to prevent i_dio_count from incrementing while truncate is waiting for it to reach zero, and to protect against having the pagecache repopulated after we flushed it.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs: merge nfs_direct_write into nfs_file_direct_writeChristoph Hellwig2014-01-131-50/+41
    Simple code cleanup to prepare for later fixes.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs: merge nfs_direct_read into nfs_file_direct_readChristoph Hellwig2014-01-131-58/+49
    Simple code cleanup to prepare for later fixes.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs: increment i_dio_count for reads, tooChristoph Hellwig2014-01-131-3/+6
    i_dio_count is used to protect dio access against truncate. We want to make sure there are no dio reads pending either when doing a truncate. I suspect on plain NFS things might work even without this, but once we use a pnfs layout driver that accesses backing devices directly things will go bad without the proper synchronization.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs: defer inode_dio_done call until size update is doneChristoph Hellwig2014-01-131-17/+15
    We need to have the I/O fully finished before telling the truncate code that we are done.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs: fix size updates for aio writesChristoph Hellwig2014-01-131-5/+16
    nfs_file_direct_write only updates the inode size if it succeeded and returned the number of bytes written. But in the AIO case nfs_direct_wait turns the return value into -EIOCBQUEUED and we skip the size update.

    Instead the aio completion path should update it, which this patch does. The implementation is a little hacky because there is no obvious way to find out we are called for a write in nfs_direct_complete.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAMEWeston Andros Adamson2014-01-131-2/+2
    Don't check for -NFS4ERR_NOTSUPP, it's already been mapped to -ENOTSUPP by nfs4_stat_to_errno.

    This allows the client to mount v4.1 servers that don't support SECINFO_NO_NAME by falling back to the "guess and check" method of nfs4_find_root_sec.

    Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
    Cc: stable@vger.kernel.org # 3.1+
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | NFSv4.1: Fix a race in nfs4_write_inodeTrond Myklebust2014-01-132-43/+38
    nfs4_write_inode() must not be allowed to exit until the layoutcommit is done. That means that both NFS_INO_LAYOUTCOMMIT and NFS_INO_LAYOUTCOMMITTING have to be cleared.

    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | NFSv4.1: Don't trust attributes if a pNFS LAYOUTCOMMIT is outstanding  Trond Myklebust  2014-01-13  3  -5/+35
| | | | If a LAYOUTCOMMIT is outstanding, then chances are that the metadata server may still be returning incorrect values for the change attribute, ctime, mtime and/or size. Just ignore those attributes for now, and wait for the LAYOUTCOMMIT rpc call to finish.
| | | | Reported-by: shaobingqing <shaobingqing@bwstor.com.cn>
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | point to the right include file in a comment (left over from a9004abc3)  Toralf Förster  2014-01-05  1  -2/+2
| | | | Signed-off-by: Toralf Förster <toralf.foerster@gmx.de>
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | NFS: dprintk() should not print negative fileids and inode numbers  Niels de Vos  2014-01-05  8  -44/+45
| | | | A fileid in NFS is a uint64. There are some occurrences where dprintk() outputs a signed fileid. This leads to confusion and to debugging output that is harder to read (negative fileids matching positive inode numbers).
| | | | Signed-off-by: Niels de Vos <ndevos@redhat.com>
| | | | CC: Santosh Pradhan <spradhan@redhat.com>
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
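The effect described above can be reproduced with ordinary `snprintf()` in userspace. The sketch below is not taken from the patch; `format_fileid_signed` and `format_fileid_unsigned` are hypothetical helpers that only contrast a signed and an unsigned conversion of the same 64-bit value. The signed result assumes the usual two's complement conversion.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats the fileid through a signed conversion, as the problematic
 * debug statements effectively did. Large fileids come out negative. */
static int format_fileid_signed(char *buf, size_t len, uint64_t fileid)
{
    return snprintf(buf, len, "%lld", (long long)fileid);
}

/* Formats the fileid as the unsigned 64-bit quantity it actually is. */
static int format_fileid_unsigned(char *buf, size_t len, uint64_t fileid)
{
    return snprintf(buf, len, "%llu", (unsigned long long)fileid);
}
```

For a fileid with the top bit set, the first helper yields a negative number while the second yields the value that matches the inode number reported elsewhere.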
| * | | nfs: fix dead code of ipv6_addr_scope  Alexander Aring  2014-01-05  1  -1/+1
| | | | The correct way to check on IPV6_ADDR_SCOPE_LINKLOCAL is to check with the ipv6_addr_src_scope function. Currently this can't work, because ipv6_addr_scope returns an int with a mask of IPV6_ADDR_SCOPE_MASK (0x00f0U) and IPV6_ADDR_SCOPE_LINKLOCAL is 0x02. So the condition is always false.
| | | | Signed-off-by: Alexander Aring <alex.aring@gmail.com>
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
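Why the old condition was dead code follows directly from the two constants quoted in the commit message. The sketch below uses only those two values; `masked_equals_linklocal` and `count_linklocal_matches` are hypothetical helpers written for this illustration and are not kernel functions.

```c
#include <stdbool.h>

/* Values as quoted in the commit message above. */
#define IPV6_ADDR_SCOPE_MASK      0x00f0U
#define IPV6_ADDR_SCOPE_LINKLOCAL 0x02

/* Models the old comparison: a value masked with 0x00f0 can only have
 * bits 4..7 set, so it can never equal 0x02 (bit 1). */
static bool masked_equals_linklocal(unsigned int addr_type)
{
    return (addr_type & IPV6_ADDR_SCOPE_MASK) == IPV6_ADDR_SCOPE_LINKLOCAL;
}

/* Exhaustively counts 16-bit inputs for which the old comparison holds. */
static unsigned int count_linklocal_matches(void)
{
    unsigned int matches = 0;

    for (unsigned int t = 0; t <= 0xffffU; t++)
        if (masked_equals_linklocal(t))
            matches++;
    return matches;
}
```

The count is zero for every input, which is the "condition is always false" observation in concrete form.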
| * | | nfs: check if gssd is running before attempting to use krb5i auth in SETCLIENTID call  Jeff Layton  2013-12-06  1  -1/+6
| | | | Currently, the client will attempt to use krb5i in the SETCLIENTID call even if rpc.gssd isn't running. When that fails, it'll then fall back to RPC_AUTH_UNIX. This introduces a delay when mounting if rpc.gssd isn't running, and causes warning messages to pop up in the ring buffer. Check to see if rpc.gssd is running before even attempting to use krb5i auth, and just silently skip trying to do so if it isn't. In the event that the admin is actually trying to mount with krb5*, it will still fail at a later stage of the mount attempt.
| | | | Signed-off-by: Jeff Layton <jlayton@redhat.com>
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | NFSv4: OPEN must handle the NFS4ERR_IO return code correctly  Trond Myklebust  2013-12-06  1  -16/+31
| | |/
| |/|
| | | decode_op_hdr() cannot distinguish between an XDR decoding error and the perfectly valid errorcode NFS4ERR_IO. This is normally not a problem, but for the particular case of OPEN, we need to be able to increment the NFSv4 open sequence id when the server returns a valid response.
| | | Reported-by: J Bruce Fields <bfields@fieldses.org>
| | | Link: http://lkml.kernel.org/r/20131204210356.GA19452@fieldses.org
| | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| | | Cc: stable@vger.kernel.org
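The ambiguity this commit describes can be sketched with a toy decoder. Everything below is hypothetical and written only for illustration: `struct sketch_reply`, `sketch_decode_hdr_old` and `sketch_decode_hdr_new` are not the kernel's types or functions, and the sketch assumes that both NFS4ERR_IO and the Linux EIO errno have the value 5. The second variant shows one possible way to separate "was the reply decodable" from "what status did the server return"; the source does not say that the actual patch is structured this way.

```c
#include <stdbool.h>

#define NFS4ERR_IO 5  /* NFSv4 status code for a server-side I/O error */
#define SKETCH_EIO 5  /* assumed errno value of EIO on Linux */

/* Toy model of a reply: either undecodable, or carrying a status code. */
struct sketch_reply {
    bool well_formed;
    int nfs_status;
};

/* Single return value: a decode failure and NFS4ERR_IO both become -5. */
static int sketch_decode_hdr_old(const struct sketch_reply *r)
{
    if (!r->well_formed)
        return -SKETCH_EIO;
    return -r->nfs_status;
}

/* Returns true when the reply itself was valid, and reports the mapped
 * status separately, so a caller can tell the two cases apart. */
static bool sketch_decode_hdr_new(const struct sketch_reply *r, int *nfs_retval)
{
    if (!r->well_formed) {
        *nfs_retval = -SKETCH_EIO;
        return false;
    }
    *nfs_retval = -r->nfs_status;
    return true;
}
```

A caller that must bump a sequence id only for valid server responses can key that decision on the boolean result rather than on the error value.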
* | | nfs: use generic posix ACL infrastructure for v3 Posix ACLs  Christoph Hellwig  2014-01-26  4  -264/+110
| | | This causes a small behaviour change in that we don't bother to set ACLs on file creation if the mode bits can express the access permissions fully, and thus behave identically to local filesystems.
| | | Signed-off-by: Christoph Hellwig <hch@lst.de>
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | fs: make posix_acl_create more useful  Christoph Hellwig  2014-01-25  1  -1/+1
|/ /
| | Rename the current posix_acl_create to __posix_acl_create and add a fully featured helper to set up the ACLs on file creation that uses get_acl().
| | Signed-off-by: Christoph Hellwig <hch@lst.de>
| | Reviewed-by: Jan Kara <jack@suse.cz>
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs  Linus Torvalds  2013-12-05  7  -9/+51
|\ \
| |/
|/|
| | Pull NFS client bugfixes from Trond Myklebust:
| |  - Stable fix for a NFSv4.1 delegation and state recovery deadlock
| |  - Stable fix for a loop on irrecoverable errors when returning delegations
| |  - Fix a 3-way deadlock between layoutreturn, open, and state recovery
| |  - Update the MAINTAINERS file with contact information for Trond Myklebust
| |  - Close needs to handle NFS4ERR_ADMIN_REVOKED
| |  - Enabling v4.2 should not recompile nfsd and lockd
| |  - Fix a couple of compile warnings
| | * tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
| |   nfs: fix do_div() warning by instead using sector_div()
| |   MAINTAINERS: Update contact information for Trond Myklebust
| |   NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery
| |   SUNRPC: do not fail gss proc NULL calls with EACCES
| |   NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED
| |   NFSv4: Update list of irrecoverable errors on DELEGRETURN
| |   NFSv4 wait on recovery for async session errors
| |   NFS: Fix a warning in nfs_setsecurity
| |   NFS: Enabling v4.2 should not recompile nfsd and lockd
| * nfs: fix do_div() warning by instead using sector_div()  Helge Deller  2013-12-04  1  -1/+1
| | When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains as shown below. Fix this warning by instead using sector_div() which is provided by the kernel.h header file.
| | fs/nfs/blocklayout/extents.c: In function ‘normalize’:
| | include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
| | fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
| | nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
| | fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
| | include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
| | extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
| | Signed-off-by: Helge Deller <deller@gmx.de>
| | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery  Trond Myklebust  2013-12-04  1  -1/+8
| | Andy Adamson reports:
| | The state manager is recovering expired state and recovery OPENs are being processed. If kswapd is pruning inodes at the same time, a deadlock can occur when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the resultant layoutreturn gets an error that the state manager is to handle, causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq. At the same time an open is waiting for the inode deletion to complete in __wait_on_freeing_inode. If the open is either the open called by the state manager, or an open from the same open owner that is holding the NFSv4 sequence id which causes the OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue, then the state is deadlocked with kswapd.
| | The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY. We already know that layouts are dropped on all server reboots, and that it has to be coded to deal with the "forgetful client model" that doesn't send layoutreturns.
| | Reported-by: Andy Adamson <andros@netapp.com>
| | Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.com
| | Signed-off-by: Trond Myklebust <Trond.Myklebust@primarydata.com>
| * NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED  Trond Myklebust  2013-11-20  1  -3/+6
| | Also ensure that we zero out the stateid mode when exiting
| | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4: Update list of irrecoverable errors on DELEGRETURN  Trond Myklebust  2013-11-20  1  -2/+8
| | If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID then there is no recovery possible. Just quit without returning an error. Also, note that the client must not assume that the NFSv4 lease has been renewed when it sees an error on DELEGRETURN.
| | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | Cc: stable@vger.kernel.org