summaryrefslogtreecommitdiffstats
path: root/fs/nfs
Commit message (Collapse)AuthorAgeFilesLines
* NFSv3: Ensure that do_proc_get_root() reports errors correctlyTrond Myklebust2012-08-201-1/+1
| | | | | | | | | | | If the rpc call to NFS3PROC_FSINFO fails, then we need to report that error so that the mount fails. Otherwise we can end up with a superblock with completely unusable values for block sizes, maxfilesize, etc. Reported-by: Yuanming Chen <hikvision_linux@163.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Ensure that nfs4_alloc_client cleans up on error.Trond Myklebust2012-08-201-1/+1
| | | | | | | | Any pointer that was allocated through nfs_alloc_client() needs to be freed via a call to nfs_free_client(). Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: return -ENOKEY when the upcall fails to map the nameBryan Schumaker2012-08-161-4/+2
| | | | | | | | | | This allows the normal error-paths to handle the error, rather than making a special call to complete_request_key() just for this instance. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Tested-by: William Dauchy <wdauchy@gmail.com> Cc: stable@vger.kernel.org [>= 3.4] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Clear key construction data if the idmap upcall failsBryan Schumaker2012-08-161-14/+42
| | | | | | | | | | | | | idmap_pipe_downcall already clears this field if the upcall succeeds, but if it fails (rpc.idmapd isn't running) the field will still be set on the next call triggering a BUG_ON(). This patch tries to handle all possible ways that the upcall could fail and clear the idmap key data for each one. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Tested-by: William Dauchy <wdauchy@gmail.com> Cc: stable@vger.kernel.org [>= 3.4] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Don't use private xdr_stream fields in decode_getaclTrond Myklebust2012-08-161-5/+6
| | | | | | | Instead of using the private field xdr->p from struct xdr_stream, use the public xdr_stream_pos(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Fix the acl cache size calculationTrond Myklebust2012-08-161-2/+3
| | | | | | | | | | | | Currently, we do not take into account the size of the 16 byte struct nfs4_cached_acl header, when deciding whether or not we should cache the acl data. Consequently, we will end up allocating an 8k buffer in order to fit a maximum size 4k acl. This patch adjusts the calculation so that we limit the cache size to 4k for the acl header+data. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Fix pointer arithmetic in decode_getaclTrond Myklebust2012-08-162-15/+8
| | | | | | | | | | | | | | | Resetting the cursor xdr->p to a previous value is not a safe practice: if the xdr_stream has crossed out of the initial iovec, then a bunch of other fields would need to be reset too. Fix this issue by using xdr_enter_page() so that the buffer gets page aligned at the bitmap _before_ we decode it. Also fix the confusion of the ACL length with the page buffer length by not adding the base offset to the ACL length... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
* NFS: Alias the nfs module to nfs4bjschuma@gmail.com2012-08-161-0/+2
| | | | | | | | | This allows distros to remove the line from their modprobe configuration. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix a regression when loading the NFS v4 modulebjschuma@gmail.com2012-08-165-26/+49
| | | | | | | | | | | | | | Some systems have a modprobe.d/nfs.conf file that sets an nfs4 alias pointing to nfs.ko, rather than nfs4.ko. This can prevent the v4 module from loading on mount, since the kernel sees that something named "nfs4" has already been loaded. To work around this, I've renamed the modules to "nfsv2.ko" "nfsv3.ko" and "nfsv4.ko". I also had to move the nfs4_fs_type back to nfs.ko to ensure that `mount -t nfs4` still works. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_doneTrond Myklebust2012-08-081-6/+2
| | | | | | | | | | | | | | | | | | | Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence disconnected data server) we've been sending layoutreturn calls while there is potentially still outstanding I/O to the data servers. The reason we do this is to avoid races between replayed writes to the MDS and the original writes to the DS. When this happens, the BUG_ON() in nfs4_layoutreturn_done can be triggered because it assumes that we would never call layoutreturn without knowing that all I/O to the DS is finished. The fix is to remove the BUG_ON() now that the assumptions behind the test are obsolete. Reported-by: Boaz Harrosh <bharrosh@panasas.com> Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>=3.5]
* pnfs-obj: Better IO pattern in case of unaligned offsetBoaz Harrosh2012-08-021-3/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Depending on layout and ARCH, ORE has some limits on max IO sizes which is communicated on (what else) ore_layout->max_io_length, which is always stripe aligned. This was considered as the pg_test boundary for splitting and starting a new IO. But in the case of a long IO where the start offset is not aligned what would happen is that both end of IO[N] and start of IO[N+1] would be unaligned, causing each IO boundary parity unit to be calculated and written twice. So what we do in this patch is split the very start of an unaligned IO, up to a stripe boundary, and then next IO's can continue fully aligned til the end. We might be sacrificing the case where the full unaligned IO would fit within a single max_io_length, but the sacrifice is well worth the elimination of double calculation and parity units IO. Actually the sacrificing is marginal and is almost unmeasurable. TODO: If we know the total expected linear segment that will be received, at pg_init, we could use that information in many places: 1. blocks-layout get_layout write segment size 2. Better mds-threshold 3. In above situation for a better clean split I will do this in future submission. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS41: add pg_layout_private to nfs_pageio_descriptorPeng Tao2012-08-021-0/+2
| | | | | | | | | To allow layout driver to pass private information around pg_init/pg_doio. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* pnfs: nfs4_proc_layoutget returns voidIdan Kedar2012-08-022-5/+5
| | | | | | | | | since the only user of nfs4_proc_layoutget is send_layoutget, which ignores its return value, there is no reason to return any value. Signed-off-by: Idan Kedar <idank@tonian.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* pnfs: defer release of pages in layoutgetIdan Kedar2012-08-023-40/+58
| | | | | | | | | | | | | | | | | | | we have encountered a bug whereby reading a lot of files (copying fedora's /bin) from a pNFS mount and hitting Ctrl+C in the middle caused a general protection fault in xdr_shrink_bufhead. this function is called when decoding the response from LAYOUTGET. the decoding is done by a worker thread, and the caller of LAYOUTGET waits for the worker thread to complete. hitting Ctrl+C caused the synchronous wait to end and the next thing the caller does is to free the pages, so when the worker thread calls xdr_shrink_bufhead, the pages are gone. therefore, the cleanup of these pages has been moved to nfs4_layoutget_release. Signed-off-by: Idan Kedar <idank@tonian.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* nfs: tear down caches in nfs_init_writepagecache when allocation failsJeff Layton2012-08-021-3/+12
| | | | | | | | | | ...and ensure that we tear down the nfs_commit_data cache too when unloading the module. Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds2012-07-318-75/+153
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge Andrew's second set of patches: - MM - a few random fixes - a couple of RTC leftovers * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) rtc/rtc-88pm80x: remove unneed devm_kfree rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables tmpfs: distribute interleave better across nodes mm: remove redundant initialization mm: warn if pg_data_t isn't initialized with zero mips: zero out pg_data_t when it's allocated memcg: gix memory accounting scalability in shrink_page_list mm/sparse: remove index_init_lock mm/sparse: more checks on mem_section number mm/sparse: optimize sparse_index_alloc memcg: add mem_cgroup_from_css() helper memcg: further prevent OOM with too many dirty pages memcg: prevent OOM with too many dirty pages mm: mmu_notifier: fix freed page still mapped in secondary MMU mm: memcg: only check anon swapin page charges for swap cache mm: memcg: only check swap cache pages for repeated charging mm: memcg: split swapin charge function into private and public part mm: memcg: remove needless !mm fixup to init_mm when charging mm: memcg: remove unneeded shmem charge type ...
| * nfs: prevent page allocator recursions with swap over NFS.Mel Gorman2012-07-312-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO, just not of any filesystem data. The problem is that previously NOFS was correct because that avoids recursion into the NFS code. With swap-over-NFS, it is no longer correct as swap IO can lead to this recursion. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * nfs: enable swap on NFSMel Gorman2012-07-313-30/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol ->connect() method. PF_MEMALLOC should allow the allocation of struct socket and related objects and the early (re)setting of SOCK_MEMALLOC should allow us to receive the packets required for the TCP connection buildup. [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases] [dfeng@redhat.com: Fix handling of multiple swap files] [a.p.zijlstra@chello.nl: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * nfs: disable data cache revalidation for swapfilesMel Gorman2012-07-312-14/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VM does not like PG_private set on PG_swapcache pages. As suggested by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS data cache revalidation on swap files. as it does not make sense to have other clients change the file while it is being used as swap. This avoids setting PG_private on swap pages, since there ought to be no further races with invalidate_inode_pages2() to deal with. Since we cannot set PG_private we cannot use page->private which is already used by PG_swapcache pages to store the nfs_page. Thus augment the new nfs_page_find_request logic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * nfs: teach the NFS client how to treat PG_swapcache pagesMel Gorman2012-07-315-28/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace all relevant occurences of page->index and page->mapping in the NFS client with the new page_file_index() and page_file_mapping() functions. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2012-07-3132-390/+690
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull second wave of NFS client updates from Trond Myklebust: - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into separate modules. - Fix Oopses in the NFSv4 idmapper - Fix a deadlock whereby rpciod tries to allocate a new socket and ends up recursing into the NFS code due to memory reclaim. - Increase the number of permitted callback connections. * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: explicitly reject LOCK_MAND flock() requests nfs: increase number of permitted callback connections. SUNRPC: return negative value in case rpcbind client creation error NFS: Convert v4 into a module NFS: Convert v3 into a module NFS: Convert v2 into a module NFS: Keep module parameters in the generic NFS client NFS: Split out remaining NFS v4 inode functions NFS: Pass super operations and xattr handlers in the nfs_subversion NFS: Only initialize the ACL client in the v3 case NFS: Create a try_mount rpc op NFS: Remove the NFS v4 xdev mount function NFS: Add version registering framework NFS: Fix a number of bugs in the idmapper nfs: skip commit in releasepage if we're freeing memory for fs-related reasons sunrpc: clarify comments on rpc_make_runnable pnfsblock: bail out partial page IO
| * | nfs: explicitly reject LOCK_MAND flock() requestsJeff Layton2012-07-311-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | We have no mechanism to emulate LOCK_MAND locks on NFSv4, so explicitly return -EINVAL if someone requests it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | nfs: increase number of permitted callback connections.NeilBrown2012-07-311-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default a sunrpc service is limited to (N+3)*20 connections where N is the number of threads. This is 80 when N==1. If this number is exceeded a warning is printed suggesting that the number of threads be increased. However with services which run a single thread, this is impossible. For such services there is a ->sv_maxconn setting that can be used to forcibly increase the limit, and silence the message. This is used by lockd. The nfs client uses a sunrpc service to handle callbacks and it too is single-threaded, so to avoid the useless messages, and to allow a reasonable number of concurrent connections, we need to set ->sv_maxconn. 1024 seems like a good number. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Convert v4 into a moduleBryan Schumaker2012-07-3021-108/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch exports symbols needed by the v4 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or CONFIG_NFS_V4_MODULE are set. The module (nfs4.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Convert v3 into a moduleBryan Schumaker2012-07-3011-33/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch exports symbols and moves over the final structures needed by the v3 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V3 or CONFIG_NFS_V3_MODULE are set. The module (nfs3.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v3. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Convert v2 into a moduleBryan Schumaker2012-07-3012-24/+50
| | | | | | | | | | | | | | | | | | | | | | | | The module (nfs2.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v2. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Keep module parameters in the generic NFS clientBryan Schumaker2012-07-307-49/+48
| | | | | | | | | | | | | | | Otherwise we break backwards compatibility when v4 becomes a modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Split out remaining NFS v4 inode functionsBryan Schumaker2012-07-305-48/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Somehow I missed this in my previous patch series, but these functions are only needed by the v4 code and should be moved to a v4-only file. I wasn't exactly sure where I should put these functions, so I moved them into nfs4super.c where I could make them static. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Pass super operations and xattr handlers in the nfs_subversionBryan Schumaker2012-07-306-25/+13
| | | | | | | | | | | | | | | | | | | | | | | | I can set all variables in the nfs_fill_super() function, allowing me to remove the nfs4_fill_super() function. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Only initialize the ACL client in the v3 caseBryan Schumaker2012-07-3010-69/+96
| | | | | | | | | | | | | | | | | | | | | | | | v2 and v4 don't use it, so I create two new nfs_rpc_ops functions to initialize the ACL client only when we are using v3. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Create a try_mount rpc opBryan Schumaker2012-07-307-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | I'm already looking up the nfs subversion in nfs_fs_mount(), so I have easy access to rpc_ops that used to be difficult to reach. This allows me to set up a different mount path for NFS v2/3 and NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Remove the NFS v4 xdev mount functionBryan Schumaker2012-07-303-58/+10
| | | | | | | | | | | | | | | | | | | | | | | | I can now share this code with the v2 and v3 code by using the NFS subversion structure. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Add version registering frameworkBryan Schumaker2012-07-3011-61/+282
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds in the code to track multiple versions of the NFS protocol. I created default structures for v2, v3 and v4 so that each version can continue to work while I convert them into kernel modules. I also removed the const parameter from the rpc_version array so that I can change it at runtime. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Fix a number of bugs in the idmapperDavid Howells2012-07-301-6/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a number of bugs in the NFS idmapper code: (1) Only registered key types can be passed to the core keys code, so register the legacy idmapper key type. This is a requirement because the unregister function cleans up keys belonging to that key type so that there aren't dangling pointers to the module left behind - including the key->type pointer. (2) Rename the legacy key type. You can't have two key types with the same name, and (1) would otherwise require that. (3) complete_request_key() must be called in the error path of nfs_idmap_legacy_upcall(). (4) There is one idmap struct for each nfs_client struct. This means that idmap->idmap_key_cons is shared without the use of a lock. This is a problem because key_instantiate_and_link() - as called indirectly by idmap_pipe_downcall() - releases anyone waiting for the key to be instantiated. What happens is that idmap_pipe_downcall() running in the rpc.idmapd thread, releases the NFS filesystem in whatever thread that is running in to continue. This may then make another idmapper call, overwriting idmap_key_cons before idmap_pipe_downcall() gets the chance to call complete_request_key(). I *think* that reading idmap_key_cons only once, before key_instantiate_and_link() is called, and then caching the result in a variable is sufficient. Bug (4) is the cause of: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 0 Oops: 0010 [#1] SMP CPU 1 Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan] Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff8801a3645d40 EFLAGS: 00010246 RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80 RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50 RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0 R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80 R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006 FS: 00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710) Stack: ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90 ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75 ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8 Call Trace: [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100 [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0 [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs] [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc] [<ffffffff8117f833>] vfs_write+0xb3/0x180 [<ffffffff8117fb5a>] sys_write+0x4a/0x90 [<ffffffff81600329>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff8801a3645d40> CR2: 0000000000000000 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.4]
| * | nfs: skip commit in releasepage if we're freeing memory for fs-related reasonsJeff Layton2012-07-301-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
| * | pnfsblock: bail out partial page IOPeng Tao2012-07-301-3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current block layout driver read/write code assumes page aligned IO in many places. Add a checker to validate the assumption. Otherwise there would be data corruption like when application does open(O_WRONLY) and page unaliged write. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2012-07-311-2/+2
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd changes from J. Bruce Fields: "This has been an unusually quiet cycle--mostly bugfixes and cleanup. The one large piece is Stanislav's work to containerize the server's grace period--but that in itself is just one more step in a not-yet-complete project to allow fully containerized nfs service. There are a number of outstanding delegation, container, v4 state, and gss patches that aren't quite ready yet; 3.7 may be wilder." * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits) NFSd: make boot_time variable per network namespace NFSd: make grace end flag per network namespace Lockd: move grace period management from lockd() to per-net functions LockD: pass actual network namespace to grace period management functions LockD: manage grace list per network namespace SUNRPC: service request network namespace helper introduced NFSd: make nfsd4_manager allocated per network namespace context. LockD: make lockd manager allocated per network namespace LockD: manage grace period per network namespace Lockd: add more debug to host shutdown functions Lockd: host complaining function introduced LockD: manage used host count per networks namespace LockD: manage garbage collection timeout per networks namespace LockD: make garbage collector network namespace aware. LockD: mark host per network namespace on garbage collect nfsd4: fix missing fault_inject.h include locks: move lease-specific code out of locks_delete_lock locks: prevent side-effects of locks_release_private before file_lock is initialized NFSd: set nfsd_serv to NULL after service destruction NFSd: introduce nfsd_destroy() helper ...
| * | SUNRPC: service request network namespace helper introducedStanislav Kinsbursky2012-07-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | This is a cleanup patch - makes code looks simplier. It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2012-07-3033-1746/+2054
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches" * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) nfs: fix fl_type tests in NFSv4 code NFS: fix pnfs regression with directio writes NFS: fix pnfs regression with directio reads sunrpc: clnt: Add missing braces nfs: fix stub return type warnings NFS: exit_nfs_v4() shouldn't be an __exit function SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors NFS: Split out NFS v4 client functions NFS: Split out the NFS v4 filesystem types NFS: Create a single nfs_clone_super() function NFS: Split out NFS v4 server creating code NFS: Initialize the NFS v4 client from init_nfs_v4() NFS: Move the v4 getroot code to nfs4getroot.c NFS: Split out NFS v4 file operations NFS: Initialize v4 sysctls from nfs_init_v4() NFS: Create an init_nfs_v4() function NFS: Split out NFS v4 inode operations NFS: Split out NFS v3 inode operations NFS: Split out NFS v2 inode operations NFS: Clean up nfs4_proc_setclientid() and friends ...
| * | nfs: fix fl_type tests in NFSv4 codeJeff Layton2012-07-302-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | fl_type is not a bitmap. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: fix pnfs regression with directio writesFred Isaman2012-07-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 57208fa7e51 "NFS: Create an write_pageio_init() function" did not modify the calls in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: fix pnfs regression with directio readsFred Isaman2012-07-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1abb50886af "NFS: Create an read_pageio_init() function" did not modify the call in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | nfs: fix stub return type warningsRandy Dunlap2012-07-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix numerous repeated warnings by making the stub function void instead of non-void: fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl': fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: exit_nfs_v4() shouldn't be an __exit functionBryan Schumaker2012-07-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | ... yet. Right now, init_nfs() is calling this function if an error is encountered when loading the nfs module. An __exit function can't be called from one declared as __init. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Split out NFS v4 client functionsBryan Schumaker2012-07-172-93/+91
| | | | | | | | | | | | | | | | | | | | | | | | These functions are only needed by NFS v4, so they can be moved into a v4 specific file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Split out the NFS v4 filesystem typesBryan Schumaker2012-07-174-373/+381
| | | | | | | | | | | | | | | | | | | | | | | | | | | This allows me to move the v4 mounting and unmounting functions out of the generic client and into a file that is only compiled when CONFIG_NFS_V4 is enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Create a single nfs_clone_super() functionBryan Schumaker2012-07-171-26/+6
| | | | | | | | | | | | | | | | | | | | | | | | v2 and v3 shared a function for this, but v4 implemented something only slightly different. Might as well share code whenever possible... Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Split out NFS v4 server creating codeBryan Schumaker2012-07-173-448/+462
| | | | | | | | | | | | | | | | | | | | | | | | These functions are specific to NFS v4 and can be moved to nfs4client.c to keep them out of the generic client. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Initialize the NFS v4 client from init_nfs_v4()Bryan Schumaker2012-07-174-135/+149
| | | | | | | | | | | | | | | | | | | | | | | | And split these functions out of the generic client into a v4 specific file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Move the v4 getroot code to nfs4getroot.cBryan Schumaker2012-07-173-51/+50
| | | | | | | | | | | | | | | Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
OpenPOWER on IntegriCloud