summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2012-10-1038-681/+1922
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Features include: - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1 Aside from the issues discussed at the LKS, distros are shipping NFSv4.1 with all the trimmings. - Fix fdatasync()/fsync() for the corner case of a server reboot. - NFSv4 OPEN access fix: finally distinguish correctly between open-for-read and open-for-execute permissions in all situations. - Ensure that the TCP socket is closed when we're in CLOSE_WAIT - More idmapper bugfixes - Lots of pNFS bugfixes and cleanups to remove unnecessary state and make the code easier to read. - In cases where a pNFS read or write fails, allow the client to resume trying layoutgets after two minutes of read/write- through-mds. - More net namespace fixes to the NFSv4 callback code. - More net namespace fixes to the NFSv3 locking code. - More NFSv4 migration preparatory patches. Including patches to detect network trunking in both NFSv4 and NFSv4.1 - pNFS block updates to optimise LAYOUTGET calls." * tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits) pnfsblock: cleanup nfs4_blkdev_get NFS41: send real read size in layoutget NFS41: send real write size in layoutget NFS: track direct IO left bytes NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked() NFSv4.1: Ensure that the layout sequence id stays 'close' to the current NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code NFSv4 set open access operation call flag in nfs4_init_opendata_res NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL NFSv4 reduce attribute requests for open reclaim NFSv4: nfs4_open_done first must check that GETATTR decoded a file type NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid NFSv4.1: Deal with wraparound issues when updating the layout stateid NFSv4.1: Always set the layout stateid if this is the first layoutget NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout NFSv4: don't put ACCESS in OPEN compound if O_EXCL NFSv4: don't check MAY_WRITE access bit in OPEN NFS: Set key construction data for the legacy upcall NFSv4.1: don't do two EXCHANGE_IDs on mount NFS: nfs41_walk_client_list(): re-lock before iterating ...
| * pnfsblock: cleanup nfs4_blkdev_getPeng Tao2012-10-082-21/+5
| | | | | | | | | | | | | | | | It is not needed at all and it is messing with return values... Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS41: send real read size in layoutgetPeng Tao2012-10-081-1/+9
| | | | | | | | | | | | | | | | | | For buffer read, use offst-to-isize. For direct read, use dreq->bytes_left. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS41: send real write size in layoutgetPeng Tao2012-10-086-7/+57
| | | | | | | | | | | | | | | | | | | | | | For buffer write, block layout client scan inode mapping to find next hole and use offset-to-hole as layoutget length. Object layout client uses offset-to-isize as layoutget length. For direct write, both block layout and object layout use dreq->bytes_left. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS: track direct IO left bytesPeng Tao2012-10-081-0/+5
| | | | | | | | | | Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()Trond Myklebust2012-10-051-11/+14
| | | | | | | | | | | | | | Split it into two functions, one which checks if layoutgets are blocked, and one which checks if the layout stateid has expired. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Ensure that the layout sequence id stays 'close' to the currentTrond Myklebust2012-10-041-13/+8
| | | | | | | | | | | | | | | | | | | | Clamp the layout barrier sequence id to the current sequence id minus the maximum number of outstanding layoutget requests. Also ensure that we correctly initialise lo->plh_barrier if there are no layout segments associated to this layout header. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close codeTrond Myklebust2012-10-041-1/+1
| | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4 set open access operation call flag in nfs4_init_opendata_resAndy Adamson2012-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | nfs4_open_recover_helper zeros the nfs4_opendata result structures, removing the result access_request information which leads to an XDR decode error. Move the setting of the result access_request field to nfs4_init_opendata_res which sets all the other required nfs4_opendata result fields and is shared between the open and recover open paths. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTALTrond Myklebust2012-10-031-2/+2
| | | | | | | | | | | | | | CONFIG_EXPERIMENTAL is deprecated and, regardless of that, this code is being enabled in most newer distributions. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4 reduce attribute requests for open reclaimAndy Adamson2012-10-022-27/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | We currently make no distinction in attribute requests between normal OPENs and OPEN with CLAIM_PREVIOUS. This offers more possibility of failures in the GETATTR response which foils OPEN reclaim attempts. Reduce the requested attributes to the bare minimum needed to update the reclaim open stateid and split nfs4_opendata_to_nfs4_state processing accordingly. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4: nfs4_open_done first must check that GETATTR decoded a file typeTrond Myklebust2012-10-021-1/+3
| | | | | | | | | | | | ...before it can check the validity of that file type. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Deal with wraparound when updating the layout "barrier" seqidTrond Myklebust2012-10-021-4/+7
| | | | | | | | | | | | ...and fix a bug in pnfs_set_layout_stateid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Deal with wraparound issues when updating the layout stateidTrond Myklebust2012-10-021-1/+10
| | | | | | | | | | | | ...and add a helper function. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Always set the layout stateid if this is the first layoutgetTrond Myklebust2012-10-021-3/+5
| | | | | | | | | | | | | | If the list of layout segments is empty, we must unconditionally set the layout stateid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layoutTrond Myklebust2012-10-021-7/+8
| | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4: don't put ACCESS in OPEN compound if O_EXCLWeston Andros Adamson2012-10-022-7/+17
| | | | | | | | | | | | | | | | | | | | | | Don't put an ACCESS op in OPEN compound if O_EXCL, because ACCESS will return permission denied for all bits until close. Fixes a regression due to commit 6168f62c (NFSv4: Add ACCESS operation to OPEN compound) Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4: don't check MAY_WRITE access bit in OPENWeston Andros Adamson2012-10-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Don't check MAY_WRITE as a newly created file may not have write mode bits, but POSIX allows the creating process to write regardless. This is ok because NFSv4 OPEN ops handle write permissions correctly - the ACCESS in the OPEN compound is to differentiate READ v EXEC permissions. Fixes a regression due to commit 6168f62c (NFSv4: Add ACCESS operation to OPEN compound) Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS: Set key construction data for the legacy upcallBryan Schumaker2012-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | This prevents a null pointer dereference when nfs_idmap_complete_pipe_upcall_locked() calls complete_request_key(). Fixes a regression caused by commit 0cac12023 (NFSv4: Ensure that idmap_pipe_downcall sanity-checks the downcall data). Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: don't do two EXCHANGE_IDs on mountWeston Andros Adamson2012-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Since the addition of NFSv4 server trunking detection the mount context calls nfs4_proc_exchange_id then schedules the state manager, which also calls nfs4_proc_exchange_id. Setting the NFS4CLNT_LEASE_CONFIRM bit makes the state manager skip the unneeded EXCHANGE_ID and continue on with session creation. Reported-by: Jorge Mora <mora@netapp.com> Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS: nfs41_walk_client_list(): re-lock before iteratingChuck Lever2012-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Sparse identified an execution path in nfs41_walk_client_list() where the nfs_client_lock is not re-acquired before taking the next loop iteration. fs/nfs/nfs4client.c:437:9: sparse: context imbalance in 'nfs41_walk_client_list' - different lock contexts for basic block Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: Handle BAD_STATEID and EXPIRED errors in layoutgetTrond Myklebust2012-10-021-8/+26
| | | | | | | | | | | | | | If the layoutget call returns a stateid error, we want to invalidate the layout stateid, and/or recover the open stateid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: bl_pg_init_write should be staticTrond Myklebust2012-10-021-1/+1
| | | | | | | | | | Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * nfs: include internal.h in getroot.hStanislav Kinsbursky2012-10-021-0/+2
| | | | | | | | | | | | | | | | | | Sparse warning: fs/nfs/nfs4getroot.c:11:5: warning: symbol 'nfs4_get_rootfh' was not declared. Should it be static? Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * nfs: include nfs4_fh.h in nfs4sysctl.cStanislav Kinsbursky2012-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | Sparse warnings: fs/nfs/nfs4sysctl.c:56:5: warning: symbol 'nfs4_register_sysctl' was not declared. Should it be static? fs/nfs/nfs4sysctl.c:64:6: warning: symbol 'nfs4_unregister_sysctl' was not declared. Should it be static? Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * nfs: declare nfs_xdev_mount as staticStanislav Kinsbursky2012-10-021-1/+1
| | | | | | | | | | | | | | | | | | Sparse warning: fs/nfs/super.c:2517:15: warning: symbol 'nfs_xdev_mount' was not declared. Should it be static? Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * nfs: declare nfs_callback_tcp_port in headerStanislav Kinsbursky2012-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | Sparse warning: fs/nfs/super.c:2638:16: sparse: symbol 'nfs_callback_tcpport' was not declared. Should it be static? Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * nfs: include NFSv4 header in netns.hStanislav Kinsbursky2012-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | Build error: fs/nfs/netns.h:27:15: error: 'NFS4_MAX_MINOR_VERSION' undeclared here (not in a function) Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.0 reclaim reboot state when re-establishing clientidAndy Adamson2012-10-011-2/+6
| | | | | | | | | | | | | | We should reclaim reboot state when the clientid is stale. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4: nfs4_match_clientids is only used by NFSv4.1Trond Myklebust2012-10-011-15/+15
| | | | | | | | | | | | Fix another compiler warning. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4: Fix the minor version callback channel startupTrond Myklebust2012-10-011-14/+13
| | | | | | | | | | | | | | The current spaghetti code confuses some versions of gcc (and just looks ugly as hell)! Clean up... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4: Fix up a merge conflict between migration and container changesTrond Myklebust2012-10-011-2/+3
| | | | | | | | | | | | nfs_callback_tcpport is now per-net_namespace. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * Merge branch 'bugfixes' into nfs-for-nextTrond Myklebust2012-10-011-34/+70
| |\
| | * NFSv4: Ensure that idmap_pipe_downcall sanity-checks the downcall dataTrond Myklebust2012-09-281-25/+37
| | | | | | | | | | | | | | | | | | | | | | | | Use the idmapper upcall data to verify that the legacy idmapper daemon is indeed responding to an upcall that we sent. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Bryan Schumaker <bjschuma@netapp.com>
| | * NFSv4: Clean up the legacy idmapper upcallTrond Myklebust2012-09-281-21/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the BUG_ON(idmap->idmap_key_cons != NULL) with a WARN_ON_ONCE(). Then get rid of the ACCESS_ONCE(idmap->idmap_key_cons). Then add helper functions for starting, finishing and aborting the legacy upcall. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Bryan Schumaker <bjschuma@netapp.com>
| | * NFSv4: Remove BUG_ON() and ACCESS_ONCE() calls in the idmapperTrond Myklebust2012-09-281-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of ACCESS_ONCE() is wrong, since the various routines that set/clear idmap->idmap_key_cons should be strictly ordered w.r.t. each other, and the idmap->idmap_mutex ensures that only one thread at a time may be in an upcall situation. Also replace the BUG_ON()s with WARN_ON_ONCE() where appropriate. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | nfs: replace strict_strto* with kstrto*Daniel Walter2012-10-012-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | [nfs] replace strict_str* with kstr* variants * replace string conversions with newer kstr* functions Signed-off-by: Daniel Walter <sahne@0x90.at> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Remove unnecessary semicolons (fs/nfs/client.c)Yanchuan Nian2012-10-011-2/+2
| | | | | | | | | | | | | | | | | | | | | There are some unnecessary semicolons in function find_nfs_version. Just remove them. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | pnfsblock: use list_move_tail instead of list_del/list_add_tailWei Yongjun2012-10-011-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using list_move_tail() instead of list_del() + list_add_tail(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | pnfsblock: fix non-aligned DIO writePeng Tao2012-10-011-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For DIO writes, if it is not blocksize aligned, we need to do internal serialization. It may slow down writers anyway. So we just bail them out and resend to MDS. Cc: stable <stable@vger.kernel.org> [since v3.4] Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | pnfsblock: fix non-aligned DIO readPeng Tao2012-10-011-8/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For DIO read, if it is not sector aligned, we should reject it and resend via MDS. Otherwise there might be data corruption. Also teach bl_read_pagelist to handle partial page reads for DIO. Cc: stable <stable@vger.kernel.org> [since v3.4] Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | pnfsblock: fix partial page buffer wirtePeng Tao2012-10-012-12/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If applications use flock to protect its write range, generic NFS will not do read-modify-write cycle at page cache level. Therefore LD should know how to handle non-sector aligned writes. Otherwise there will be data corruption. Cc: stable <stable@vger.kernel.org> Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | Revert "pnfsblock: bail out partial page IO"Peng Tao2012-10-011-36/+3
| | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 159e0561e322dd8008fff59e36efff8d2bdd0b0e, in favor of a more complete fix to the alignment issue. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS41: fix error of setting blocklayoutdriverPeng Tao2012-10-012-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit e38eb650 (NFS: set_pnfs_layoutdriver() from nfs4_proc_fsinfo()), set_pnfs_layoutdriver() is called inside nfs4_proc_fsinfo(), but pnfs_blksize is not set. It causes setting blocklayoutdriver failure and pnfsblock mount failure. Cc: stable <stable@vger.kernel.org> [since v3.5] Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFSv41: fix DIO write_io calculationPeng Tao2012-10-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pnfs_within_mdsthreshold() is called inside pg_init. We need to set read_io/write_io before that. Otherwise we fail pnfs_within_mdsthreshold() and IO goes to MDS. A simple test case: dd if=foo of=/mnt/pnfs/bar bs=10M count=1 oflag=direct Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Add nfs4_unique_id boot parameterChuck Lever2012-10-013-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An optional boot parameter is introduced to allow client administrators to specify a string that the Linux NFS client can insert into its nfs_client_id4 id string, to make it both more globally unique, and to ensure that it doesn't change even if the client's nodename changes. If this boot parameter is not specified, the client's nodename is used, as before. Client installation procedures can create a unique string (typically, a UUID) which remains unchanged during the lifetime of that client instance. This works just like creating a UUID for the label of the system's root and boot volumes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Discover NFSv4 server trunking when mountingChuck Lever2012-10-016-2/+454
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "Server trunking" is a fancy named for a multi-homed NFS server. Trunking might occur if a client sends NFS requests for a single workload to multiple network interfaces on the same server. There are some implications for NFSv4 state management that make it useful for a client to know if a single NFSv4 server instance is multi-homed. (Note this is only a consideration for NFSv4, not for legacy versions of NFS, which are stateless). If a client cares about server trunking, no NFSv4 operations can proceed until that client determines who it is talking to. Thus server IP trunking discovery must be done when the client first encounters an unfamiliar server IP address. The nfs_get_client() function walks the nfs_client_list and matches on server IP address. The outcome of that walk tells us immediately if we have an unfamiliar server IP address. It invokes nfs_init_client() in this case. Thus, nfs4_init_client() is a good spot to perform trunking discovery. Discovery requires a client to establish a fresh client ID, so our client will now send SETCLIENTID or EXCHANGE_ID as the first NFS operation after a successful ping, rather than waiting for an application to perform an operation that requires NFSv4 state. The exact process for detecting trunking is different for NFSv4.0 and NFSv4.1, so a minorversion-specific init_client callout method is introduced. CLID_INUSE recovery is important for the trunking discovery process. CLID_INUSE is a sign the server recognizes the client's nfs_client_id4 id string, but the client is using the wrong principal this time for the SETCLIENTID operation. The SETCLIENTID must be retried with a series of different principals until one works, and then the rest of trunking discovery can proceed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Use the same nfs_client_id4 for every serverChuck Lever2012-10-011-12/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when identifying itself to NFS servers, the Linux NFS client uses a unique nfs_client_id4.id string for each server IP address it talks with. For example, when client A talks to server X, the client identifies itself using a string like "AX". The requirements for these strings are specified in detail by RFC 3530 (and bis). This form of client identification presents a problem for Transparent State Migration. When client A's state on server X is migrated to server Y, it continues to be associated with string "AX." But, according to the rules of client string construction above, client A will present string "AY" when communicating with server Y. Server Y thus has no way to know that client A should be associated with the state migrated from server X. "AX" is all but abandoned, interfering with establishing fresh state for client A on server Y. To support transparent state migration, then, NFSv4.0 clients must instead use the same nfs_client_id4.id string to identify themselves to every NFS server; something like "A". Now a client identifies itself as "A" to server X. When a file system on server X transitions to server Y, and client A identifies itself as "A" to server Y, Y will know immediately that the state associated with "A," whether it is native or migrated, is owned by the client, and can merge both into a single lease. As a pre-requisite to adding support for NFSv4 migration to the Linux NFS client, this patch changes the way Linux identifies itself to NFS servers via the SETCLIENTID (NFSv4 minor version 0) and EXCHANGE_ID (NFSv4 minor version 1) operations. In addition to removing the server's IP address from nfs_client_id4, the Linux NFS client will also no longer use its own source IP address as part of the nfs_client_id4 string. On multi-homed clients, the value of this address depends on the address family and network routing used to contact the server, thus it can be different for each server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Introduce "migration" mount optionChuck Lever2012-10-012-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the Linux client uses a unique nfs_client_id4.id string when identifying itself to distinct NFS servers. To support transparent state migration, the Linux client will have to use the same nfs_client_id4 string for all servers it communicates with (also known as the "uniform client string" approach). Otherwise NFS servers can not recognize that open and lock state need to be merged after a file system transition. Unfortunately, there are some NFSv4.0 servers currently in the field that do not tolerate the uniform client string approach. Thus, by default, our NFSv4.0 mounts will continue to use the current approach, and we introduce a mount option that switches them to use the uniform model. Client administrators must identify which servers can be mounted with this option. Eventually most NFSv4.0 servers will be able to handle the uniform approach, and we can change the default. The first mount of a server controls the behavior for all subsequent mounts for the lifetime of that set of mounts of that server. After the last mount of that server is gone, the client erases the data structure that tracks the lease. A subsequent lease may then honor a different "migration" setting. This patch adds only the infrastructure for parsing the new mount option. Support for uniform client strings is added in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | SUNRPC: Introduce rpc_clone_client_set_auth()Chuck Lever2012-10-012-24/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An ULP is supposed to be able to replace a GSS rpc_auth object with another GSS rpc_auth object using rpcauth_create(). However, rpcauth_create() in 3.5 reliably fails with -EEXIST in this case. This is because when gss_create() attempts to create the upcall pipes, sometimes they are already there. For example if a pipe FS mount event occurs, or a previous GSS flavor was in use for this rpc_clnt. It turns out that's not the only problem here. While working on a fix for the above problem, we noticed that replacing an rpc_clnt's rpc_auth is not safe, since dereferencing the cl_auth field is not protected in any way. So we're deprecating the ability of rpcauth_create() to switch an rpc_clnt's security flavor during normal operation. Instead, let's add a fresh API that clones an rpc_clnt and gives the clone a new flavor before it's used. This makes immediate use of the new __rpc_clone_client() helper. This can be used in a similar fashion to rpcauth_create() when a client is hunting for the correct security flavor. Instead of replacing an rpc_clnt's security flavor in a loop, the ULP replaces the whole rpc_clnt. To fix the -EEXIST problem, any ULP logic that relies on replacing an rpc_clnt's rpc_auth with rpcauth_create() must be changed to use this API instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
OpenPOWER on IntegriCloud