path: root/fs/nfs/pnfs.c
Commit message | Author | Age | Files | Lines
* pnfs: Fix the check for requests in range of layout segment | Benjamin Coddington | 2017-05-24 | 1 | -8/+17
    It's possible and acceptable for NFS to attempt to add requests beyond the range of the current pgio->pg_lseg, a case which should be caught and limited by the pg_test operation. However, the current handling of this case replaces pgio->pg_lseg with a new layout segment (after a WARN) within that pg_test operation. That will cause all the previously added requests to be submitted with this new layout segment, which may not be valid for those requests.
    Fix this problem by only returning zero for the number of bytes to coalesce from pg_test for this case, which allows any previously added requests to complete on the current layout segment. The check for requests starting out of range of the layout segment moves to pg_init, so that the replacement of pgio->pg_lseg will be done when the next request is added.
    Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
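The pg_test rule described above can be pictured as a small range check: a request that starts outside the cached segment gets 0 coalescable bytes, so the descriptor flushes what it already holds on the current segment. A minimal sketch in C, assuming simplified hypothetical types (`struct seg_range`, `struct io_req`, `RANGE_EOF`) rather than the kernel's own structures; the trimming of a request that straddles the segment end is an extra assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: a segment length meaning "to end of file". */
#define RANGE_EOF UINT64_MAX

/* Hypothetical, simplified stand-ins for a layout segment and a request. */
struct seg_range {
	uint64_t offset;
	uint64_t length;
};

struct io_req {
	uint64_t offset;
	size_t bytes;
};

/* Exclusive end offset, saturating at RANGE_EOF if start + len overflows. */
static uint64_t range_end(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;
	return end >= start ? end : RANGE_EOF;
}

/*
 * pg_test-style check: how many bytes of @req may be coalesced under @seg.
 * Returns 0 when the request starts outside the segment, instead of
 * swapping in a new segment underneath already-queued requests.
 */
static size_t seg_coalesce_bytes(const struct seg_range *seg,
				 const struct io_req *req)
{
	uint64_t seg_end;

	assert(seg->length > 0);
	seg_end = range_end(seg->offset, seg->length);
	if (req->offset < seg->offset || req->offset >= seg_end)
		return 0;
	/* Assumed: trim a request that runs past the end of the segment. */
	if (seg_end - req->offset < req->bytes)
		return (size_t)(seg_end - req->offset);
	return req->bytes;
}
```

Returning 0 rather than replacing the segment keeps every request already in the descriptor paired with the segment it was tested against; the next request then goes through pg_init, where a new segment can be looked up.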
* pNFS: Fix a deadlock when coalescing writes and returning the layout | Trond Myklebust | 2017-05-02 | 1 | -2/+0
    Consider the following deadlock:
    Process P1                             Process P2                        Process P3
    ==========                             ==========                        ==========
                                                                             lock_page(page)
                                           lseg = pnfs_update_layout(inode)
    lo = NFS_I(inode)->layout
    pnfs_error_mark_layout_for_return(lo)
                                           lock_page(page)
                                                                             lseg = pnfs_update_layout(inode)
    In this scenario,
    - P1 has declared the layout to be in error, but P2 holds a reference to a layout segment on that inode, so the layoutreturn is deferred.
    - P2 is waiting for a page lock held by P3.
    - P3 is asking for a new layout segment, but is blocked waiting for the layoutreturn.
    The fix is to ensure that pnfs_error_mark_layout_for_return() does not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow the latter to call LAYOUTGET so that it can make progress and unblock P2.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Don't clear the layout return info if there are segments to return | Trond Myklebust | 2017-05-02 | 1 | -1/+7
    In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout return info if there are new segments queued for return due to, for instance, a race between a LAYOUTRETURN and a failed I/O attempt.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Ensure we commit the layout if it has been invalidated | Trond Myklebust | 2017-04-29 | 1 | -0/+3
    If the layout is being invalidated on the server, then we must invoke nfs_commit_inode() to ensure any commits to the DS get cleared out.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path | Trond Myklebust | 2017-04-29 | 1 | -1/+13
    If the attempt to write through pNFS fails, we need to use the same failure semantics as for the read path: if the FF_FLAGS_NO_IO_THRU_MDS flag is set or we have sufficient valid DSes, then we must retry through pNFS.
    Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver")
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Ensure we check layout validity before marking it for return | Trond Myklebust | 2017-04-28 | 1 | -0/+4
    pnfs_error_mark_layout_for_return needs to check that the layout is valid before calling pnfs_set_plh_return_info().
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Fix use after free issues in pnfs_do_read() | Trond Myklebust | 2017-04-25 | 1 | -3/+13
    The assumption should be that if the caller returns PNFS_ATTEMPTED, then hdr has been consumed, and so we should not be testing hdr->task.tk_status. If the caller returns PNFS_TRY_AGAIN, then we need to recoalesce and free hdr.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Ensure we check layout segment validity in the pg_init() callback | Trond Myklebust | 2017-04-25 | 1 | -0/+13
    If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Unexport pnfs_put_lseg_locked and _pnfs_return_layout | Trond Myklebust | 2017-04-20 | 1 | -2/+0
    They are not used outside the NFSv4 module.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Fix a reference leak in _pnfs_return_layout | Trond Myklebust | 2017-01-26 | 1 | -1/+1
    If NFS_LAYOUT_RETURN_REQUESTED is not set, then we currently exit without freeing the list of invalidated layout segments, leading to a reference leak.
    Reported-by: Olga Kornievskaia <aglo@umich.edu>
    Fixes: 24408f5282 ("pNFS: Fix bugs in _pnfs_return_layout")
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Return RW layouts on OPEN_DOWNGRADE | Trond Myklebust | 2016-12-19 | 1 | -3/+13
    If the client holds no more writeable open state, and does not hold a write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE. We do this only for writes, since some layout drivers may require you to also hold a read layout if you are doing a R/W workload.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateid | Trond Myklebust | 2016-12-05 | 1 | -9/+12
    Ensure we release the NFS_LAYOUT_RETURN lock when we invalidate the layout stateid, so that processes and RPC tasks that are waiting on the layout return can continue.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Add a layoutreturn callback to perform a layout-private setup | Trond Myklebust | 2016-12-03 | 1 | -1/+13
    Add a callback to allow the flexfiles layout driver to initialise the layout private payload.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Allow layout drivers to manage private data in struct nfs4_layoutreturn | Trond Myklebust | 2016-12-02 | 1 | -0/+1
    Cleanup to allow layout drivers to attach private data to layoutreturn, and manage the data.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Skip invalid stateids when doing a bulk destroy | Trond Myklebust | 2016-12-01 | 1 | -0/+2
    If the layout stateid is already invalid, we have no work to do.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Wait on outstanding layoutreturns to complete in pnfs_roc() | Trond Myklebust | 2016-12-01 | 1 | -0/+9
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Don't mark the layout as freed if the last lseg is marked for return | Trond Myklebust | 2016-12-01 | 1 | -0/+2
    Address another memory leak.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Sync the layout state bits in pnfs_cache_lseg_for_layoutreturn | Trond Myklebust | 2016-12-01 | 1 | -14/+15
    Ensure that the layout state bits are synced when we cache a layout segment for layoutreturn using an appropriate call to pnfs_set_plh_return_info.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Fix bugs in _pnfs_return_layout | Trond Myklebust | 2016-12-01 | 1 | -3/+10
    We need to honour the NFS_LAYOUT_RETURN_REQUESTED bit regardless of whether or not there are layout segments pending. Furthermore, we should ensure that we leave the plh_return_segs list empty.
    This patch fixes a memory leak of the layout segments on plh_return_segs.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Clear all layout segment state in pnfs_mark_layout_stateid_invalid | Trond Myklebust | 2016-12-01 | 1 | -1/+18
    When the layout state is invalidated, then so is the layout segment state, and hence we do need to clean up the state bits.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Enable layoutreturn operation for return-on-close | Trond Myklebust | 2016-12-01 | 1 | -77/+62
    Amend the pnfs return on close helper functions to enable sending the layoutreturn op in CLOSE/DELEGRETURN. This closes a potential race between CLOSE/DELEGRETURN and parallel OPEN calls to the same file, and allows the client and the server to agree on whether or not there is an outstanding layout.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Clean up - add a helper to initialise struct layoutreturn_args | Trond Myklebust | 2016-12-01 | 1 | -7/+18
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Don't mark layout segments invalid on layoutreturn in pnfs_roc | Trond Myklebust | 2016-12-01 | 1 | -7/+13
    The layoutreturn call will take care of invalidating the layout segments once the call is successful.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Skip checking for return-on-close if the layout is invalid | Trond Myklebust | 2016-12-01 | 1 | -1/+2
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Remove spurious wake up in pnfs_layout_remove_lseg() | Trond Myklebust | 2016-12-01 | 1 | -3/+0
    There is no change to the value of NFS_LAYOUT_RETURN, so we should not be waking up the RPC call.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4: Ignore LAYOUTRETURN result if the layout doesn't match or is invalid | Trond Myklebust | 2016-12-01 | 1 | -1/+7
    Fix a potential race with CB_LAYOUTRECALL in which the server recalls the remaining layout segments while our LAYOUTRETURN is still in transit.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Do not free layout segments that are marked for return | Trond Myklebust | 2016-12-01 | 1 | -9/+65
    We may want to process and transmit layout stat information for the layout segments that are being returned, so we should defer freeing them until after the layoutreturn has completed.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: consolidate the different range intersection tests | Trond Myklebust | 2016-12-01 | 1 | -32/+3
    Both pnfs.c and the flexfiles code have their own versions of the range intersection testing, and the "end_offset" helper.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
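For reference, a range intersection test with an overflow-safe "end offset" helper, of the kind this commit consolidates, might look like the sketch below. The names (`struct byte_range`, `end_offset()`, `ranges_intersect()`, `RANGE_EOF`) are hypothetical rather than the kernel's; ranges are treated as half-open [offset, offset + length), and an all-ones length means "to end of file", as it does for NFSv4.1 layout ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: an all-ones length or end means "to end of file". */
#define RANGE_EOF UINT64_MAX

struct byte_range {
	uint64_t offset;
	uint64_t length;
};

/* Exclusive end offset, saturating to RANGE_EOF if start + len overflows. */
static uint64_t end_offset(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;
	return end >= start ? end : RANGE_EOF;
}

/* True if the two half-open ranges share at least one byte. */
static bool ranges_intersect(const struct byte_range *a,
			     const struct byte_range *b)
{
	uint64_t end_a = end_offset(a->offset, a->length);
	uint64_t end_b = end_offset(b->offset, b->length);

	assert(a->length > 0 && b->length > 0);
	return (end_a == RANGE_EOF || end_a > b->offset) &&
	       (end_b == RANGE_EOF || end_b > a->offset);
}
```

Saturating the end offset instead of letting `offset + length` wrap is what makes a "to end of file" range intersect everything at or beyond its start.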
* pNFS: Fix race in pnfs_wait_on_layoutreturn | Trond Myklebust | 2016-12-01 | 1 | -5/+3
    We must put the task to sleep while holding the inode->i_lock in order to ensure atomicity with the test for NFS_LAYOUT_RETURN.
    Fixes: 500d701f336b ("NFS41: make close wait for layoutreturn")
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: On error, do not send LAYOUTGET until the LAYOUTRETURN has completed | Trond Myklebust | 2016-12-01 | 1 | -1/+5
    If there is an I/O error, we should not call LAYOUTGET until the LAYOUTRETURN that reports the error is complete.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org # v4.8+
* pNFS: Force a retry of LAYOUTGET if the stateid doesn't match our cache | Trond Myklebust | 2016-12-01 | 1 | -5/+6
    If the server sends us a completely new stateid, and the client thinks it already holds a layout, then force a retry of the LAYOUTGET after invalidating the existing layout in order to avoid corruption due to races.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Clear NFS_LAYOUT_RETURN_REQUESTED when invalidating the layout stateid | Trond Myklebust | 2016-12-01 | 1 | -8/+9
    We must ensure that we don't schedule a layoutreturn if the layout stateid has been marked as invalid.
    Fixes: 2a59a0411671e ("pNFS: Fix pnfs_set_layout_stateid() to clear...")
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org # v4.8+
* pNFS: Don't clear the layout stateid if a layout return is outstanding | Trond Myklebust | 2016-12-01 | 1 | -1/+3
    If we no longer hold any layout segments, we're normally expected to consider the layout stateid to be invalid. However, we cannot assume this if we're about to send, or are in the process of sending, a layoutreturn.
    Fixes: 334a8f37115b ("pNFS: Don't forget the layout stateid if...")
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org # v4.8+
* pNFS: Fix a deadlock between read resends and layoutreturn | Trond Myklebust | 2016-12-01 | 1 | -0/+4
    We must not call nfs_pageio_init_read() on a new nfs_pageio_descriptor while holding a reference to a layout segment, as that can deadlock pnfs_update_layout().
    Fixes: d67ae825a59d6 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org # v4.0+
* NFS: Don't print a pNFS error if we aren't using pNFS | Anna Schumaker | 2016-11-07 | 1 | -0/+2
    We used to check for a valid layout type id before verifying pNFS flags as an indicator of whether we are using pNFS. This changed in 3132e49ece with the introduction of multiple layout types, since now we are passing an array of ids instead of just one. Since then, users have been seeing a KERN_ERR printk show up whenever mounting NFS v4 without pNFS.
    This patch restores the original behavior of exiting set_pnfs_layoutdriver() early if we aren't using pNFS.
    Fixes: 3132e49ece ("pnfs: track multiple layout types in fsinfo structure")
    Reviewed-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS: Fix atime updates on pNFS clients | Trond Myklebust | 2016-09-27 | 1 | -3/+1
    Fix the code so that we always mark the atime as invalid in nfs4_read_done(). Currently, the expectation appears to be that the pNFS drivers should always do this, with the result that most of them don't.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pnfs: add a new mechanism to select a layout driver according to an ordered list | Jeff Layton | 2016-09-19 | 1 | -8/+48
    Currently, the layout driver selection code always chooses the first one from the list. That's not really ideal however, as the server can send the list of layout types in any order that it likes. It's up to the client to select the best one for its needs.
    This patch adds an ordered list of preferred driver types and has the selection code sort the list of available layout drivers according to it. Any unrecognized layout type is sorted to the end of the list.
    For now, the order of preference is hardcoded, but it should be possible to make this configurable in the future.
    Signed-off-by: Jeff Layton <jlayton@redhat.com>
    Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
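A sketch of preference-ordered selection as described above: the server's list is sorted by each type's rank in a client-side preference table, with unrecognized types ranked last, and the client would then try the entries in order. The layout type numbers are the RFC-assigned values (files = 1 per RFC 5661, block volume = 3 per RFC 5661, flexfiles = 4 per RFC 8435, SCSI = 5 per RFC 8154); the contents and order of `layout_prefs` are a hypothetical example, not necessarily the kernel's hardcoded list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Layout type numbers as assigned by RFC 5661, RFC 8435 and RFC 8154. */
enum layout_type {
	LAYOUT_NFSV4_1_FILES = 1,
	LAYOUT_BLOCK_VOLUME = 3,
	LAYOUT_FLEX_FILES = 4,
	LAYOUT_SCSI = 5,
};

/* Hypothetical client preference order, most preferred first. */
static const uint32_t layout_prefs[] = {
	LAYOUT_SCSI,
	LAYOUT_BLOCK_VOLUME,
	LAYOUT_FLEX_FILES,
	LAYOUT_NFSV4_1_FILES,
};

#define NPREFS (sizeof(layout_prefs) / sizeof(layout_prefs[0]))

/* Position of @type in the preference table; unknown types rank last. */
static size_t layout_rank(uint32_t type)
{
	for (size_t i = 0; i < NPREFS; i++)
		if (layout_prefs[i] == type)
			return i;
	return NPREFS;
}

static int layout_cmp(const void *a, const void *b)
{
	size_t ra = layout_rank(*(const uint32_t *)a);
	size_t rb = layout_rank(*(const uint32_t *)b);

	return (ra > rb) - (ra < rb);
}

/* Reorder the server-supplied list so the most preferred type comes first. */
static void sort_layout_types(uint32_t *types, size_t n)
{
	assert(types != NULL || n == 0);
	qsort(types, n, sizeof(*types), layout_cmp);
}
```

Sorting by rank, rather than scanning the server's list for a single favourite, keeps the fallback order available if the first driver cannot be loaded.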
* pnfs: track multiple layout types in fsinfo structure | Jeff Layton | 2016-09-19 | 1 | -11/+16
    The current NFSv4.1/pNFS client assumes that the MDS supports only one layout type. While this is true for most existing servers, it may change in the near future.
    For now, this patch just plumbs in the ability to track a list of layouts in the fsinfo structure. The existing behavior of the client is preserved, by having it just select the first entry in the list.
    Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
    Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
    Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS: Don't forget the layout stateid if there are outstanding LAYOUTGETs | Trond Myklebust | 2016-09-04 | 1 | -1/+2
    If there are outstanding LAYOUTGET rpc calls, then we want to ensure that we keep the layout stateid around so that we don't inadvertently pick up an old/misordered sequence id. The race is as follows:
    Client                      Server
    ======                      ======
    LAYOUTGET(seqid)
    LAYOUTGET(seqid)
                                return LAYOUTGET(seqid+1)
                                return LAYOUTGET(seqid+2)
    process LAYOUTGET(seqid+2)
    forget layout
    process LAYOUTGET(seqid+1)
    If it forgets the layout stateid before processing seqid+1, then the client will not check the layout->plh_barrier, and so will set the stateid with seqid+1.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Fix pnfs_set_layout_stateid() to clear NFS_LAYOUT_INVALID_STID | Trond Myklebust | 2016-09-03 | 1 | -17/+19
    If the layout was marked as invalid, we want to ensure that we initialise the layout header fields correctly.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised | Trond Myklebust | 2016-09-03 | 1 | -0/+3
    According to RFC5661, the client is responsible for serialising LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case where we send both in parallel.
    Client                      Server
    ======                      ======
    LAYOUTGET(seqid=X)
    LAYOUTRETURN(seqid=X)
                                LAYOUTGET return seqid=X+1
                                LAYOUTRETURN return seqid=X+2
    Process LAYOUTRETURN
    Forget layout stateid
    Process LAYOUTGET
    Set seqid=X+1
    The client processes the layoutget/layoutreturn in the wrong order, and since the result of the layoutreturn was to clear the only existing layout segment, the client forgets the layout stateid. When the LAYOUTGET comes in, it is treated as having a completely new stateid, and so the client sets the wrong sequence id...
    Fix is to check if there are outstanding LAYOUTGET requests before we send the LAYOUTRETURN (note that LAYOUTGET will already wait if it sees an outstanding LAYOUTRETURN).
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org # v4.5+
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
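The serialisation rule above can be sketched as two gates: LAYOUTGET is held off while a LAYOUTRETURN is in flight, and a LAYOUTRETURN is deferred while any LAYOUTGET is outstanding, then sent once the last one completes. The counters and flags in `struct layout_hdr` are hypothetical stand-ins for the kernel's layout header fields, and the sketch ignores locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-inode layout header state (not the kernel's struct). */
struct layout_hdr {
	unsigned int outstanding_gets; /* LAYOUTGETs in flight */
	bool return_in_flight;         /* a LAYOUTRETURN has been sent */
	bool return_pending;           /* a LAYOUTRETURN is waiting to be sent */
};

/* LAYOUTGET must wait while a LAYOUTRETURN is in flight. */
static bool may_send_layoutget(const struct layout_hdr *lo)
{
	return !lo->return_in_flight;
}

static void layoutget_start(struct layout_hdr *lo)
{
	assert(may_send_layoutget(lo));
	lo->outstanding_gets++;
}

/* Send LAYOUTRETURN only when no LAYOUTGET is outstanding; else defer it. */
static bool try_send_layoutreturn(struct layout_hdr *lo)
{
	if (lo->outstanding_gets > 0 || lo->return_in_flight) {
		lo->return_pending = true;
		return false;
	}
	lo->return_pending = false;
	lo->return_in_flight = true;
	return true;
}

/* When the last LAYOUTGET completes, send any deferred LAYOUTRETURN. */
static bool layoutget_done(struct layout_hdr *lo)
{
	assert(lo->outstanding_gets > 0);
	lo->outstanding_gets--;
	if (lo->outstanding_gets == 0 && lo->return_pending)
		return try_send_layoutreturn(lo);
	return false;
}
```

With both gates in place the server can never have a LAYOUTGET and a LAYOUTRETURN for the same layout in flight at once, so the reply ordering shown in the table above cannot arise.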
* pNFS: The client must not do I/O to the DS if its lease has expired | Trond Myklebust | 2016-08-23 | 1 | -0/+1
    Ensure that the client conforms to the normative behaviour described in RFC5661 Section 12.7.2: "If a client believes its lease has expired, it MUST NOT send I/O to the storage device until it has validated its lease."
    So ensure that we wait for the lease to be validated before using the layout.
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: stable@vger.kernel.org # v3.20+
* pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls | Trond Myklebust | 2016-08-19 | 1 | -1/+0
    We normally want to update the stateid and then retry,
    Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge branch 'pnfs' | Trond Myklebust | 2016-07-24 | 1 | -62/+89
|\
| * pNFS: Remove redundant smp_mb() from pnfs_init_lseg() | Trond Myklebust | 2016-07-24 | 1 | -1/+0
|     It's not visible yet, and won't be until after we grab the inode->i_lock.
|     Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * pNFS: Cleanup - do layout segment initialisation in one place | Trond Myklebust | 2016-07-24 | 1 | -4/+6
|     ...instead of splitting the initialisation over init_lseg() and pnfs_layout_process().
|     Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * pNFS: Remove redundant stateid invalidation | Trond Myklebust | 2016-07-24 | 1 | -1/+0
|     The layout stateid will be invalidated once it holds no more layout segments anyway.
|     Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * pNFS: Remove redundant pnfs_mark_layout_returned_if_empty() | Trond Myklebust | 2016-07-24 | 1 | -1/+0
|     That's already being taken care of in pnfs_layout_remove_lseg().
|     Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * pNFS: Clear the layout metadata if the server changed the layout stateid | Trond Myklebust | 2016-07-24 | 1 | -1/+1
|     If the server changed the layout stateid's "other" field, then we should treat the old layout as being completely gone. In that case, we want to clear the metadata such as scheduled layoutreturns. Do this by calling pnfs_mark_layout_stateid_invalid().
|     Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid() | Trond Myklebust | 2016-07-24 | 1 | -1/+1
|     Ensure that nfs42_layoutstat_done() and layoutget don't open-code layout stateid invalidation.
|     Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>