summaryrefslogtreecommitdiffstats
path: root/fs/nfs/pnfs.c
Commit message (Collapse)AuthorAgeFilesLines
* skip LAYOUTRETURN if layout is invalidOlga Kornievskaia2018-06-121-2/+4
| | | | | | | | | | | | | | | Currently, when IO to DS fails, client returns the layout and retries against the MDS. However, then on umounting (inode eviction) it returns the layout again. This is because pnfs_return_layout() was changed in commit d78471d32bb6 ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR") to always set NFS_LAYOUT_RETURN_REQUESTED so even if we returned the layout, it will be returned again. Instead, let's also check if we have already marked the layout invalid. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Don't call commit on failed layoutget-on-openTrond Myklebust2018-05-311-6/+1
| | | | | | | If the layoutget on open call failed, we can't really commit the inode, so don't bother calling it. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Don't send LAYOUTGET on OPEN for read, if we already have cached dataTrond Myklebust2018-05-311-0/+5
| | | | | | | | If we're only opening the file for reading, and the file is empty and/or we already have cached data, then heuristically optimise away the LAYOUTGET. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/pnfs: Don't switch off layoutget-on-open for transient errorsTrond Myklebust2018-05-311-7/+15
| | | | | | | | | | Ensure that we only switch off the LAYOUTGET operation in the OPEN compound when the server is truly broken, and/or it is complaining that the compound is too large. Currently, we end up turning off the functionality permanently, even for transient errors such as EACCES or ENOSPC. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* NFSv4/pnfs: Ensure pnfs_parse_lgopen() won't try to parse uninitialised dataTrond Myklebust2018-05-311-1/+2
| | | | | | | | | | | | | We need to ensure that pnfs_parse_lgopen() doesn't try to parse a struct nfs4_layoutget_res that was not filled by a successful call to decode_layoutget(). This can happen if we performed a cached open, or if either the OP_ACCESS or OP_GETATTR operations preceding the OP_LAYOUTGET in the compound returned an error. By initialising the 'status' field to NFS4ERR_DELAY, we ensure that pnfs_parse_lgopen() won't try to interpret the structure. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Fix manipulation of NFS_LAYOUT_FIRST_LAYOUTGETFred Isaman2018-05-311-6/+14
| | | | | | | The flag was not always being cleared after LAYOUTGET on OPEN. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Add barrier to prevent lgopen using LAYOUTGET during recallFred Isaman2018-05-311-1/+7
| | | | | | | | | | Since the LAYOUTGET on OPEN can be sent without prior inode information, existing methods to prevent LAYOUTGET from being sent while processing CB_LAYOUTRECALL don't work. Track if a recall occurred while LAYOUTGET was being sent, and if so ignore the results. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Stop attempting LAYOUTGET on OPEN on failureFred Isaman2018-05-311-1/+18
| | | | | Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Add LAYOUTGET to OPEN of an existing fileFred Isaman2018-05-311-17/+73
| | | | | Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Refactor nfs4_layoutget_release()Trond Myklebust2018-05-311-0/+50
| | | | | | | Move the actual freeing of the struct nfs4_layoutget into fs/nfs/pnfs.c where it can be reused by the layoutget on open code. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Add LAYOUTGET to OPEN of a new fileFred Isaman2018-05-311-2/+79
| | | | | | | | This triggers when have no pre-existing inode to attach to. The preexisting case is saved for later. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Change pnfs_alloc_init_layoutget_args call signatureFred Isaman2018-05-311-12/+28
| | | | | | | | | | Don't send in a layout, instead use the (possibly NULL) inode. This is needed for LAYOUTGET attached to an OPEN where the inode is not yet set. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: Move nfs4_opendata into nfs4_fs.hFred Isaman2018-05-311-0/+1
| | | | | | | It will be needed now by the pnfs code. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: move allocations out of nfs4_proc_layoutgetFred Isaman2018-05-311-1/+11
| | | | | | | They work better in the new alloc_init function. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pnfs: refactor send_layoutgetFred Isaman2018-05-311-18/+15
| | | | | | | Pull out the alloc/init part for eventual reuse by OPEN. Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* pNFS: Prevent the layout header refcount going to zero in pnfs_roc()Trond Myklebust2018-03-081-3/+10
| | | | | | | | | | | Ensure that we hold a reference to the layout header when processing the pNFS return-on-close so that the refcount value does not inadvertently go to zero. Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.10+ Tested-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
* nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mdsScott Mayhew2018-01-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when falling back to doing I/O through the MDS (via pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header without releasing the reference taken on the dreq via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init -> nfs_direct_pgio_init. It then takes another reference on the dreq via nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and as a result the requester will become stuck in inode_dio_wait. Once that happens, other processes accessing the inode will become stuck as well. Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean up correctly by calling hdr->completion_ops->completion() instead of calling hdr->release() directly. This can be reproduced (sometimes) by performing "storage failover takeover" commands on NetApp filer while doing direct I/O from a client. This can also be reproduced using SystemTap to simulate a failure while doing direct I/O from a client (from Dave Wysochanski <dwysocha@redhat.com>): stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }' Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Fixes: 1ca018d28d ("pNFS: Fix a memory leak when attempted pnfs fails") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pnfs/blocklayout: handle transient devicesBenjamin Coddington2018-01-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | PNFS block/SCSI layouts should gracefully handle cases where block devices are not available when a layout is retrieved, or the block devices are removed while the client holds a layout. While setting up a layout segment, keep a record of an unavailable or un-parsable block device in cache with a flag so that subsequent layouts do not spam the server with GETDEVINFO. We can reuse the current NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing the device, we will discard it and send a fresh GETDEVINFO after the timeout, since the lookup and validation of the device occurs within the GETDEVINFO response handling. A lookup of a layout segment that references an unavailable device will return a segment with the NFS_LSEG_UNAVAILABLE flag set. This will allow the pgio layer to mark the layout with the appropriate fail bit, which forces subsequent IO to the MDS, and prevents spamming the server with LAYOUTGET, LAYOUTRETURN. Finally, when IO to a block device fails, look up the block device(s) referenced by the pgio header, and mark them as unavailable. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-closeTrond Myklebust2017-11-171-0/+18
| | | | | | | | | If our layoutreturn on close operation returns an NFS4ERR_OLD_STATEID, then try to update the stateid and retry. We know that there should be no further LAYOUTGET requests being launched. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* NFS: Fix bool initialization/comparisonThomas Meyer2017-11-171-1/+1
| | | | | | | | Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_tElena Reshetova2017-11-171-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable pnfs_layout_hdr.plh_refcount is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* fs, nfs: convert pnfs_layout_segment.pls_refcount from atomic_t to refcount_tElena Reshetova2017-11-171-6/+6
| | | | | | | | | | | | | | refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* pNFS: Use the standard I/O stateid when calling LAYOUTGETTrond Myklebust2017-09-111-5/+9
| | | | | | | | | | Instead of having a private method for copying the open/delegation stateid, use the same call that is used for standard I/O through the MDS. Note that this means we transmit the stateid with a zero seqid, avoiding issues with NFS4ERR_OLD_STATEID. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Fix 2 use after free issues in the I/O codeTrond Myklebust2017-09-081-2/+0
| | | | | | | | | | | | | | | The writeback code wants to send a commit after processing the pages, which is why we want to delay releasing the struct path until after that's done. Also, the layout code expects that we do not free the inode before we've put the layout segments in pnfs_writehdr_free() and pnfs_readhdr_free() Fixes: 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete") Fixes: 4714fb51fd03 ("nfs: remove pgio_header refcount, related cleanup") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg()Trond Myklebust2017-08-151-41/+0
| | | | | | | Now that we no longer hold the inode->i_lock when manipulating the commit lists, it is safe to call pnfs_put_lseg() again. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pnfs: Fix the check for requests in range of layout segmentBenjamin Coddington2017-05-241-8/+17
| | | | | | | | | | | | | | | | | | | It's possible and acceptable for NFS to attempt to add requests beyond the range of the current pgio->pg_lseg, a case which should be caught and limited by the pg_test operation. However, the current handling of this case replaces pgio->pg_lseg with a new layout segment (after a WARN) within that pg_test operation. That will cause all the previously added requests to be submitted with this new layout segment, which may not be valid for those requests. Fix this problem by only returning zero for the number of bytes to coalesce from pg_test for this case which allows any previously added requests to complete on the current layout segment. The check for requests starting out of range of the layout segment moves to pg_init, so that the replacement of pgio->pg_lseg will be done when the next request is added. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Fix a deadlock when coalescing writes and returning the layoutTrond Myklebust2017-05-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the following deadlock: Process P1 Process P2 Process P3 ========== ========== ========== lock_page(page) lseg = pnfs_update_layout(inode) lo = NFS_I(inode)->layout pnfs_error_mark_layout_for_return(lo) lock_page(page) lseg = pnfs_update_layout(inode) In this scenario, - P1 has declared the layout to be in error, but P2 holds a reference to a layout segment on that inode, so the layoutreturn is deferred. - P2 is waiting for a page lock held by P3. - P3 is asking for a new layout segment, but is blocked waiting for the layoutreturn. The fix is to ensure that pnfs_error_mark_layout_for_return() does not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow the latter to call LAYOUTGET so that it can make progress and unblock P2. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Don't clear the layout return info if there are segments to returnTrond Myklebust2017-05-021-1/+7
| | | | | | | | In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout return info if there are new segments queued for return due to, for instance, a race between a LAYOUTRETURN and a failed I/O attempt. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Ensure we commit the layout if it has been invalidatedTrond Myklebust2017-04-291-0/+3
| | | | | | | | If the layout is being invalidated on the server, then we must invoke nfs_commit_inode() to ensure any commits to the DS get cleared out. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure pathTrond Myklebust2017-04-291-1/+13
| | | | | | | | | | If the attempt to write through pNFS fails, we need to use the same failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS flag is set or we have sufficient valid DSes, then we must retry through pNFS Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Ensure we check layout validity before marking it for returnTrond Myklebust2017-04-281-0/+4
| | | | | | | pnfs_error_mark_layout_for_return needs to check that the layout is valid before calling pnfs_set_plh_return_info(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Fix use after free issues in pnfs_do_read()Trond Myklebust2017-04-251-3/+13
| | | | | | | | | The assumption should be that if the caller returns PNFS_ATTEMPTED, then hdr has been consumed, and so we should not be testing hdr->task.tk_status. If the caller returns PNFS_TRY_AGAIN, then we need to recoalesce and free hdr. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Ensure we check layout segment validity in the pg_init() callbackTrond Myklebust2017-04-251-0/+13
| | | | | | | | If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Unexport pnfs_put_lseg_locked and _pnfs_return_layoutTrond Myklebust2017-04-201-2/+0
| | | | | | They are not used outside the NFSv4 module. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Fix a reference leak in _pnfs_return_layoutTrond Myklebust2017-01-261-1/+1
| | | | | | | | | | IF NFS_LAYOUT_RETURN_REQUESTED is not set, then we currently exit without freeing the list of invalidated layout segments, leading to a reference leak. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 24408f5282 ("pNFS: Fix bugs in _pnfs_return_layout") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Return RW layouts on OPEN_DOWNGRADETrond Myklebust2016-12-191-3/+13
| | | | | | | | | | If the client holds no more writeable open state, and does not hold a write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE. We do this only for writes, since some layout drivers may require you to also hold a read layout if you are doing a R/W workload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateidTrond Myklebust2016-12-051-9/+12
| | | | | | | | Ensure we release the NFS_LAYOUT_RETURN lock when we invalidate the layout stateid, so that processes and RPC tasks that are waiting on the layout return can continue. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Add a layoutreturn callback to performa layout-private setupTrond Myklebust2016-12-031-1/+13
| | | | | | | Add a callback to allow the flexfiles layout driver to initialise the layout private payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Allow layout drivers to manage private data in struct nfs4_layoutreturnTrond Myklebust2016-12-021-0/+1
| | | | | | | Cleanup to allow layout drivers to attach private data to layoutreturn, and manage the data. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Skip invalid stateids when doing a bulk destroyTrond Myklebust2016-12-011-0/+2
| | | | | | If the layout stateid is already invalid, we have no work to do. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Wait on outstanding layoutreturns to complete in pnfs_roc()Trond Myklebust2016-12-011-0/+9
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Don't mark the layout as freed if the last lseg is marked for returnTrond Myklebust2016-12-011-0/+2
| | | | | | Address another memory leak. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Sync the layout state bits in pnfs_cache_lseg_for_layoutreturnTrond Myklebust2016-12-011-14/+15
| | | | | | | | Ensure that the layout state bits are synced when we cache a layout segment for layoutreturn using an appropriate call to pnfs_set_plh_return_info. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Fix bugs in _pnfs_return_layoutTrond Myklebust2016-12-011-3/+10
| | | | | | | | | | | We need to honour the NFS_LAYOUT_RETURN_REQUESTED bit regardless of whether or not there are layout segments pending. Furthermore, we should ensure that we leave the plh_return_segs list empty. This patch fixes a memory leak of the layout segments on plh_return_segs. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Clear all layout segment state in pnfs_mark_layout_stateid_invalidTrond Myklebust2016-12-011-1/+18
| | | | | | | When the layout state is invalidated, then so is the layout segment state, and hence we do need to clean up the state bits. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Enable layoutreturn operation for return-on-closeTrond Myklebust2016-12-011-77/+62
| | | | | | | | | | Amend the pnfs return on close helper functions to enable sending the layoutreturn op in CLOSE/DELEGRETURN. This closes a potential race between CLOSE/DELEGRETURN and parallel OPEN calls to the same file, and allows the client and the server to agree on whether or not there is an outstanding layout. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Clean up - add a helper to initialise struct layoutreturn_argsTrond Myklebust2016-12-011-7/+18
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Don't mark layout segments invalid on layoutreturn in pnfs_rocTrond Myklebust2016-12-011-7/+13
| | | | | | | The layoutreturn call will take care of invalidating the layout segments once the call is successful. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Skip checking for return-on-close if the layout is invalidTrond Myklebust2016-12-011-1/+2
| | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pNFS: Remove spurious wake up in pnfs_layout_remove_lseg()Trond Myklebust2016-12-011-3/+0
| | | | | | | There is no change to the value of NFS_LAYOUT_RETURN, so we should not be waking up the RPC call. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
OpenPOWER on IntegriCloud