| Commit message (Collapse) | Author | Age | Files | Lines |
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable bugfixes:
- nfs: don't create zero-length requests
- several LAYOUTGET bugfixes
Features:
- several performance related features
- more aggressive caching when we can rely on close-to-open
cache consistency
- remove serialisation of O_DIRECT reads and writes
- optimise several code paths to not flush to disk unnecessarily.
However allow for the idiosyncrasies of pNFS for those layout
types that need to issue a LAYOUTCOMMIT before the metadata can
be updated on the server.
- SUNRPC updates to the client data receive path
- pNFS/SCSI support RH/Fedora dm-mpath device nodes
- pNFS files/flexfiles can now use unprivileged ports when
the generic NFS mount options allow it.
Bugfixes:
- Don't use RDMA direct data placement together with data
integrity or privacy security flavours
- Remove the RDMA ALLPHYSICAL memory registration mode as
it has potential security holes.
- Several layout recall fixes to improve NFSv4.1 protocol
compliance.
- Fix an Oops in the pNFS files and flexfiles connection
setup to the DS
- Allow retry of operations that used a returned delegation
stateid
- Don't mark the inode as revalidated if a LAYOUTCOMMIT is
outstanding
- Fix writeback races in nfs4_copy_range() and
nfs42_proc_deallocate()"
* tag 'nfs-for-4.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (104 commits)
pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding
NFSv4: Clean up lookup of SECINFO_NO_NAME
NFSv4.2: Fix warning "variable ‘stateids’ set but not used"
NFSv4: Fix warning "no previous prototype for ‘nfs4_listxattr’"
SUNRPC: Fix a compiler warning in fs/nfs/clnt.c
pNFS: Remove redundant smp_mb() from pnfs_init_lseg()
pNFS: Cleanup - do layout segment initialisation in one place
pNFS: Remove redundant stateid invalidation
pNFS: Remove redundant pnfs_mark_layout_returned_if_empty()
pNFS: Clear the layout metadata if the server changed the layout stateid
pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid()
NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id
pNFS: Do not set plh_return_seq for non-callback related layoutreturns
pNFS: Ensure layoutreturn acts as a completion for layout callbacks
pNFS: Fix CB_LAYOUTRECALL stateid verification
pNFS: Always update the layout barrier seqid on LAYOUTGET
pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set
pNFS: Clear the layout return tracking on layout reinitialisation
pNFS: LAYOUTRETURN should only update the stateid if the layout is valid
nfs: don't create zero-length requests
...
|
A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes,
and in that case NFS_INO_INVALID_ATTR is never set on the second pass
through nfs_update_inode(). The existing check to skip the clearing of
NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this
case (see commit 10b7e9ad4488: "pNFS: Don't mark the inode as revalidated
if a LAYOUTCOMMIT is outstanding"). We know that if a LAYOUTCOMMIT is
outstanding then attributes will need updating, so always set
NFS_INO_INVALID_ATTR.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
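The reasoning above can be illustrated with a minimal userspace sketch. It is not the kernel's nfs_update_inode(); the struct, the flag value and the function name are hypothetical stand-ins, and only the final "always set the flag while a LAYOUTCOMMIT is outstanding" step reflects the commit message.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real NFS_INO_INVALID_ATTR is defined in the kernel. */
#define SKETCH_INO_INVALID_ATTR 0x0001UL

struct sketch_inode {
    unsigned long cache_validity;   /* bitmask of "needs revalidation" flags */
    bool layoutcommit_outstanding;  /* stand-in for the pNFS LAYOUTCOMMIT state */
};

/*
 * Simplified model of one attribute update pass. 'attrs_changed' says
 * whether the incoming attributes differ from the cached ones.
 */
static void sketch_update_inode(struct sketch_inode *inode, bool attrs_changed)
{
    if (attrs_changed)
        inode->cache_validity |= SKETCH_INO_INVALID_ATTR;
    else if (!inode->layoutcommit_outstanding)
        /* Earlier check: only clear the flag if no LAYOUTCOMMIT is pending. */
        inode->cache_validity &= ~SKETCH_INO_INVALID_ATTR;

    /*
     * Change described above: when a LAYOUTCOMMIT is outstanding, set the
     * flag actively, even if this pass saw identical attributes.
     */
    if (inode->layoutcommit_outstanding)
        inode->cache_validity |= SKETCH_INO_INVALID_ATTR;
}
```

In this model, a second pass with unchanged attributes still leaves the flag set as long as `layoutcommit_outstanding` is true, which is the case the earlier "skip the clearing" check could not cover.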
|
Use the minor version ops cached in struct nfs_client instead of looking
them up again.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Replace it with a test for whether or not the server sent a stateid in violation
of what we asked for.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Make it static
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Before commit 778be232a207 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.
Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
It's not visible yet, and won't be until after we grab the inode->i_lock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
...instead of splitting the initialisation over init_lseg() and
pnfs_layout_process().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
The layout stateid will be invalidated once it holds no more layout
segments anyway.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
That's already being taken care of in pnfs_layout_remove_lseg().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
If the server changed the layout stateid's "other" field, then
we should treat the old layout as being completely gone. In that
case, we want to clear the metadata such as scheduled layoutreturns.
Do this by calling pnfs_mark_layout_stateid_invalid().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Ensure nfs42_layoutstat_done() and layoutget don't open code layout stateid
invalidation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
When determining which layout segments to return, we do want
pnfs_mark_matching_lsegs_return to check that they match the layout
sequence id. This ensures that we don't waste time if the server
is replaying a layout recall that has already been satisfied.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
In cases where we need to send a layoutreturn in order to propagate
an error, we should not tie that to a specific layout stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
When we return NFS_OK to the CB_LAYOUTRECALL, we are required to
send a layoutreturn that "completes" that layout recall request, using
the correct stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
We want to evaluate in this order:
If the client holds no layout for this inode, then return
NFS4ERR_NOMATCHING_LAYOUT; it probably forgot the layout.
If the client finds the inode among the list of layouts, but the corresponding
stateid has not yet been initialised, then return NFS4ERR_DELAY to ask the
server to retry once the outstanding LAYOUTGET is complete.
If the current layout stateid's "other" field does not match the recalled
stateid, return NFS4ERR_BAD_STATEID.
If already processing a layout recall with a newer stateid, return
NFS4ERR_OLD_STATEID. This can only happen for servers that are
non-compliant with the NFSv4.1 protocol.
If already processing a layout recall with an older stateid, return
NFS4ERR_DELAY to ask the server to retry once the outstanding
LAYOUTRETURN is complete. Again, this is technically non-compliant with
the NFSv4.1 protocol.
If the current layout sequence id is newer than the recalled stateid's
sequence id, return NFS4ERR_OLD_STATEID. This too implies protocol
non-compliance.
If the current layout sequence id is older than the recalled stateid's
sequence id+1, return NFS4ERR_DELAY.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
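The evaluation order above can be sketched as a chain of early returns. This is a userspace illustration only, not the kernel implementation: the types, enum values and the `recall_seq` field are hypothetical, sequence-id wraparound is ignored, and the last step is read as "the recalled sequence id is more than one ahead of the current one". Cases the text does not mention (for example equal sequence ids) simply fall through to the next check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes standing in for the NFS4ERR_* values. */
enum sketch_status {
    SKETCH_OK,
    SKETCH_NOMATCHING_LAYOUT,
    SKETCH_DELAY,
    SKETCH_BAD_STATEID,
    SKETCH_OLD_STATEID
};

struct sketch_stateid {
    uint32_t seqid;
    unsigned char other[12];
};

struct sketch_layout {
    bool stateid_initialised;       /* false while the first LAYOUTGET is pending */
    struct sketch_stateid stateid;  /* current layout stateid */
    uint32_t recall_seq;            /* seqid of a recall in progress, 0 if none */
};

/* 'lo' is NULL when the client holds no layout for the inode. */
static enum sketch_status
sketch_check_recall(const struct sketch_layout *lo,
                    const struct sketch_stateid *recalled)
{
    if (lo == NULL)
        return SKETCH_NOMATCHING_LAYOUT;
    if (!lo->stateid_initialised)
        return SKETCH_DELAY;
    if (memcmp(lo->stateid.other, recalled->other,
               sizeof(recalled->other)) != 0)
        return SKETCH_BAD_STATEID;
    if (lo->recall_seq != 0) {
        /* Already processing a recall: compare against that recall's seqid. */
        if (lo->recall_seq > recalled->seqid)
            return SKETCH_OLD_STATEID;
        if (lo->recall_seq < recalled->seqid)
            return SKETCH_DELAY;
    }
    if (lo->stateid.seqid > recalled->seqid)
        return SKETCH_OLD_STATEID;
    if (recalled->seqid > lo->stateid.seqid + 1)
        return SKETCH_DELAY;
    return SKETCH_OK;
}
```

With a current sequence id of 5, a recall carrying 6 is accepted, one carrying 3 is answered with the "old stateid" status, and one carrying 8 is delayed.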
|
Currently, pnfs_set_layout_stateid() will update the layout sequence
id barrier only if the stateid itself is newer than the current
layout stateid. However in a situation where multiple LAYOUTGET calls
and a LAYOUTRETURN raced, it is entirely possible for one of the
LAYOUTGETs to set the current stateid to something newer than the
LAYOUTRETURN that needs to set the barrier.
The fix is to allow the "update_barrier" flag to force a check as to
whether or not the barrier needs to be updated.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
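A minimal sketch of the race described above, under stated assumptions: the struct and function names are hypothetical, and the `outstanding` adjustment on the LAYOUTGET path is an assumption introduced so that the two paths differ; it is not described in the commit message. The only point illustrated is that `update_barrier` forces the barrier check even when the stateid itself does not advance.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_layout_hdr {
    uint32_t seqid;        /* sequence id of the current layout stateid */
    uint32_t barrier;      /* replies older than this are ignored */
    uint32_t outstanding;  /* assumed count of LAYOUTGETs still in flight */
};

/*
 * Wraparound-tolerant comparison. The cast relies on the usual
 * two's-complement conversion found on common platforms.
 */
static bool sketch_seqid_is_newer(uint32_t s1, uint32_t s2)
{
    return (int32_t)(s1 - s2) > 0;
}

static void sketch_set_layout_stateid(struct sketch_layout_hdr *lo,
                                      uint32_t newseq, bool update_barrier)
{
    uint32_t new_barrier = 0;
    bool have_barrier = false;

    if (sketch_seqid_is_newer(newseq, lo->seqid)) {
        lo->seqid = newseq;
        new_barrier = newseq - lo->outstanding;  /* assumption, see above */
        have_barrier = true;
    }
    /* The flag forces a barrier check even if the stateid did not advance. */
    if (update_barrier) {
        new_barrier = newseq;
        have_barrier = true;
    }
    if (have_barrier && sketch_seqid_is_newer(new_barrier, lo->barrier))
        lo->barrier = new_barrier;
}
```

Starting from `{seqid 5, barrier 3, outstanding 3}`, a LAYOUTGET reply with sequence id 8 moves the stateid to 8 and the barrier to 5. A LAYOUTRETURN reply with sequence id 7 then leaves the stateid alone; with `update_barrier` set the barrier moves to 7, and without it the barrier stays at 5.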
|
If the layout stateid is invalid, then pnfs_set_layout_stateid() must
always initialise it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Ensure that we don't carry over layoutreturn info from a previous
incarnation of this layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
If the layout was completely returned, then ignore the returned layout
stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Needed in order to work on top of pNFS changes in Linus' upstream kernel.
|
When bl_parse_deviceid() fails in bl_alloc_deviceid_node() at the
blkdev_get_by_*() step, we get a pnfs_block_dev struct that is
uninitialized except for the bdev field, which is set to whatever error
blkdev_get_by_*() returns. bl_free_device() then tries to call
blkdev_put() if bdev is non-NULL, resulting in an invalid pointer
dereference. Fix this by setting bdev in struct pnfs_block_dev only if we
didn't get an error from blkdev_get_by_*().
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
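The pattern behind this fix can be sketched in userspace as follows. All names here are hypothetical stand-ins (the helpers imitate the kernel's error-pointer convention but are not the kernel macros), and the "get" function simply fakes a failure on request.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace imitations of an error-pointer convention. */
static void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static bool sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static long sketch_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

struct sketch_bdev { int id; };
struct sketch_block_dev { struct sketch_bdev *bdev; };

static struct sketch_bdev sketch_disk = { 1 };
static int sketch_put_calls;  /* counts releases, for demonstration only */

/* Hypothetical stand-in for a lookup that returns a pointer or an encoded error. */
static struct sketch_bdev *sketch_get_bdev(bool fail)
{
    if (fail)
        return (struct sketch_bdev *)sketch_err_ptr(-ENODEV);
    return &sketch_disk;
}

static void sketch_put_bdev(struct sketch_bdev *bdev)
{
    (void)bdev;
    sketch_put_calls++;
}

/* Store the device only on success, so the cleanup path never sees an error value. */
static int sketch_parse(struct sketch_block_dev *d, bool fail)
{
    struct sketch_bdev *bdev = sketch_get_bdev(fail);

    if (sketch_is_err(bdev))
        return (int)sketch_ptr_err(bdev);  /* d->bdev is left untouched */
    d->bdev = bdev;
    return 0;
}

static void sketch_free(struct sketch_block_dev *d)
{
    if (d->bdev)
        sketch_put_bdev(d->bdev);
}
```

Because the error value is kept in a local variable, a failed `sketch_parse()` leaves `d->bdev` as NULL and `sketch_free()` has nothing to release.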
|
Guard against the NFS server returning more uuids/devices than the maximum.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
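As a rough illustration of this kind of guard, the sketch below decodes a big-endian count from a reply buffer and rejects it before it can be used as a loop bound or array index. The limit, the function name and the choice of -EIO are assumptions made for the example, not values taken from the blocklayout code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on how many entries the client is prepared to store. */
#define SKETCH_MAX_DEVICES 64u

/*
 * Decode a 4-byte big-endian count from 'buf' and validate it.
 * Returns 0 and stores the count on success, -EIO otherwise.
 */
static int sketch_decode_count(const unsigned char *buf, size_t len,
                               uint32_t *count)
{
    uint32_t n;

    if (len < 4)
        return -EIO;
    n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
        ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    if (n > SKETCH_MAX_DEVICES)
        return -EIO;  /* server claims more entries than we can hold */
    *count = n;
    return 0;
}
```

The output parameter is only written after validation, so a caller that ignores the return value still never sees an out-of-range count.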
|
Guard against a bad NFS server returning a signature with an unaligned length.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Instead of reusing the wwn-* names for multipath device nodes, RHEL and
Fedora introduce new dm-mpath-uuid-* nodes with a slightly different
naming scheme. Try these names first to ensure we always get a
multipath-capable device if it exists.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
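The "try these names first" strategy can be sketched as an ordered list of candidates with a pluggable opener. Everything below is illustrative: the candidate strings are placeholders rather than real device paths, and the opener is a stub that pretends only wwn-* nodes exist.

```c
#include <stddef.h>
#include <string.h>

typedef int (*sketch_open_fn)(const char *name);

/* Placeholder names in preference order: multipath-style node first, then wwn. */
static const char *const sketch_names[] = {
    "dm-mpath-uuid-EXAMPLE",
    "wwn-EXAMPLE",
};

/* Stub opener for demonstration: succeeds (returns 0) only for wwn-* names. */
static int sketch_only_wwn_exists(const char *name)
{
    return strncmp(name, "wwn-", 4) == 0 ? 0 : -1;
}

/* Return the index of the first candidate that opens, or -1 if none does. */
static int sketch_open_first(const char *const *candidates, size_t n,
                             sketch_open_fn try_open)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (try_open(candidates[i]) == 0)
            return (int)i;
    }
    return -1;
}
```

On a system without the multipath-style node, `sketch_open_first(sketch_names, 2, sketch_only_wwn_exists)` falls back to the second entry, while a system that has both would stop at the first.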
|
The current code works with the standard udev/systemd names, but we'll have
to add another method in the next patch. Refactor it into a separate helper
to make room for the new variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
This was fixed for the original block layout code a while ago, but also
needs to be fixed for the SCSI layout path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
All write callbacks are required to call nfs_writeback_update_inode() upon
success to ensure that file size changes are recorded, and the attribute
cache is invalidated.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
We know that the attributes will need updating if there is still a
LAYOUTCOMMIT outstanding.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
We're not holding any locks, so both nfs_wb_all() and inode_dio_wait()
are unenforceable and have livelock potential. Just limit ourselves to
flushing out the data.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Prevent filesystem freezes while handling the write page fault.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
We want to ensure that we write the cached data to the server, but
don't require it be synced to disk. If the server reboots, we will
get a stateid error, which will cause us to retry anyway.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
We need to ensure that any writes to the destination file are serialised
with the copy, meaning that the writeback has to occur under the inode lock.
Also relax the writeback requirement on the source, and rely on the
stateid checking to tell us if the source rebooted. Add the helper
nfs_filemap_write_and_wait_range() to call pnfs_sync_inode() as
is appropriate for pNFS servers that may need a layoutcommit.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
When punching holes in a file, we want to ensure the operation is
serialised w.r.t. other writes, meaning that we want to call
nfs_sync_inode() while holding the inode lock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
When retrieving stat() information, NFS unfortunately does require us to
sync writes to disk in order to ensure that mtime and ctime are up to
date. However we shouldn't have to ensure that those writes are persisted.
Relaxing that requirement does mean that we may see an mtime/ctime change
if the server reboots and forces us to replay all writes.
The exception to this rule is pNFS clients that are required to send
layoutcommit; however, that is dealt with by the call to pnfs_sync_inode()
in _nfs_revalidate_inode().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
A file that is open for O_DIRECT is by definition not obeying
close-to-open cache consistency semantics, so let's not cache
the attributes too aggressively either.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Clean up...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
We're now waiting immediately after taking the locks, so waiting
in fsync() and write_begin() is either redundant or potentially
subject to livelock (if not holding the lock).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
There is only one caller that sets the "write" argument to true,
so just move the call to nfs_zap_mapping() and get rid of the
now redundant argument.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Allow dio requests to be scheduled in parallel, while ensuring that they
do not conflict with buffered I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Preparation for the patch that de-serialises O_DIRECT reads and
writes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
On success, the RPC callbacks will ensure that we make the appropriate calls
to nfs_writeback_update_inode()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
We should not be interested in looking at the value of the stable field,
since that could take any value.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
Cleanup...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
If we need to update the cached attributes, then we'd better make
sure that we also layoutcommit first. Otherwise, the server may have stale
attributes.
Prior to this patch, the revalidation code tried to "fix" this problem by
simply disabling attributes that would be affected by the layoutcommit.
That approach breaks nfs_writeback_check_extend(), leading to a file size
corruption.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|