summaryrefslogtreecommitdiffstats
path: root/fs/nfs/inode.c
Commit message (Collapse)AuthorAgeFilesLines
* NFS: Allow concurrent inode revalidationTrond Myklebust2008-10-071-41/+2
| | | | | | | | | Currently, if two processes are both trying to revalidate metadata for the same inode, they will find themselves being serialised. There is no good justification for this now that we have improved our ability to detect stale attribute data, so we should remove that serialisation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix up nfs_setattr_update_inode()Trond Myklebust2008-10-071-1/+1
| | | | | | Ensure that it sets the inode metadata under the correct spinlock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Don't clear nfsi->cache_validity in nfs_check_inode_attributes()Trond Myklebust2008-10-071-7/+0
| | | | | | | | | If we're merely checking the inode attributes because we suspect that the 'updated' attributes returned by the RPC call are stale, then we shouldn't be doing weak cache consistency updates or clearing the cache_validity flags. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Convert __nfs_revalidate_inode() to use nfs_refresh_inode()Trond Myklebust2008-10-071-4/+1
| | | | | | | | | | | | | In the case where there are parallel RPC calls to the same inode, we may receive stale metadata due to the lack of ordering, hence the sanity checking of metadata in nfs_refresh_inode(). Currently, __nfs_revalidate_inode() is calling nfs_update_inode() directly, without any further sanity checks, and hence may end up setting the inode up with stale metadata. Fix is to use nfs_refresh_inode() instead of nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix nfs_post_op_update_inode_force_wcc()Trond Myklebust2008-10-071-8/+27
| | | | | | | | | | If we believe that the attributes are old (see nfs_refresh_inode()), then we shouldn't force an update. Also ensure that we hold the inode->i_lock across attribute checks and the call to nfs_refresh_inode_locked() to ensure that we don't race with other attribute updates. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix the NFS attribute updateTrond Myklebust2008-10-071-3/+41
| | | | | | | | | | | | | | | | Currently nfs_refresh_inode() will only update the inode metadata if it sees that the RPC call that returned the nfs_fattr was started after the last update of the inode. This means that if we have parallel RPC calls to the same inode (when sending WRITE calls, for instance), we may often miss updates. This patch attempts to recover those missed updates by also accepting them if the ctime in the nfs_fattr is more recent than the inode's cached ctime. It also recovers the case where the file size has increased, but the ctime has not been updated due to limited ctime resolution. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Clean up nfs_refresh_inode() and nfs_post_op_update_inode()Trond Myklebust2008-10-071-7/+14
| | | | | | | | | Try to avoid taking and dropping the inode->i_lock more than once. Do so by moving the code in nfs_refresh_inode() that needs to be done under the spinlock into a function nfs_refresh_inode_locked(), and then having both nfs_refresh_inode() and nfs_post_op_update_inode() call it directly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* SL*B: drop kmem cache argument from constructorAlexey Dobriyan2008-07-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* NFS: Remove attribute update related BKL referencesTrond Myklebust2008-07-151-4/+0
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Remove BKL requirement from attribute updatesTrond Myklebust2008-07-151-6/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main problem is dealing with inode->i_size: we need to set the inode->i_lock on all attribute updates, and so vmtruncate won't cut it. Make an NFS-private version of vmtruncate that has the necessary locking semantics. The result should be that the following inode attribute updates are protected by inode->i_lock nfsi->cache_validity nfsi->read_cache_jiffies nfsi->attrtimeo nfsi->attrtimeo_timestamp nfsi->change_attr nfsi->last_updated nfsi->cache_change_attribute nfsi->access_cache nfsi->access_cache_entry_lru nfsi->access_cache_inode_lru nfsi->acl_access nfsi->acl_default nfsi->nfs_page_tree nfsi->ncommit nfsi->npages nfsi->open_files nfsi->silly_list nfsi->acl nfsi->open_states inode->i_size inode->i_atime inode->i_mtime inode->i_ctime inode->i_nlink inode->i_uid inode->i_gid The following is protected by dir->i_mutex nfsi->cookieverf Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Ensure we zap only the access and acl caches when setting new aclsTrond Myklebust2008-07-091-3/+1
| | | | | | | ...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the acls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix the ftruncate() credential problemTrond Myklebust2008-07-091-2/+2
| | | | | | | ftruncate() access checking is supposed to be performed at open() time, just like reads and writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* nfs: path_{get,put}() cleanupsJan Blunck2008-05-161-2/+1
| | | | | | | | | | | Here are some more places where path_{get,put}() can be used instead of dput()/mntput() pair. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* nfs: replace remaining __FUNCTION__ occurrencesHarvey Harrison2008-05-161-2/+2
| | | | | | | | | | __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Ensure that 'noac' and/or 'actimeo=0' turn off attribute cachingTrond Myklebust2008-05-161-0/+7
| | | | | | | | | | | | | | Both the 'noac' and 'actimeo=0' mount options should ensure that attributes are not cached, however a bug in nfs_attribute_timeout() means that currently, the attributes may in fact get cached for up to one jiffy. This has been seen to cause corruption in some applications. The reason for the bug is that the time_in_range() test returns 'true' as long as the current time lies between nfsi->read_cache_jiffies and nfsi->read_cache_jiffies + nfsi->attrtimeo. In other words, if jiffies equals nfsi->read_cache_jiffies, then we still cache the attribute data. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Merge branch 'devel'Trond Myklebust2008-04-241-2/+43
|\
| * SUNRPC: Add a helper rpcauth_lookup_generic_cred()Trond Myklebust2008-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NFSv4 protocol allows clients to negotiate security protocols on the fly in the case where an administrator on the server changes the export settings and/or in the case where we may have a filesystem migration event. Instead of having the NFS client code cache credentials that are tied to a particular AUTH method it is therefore preferable to have a generic credential that can be converted into whatever AUTH is in use by the RPC client when the read/write/sillyrename/... is put on the wire. We do this by means of the new "generic" credential, which basically just caches the minimal information that is needed to look up an RPCSEC_GSS, AUTH_SYS, or AUTH_NULL credential. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * Merge commit 'origin' into develTrond Myklebust2008-03-081-2/+4
| |\
| * | NFS: Add an nfsiod workqueueTrond Myklebust2008-02-251-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS post-rpciod cleanups often involve tasks that cannot be safely performed within the rpciod context (due to deadlock concerns). We therefore add a dedicated NFS workqueue that can perform tasks like cleaning up state after an interrupted NFSv4 open() call, or calling put_nfs_open_context() after an asynchronous read or write call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Fix a deadlock with lazy umountTrond Myklebust2008-02-251-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't allow rpc callback functions like task->tk_ops->rpc_call_prepare() and task->tk_ops->rpc_call_done() to call mntput() in any way, since that will cause a deadlock when the call to rpc_shutdown_client() attempts to wait on 'task' to complete. We can avoid the above deadlock by moving calls to mntput to task->tk_ops->rpc_release() callback, since at that time the task will be marked as completed, and so rpc_shutdown_client won't attempt to wait on it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | NFS: initialize flags field in nfs_open_contextJeff Layton2008-04-081-0/+1
| |/ |/| | | | | | | | | | | | | | | | | The nfs_open_context struct had a "flags" field added recently, but the allocator isn't initializing it. It also looks like the allocator isn't initializing the mode or list either, but they seem to be overwritten by the caller, so that's less of an issue. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS: Fix the fsid revalidation in nfs_update_inode()Trond Myklebust2008-03-071-2/+4
|/ | | | | | | | | When we detect that we've crossed a mountpoint on the remote server, we must take care not to use that inode to revalidate the fsid on our current superblock. To do so, we label the inode as a remote mountpoint, and check for that in nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* Merge branch 'task_killable' of ↵Linus Torvalds2008-02-011-5/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc * 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits) Remove commented-out code copied from NFS NFS: Switch from intr mount option to TASK_KILLABLE Add wait_for_completion_killable Add wait_event_killable Add schedule_timeout_killable Use mutex_lock_killable in vfs_readdir Add mutex_lock_killable Use lock_page_killable Add lock_page_killable Add fatal_signal_pending Add TASK_WAKEKILL exit: Use task_is_* signal: Use task_is_* sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL ptrace: Use task_is_* power: Use task_is_* wait: Use TASK_NORMAL proc/base.c: Use task_is_* proc/array.c: Use TASK_REPORT perfmon: Use task_is_* ... Fixed up conflicts in NFS/sunrpc manually..
| * NFS: Switch from intr mount option to TASK_KILLABLEMatthew Wilcox2007-12-061-5/+1
| | | | | | | | | | | | | | | | | | By using the TASK_KILLABLE infrastructure, we can get rid of the 'intr' mount option. We have to use _killable everywhere instead of _interruptible as we get rid of rpc_clnt_sigmask/sigunmask. Signed-off-by: Liam R. Howlett <howlett@gmail.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* | NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inodeTrond Myklebust2008-01-301-1/+1
| | | | | | | | | | | | | | | | Otherwise, there is a potential deadlock if the last dput() from an NFSv4 close() or other asynchronous operation leads to nfs_clear_inode calling the synchronous delegreturn. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | nfs: convert NFS_*(inode) helpers to static inlineBenny Halevy2008-01-301-1/+1
| | | | | | | | | | Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | nfs: obliterate NFS_FLAGS macroBenny Halevy2008-01-301-3/+3
| | | | | | | | | | | | | | use NFS_I(inode)->flags instead Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS: define a function to update nfsi->cache_change_attributeTrond Myklebust2008-01-301-2/+4
| | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS: Prevent nfs_getattr() hang during heavy write workloadsChuck Lever2008-01-301-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POSIX requires that ctime and mtime, as reported by the stat(2) call, reflect the activity of the most recent write(2). To that end, nfs_getattr() flushes pending dirty writes to a file before doing a GETATTR to allow the NFS server to set the file's size, ctime, and mtime properly. However, nfs_getattr() can be starved when a constant stream of application writes to a file prevents nfs_wb_nocommit() from completing. This usually results in hangs of programs doing a stat against an NFS file that is being written. "ls -l" is a common victim of this behavior. To prevent starvation, hold the file's i_mutex in nfs_getattr() to freeze applications writes temporarily so the client can more quickly obtain clean values for a file's size, mtime, and ctime. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | NFS: Ensure nfs_wcc_update_inode always converts file size to loff_tChuck Lever2008-01-301-2/+3
|/ | | | | | | | | | | | | | The nfs_wcc_update_inode() function omits logic to convert the type of the NFS on-the-wire value of a file's size (__u64) to the type of file size value stored in struct inode (loff_t, which is signed). Everywhere else in the NFS client I checked already correctly converts the file size type. This effects only very large files. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Ensure that we wait for the CLOSE request to completeTrond Myklebust2007-10-191-4/+18
| | | | | | | | | | Otherwise, we do end up breaking close-to-open semantics. We also end up breaking some of the silly-rename tests in Connectathon on some setups. Please refer to the bug-report at http://bugzilla.linux-nfs.org/show_bug.cgi?id=150 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix a race in sillyrenameTrond Myklebust2007-10-191-0/+3
| | | | | | | | | | | | | | | | | lookup() and sillyrename() can race one another because the sillyrename() completion cannot take the parent directory's inode->i_mutex since the latter may be held by whoever is calling dput(). We therefore have little option but to add extra locking to ensure that nfs_lookup() and nfs_atomic_open() do not race with the sillyrename completion. If somebody has looked up the sillyrenamed file in the meantime, we just transfer the sillydelete information to the new dentry. Please refer to the bug-report at http://bugzilla.linux-nfs.org/show_bug.cgi?id=150 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: if ATTR_KILL_S*ID bits are set, then skip mode changeJeff Layton2007-10-181-0/+4
| | | | | | | | | | | | | If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing the setuid/setgid bits. For NFS, skip the mode change and let the server handle it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Slab API: remove useless ctor parameter and reorder parametersChristoph Lameter2007-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | | Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void *object, struct kmem_cache *s, unsigned long flags) to ctor(struct kmem_cache *s, void *object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* NFS: Add a boot parameter to disable 64 bit inode numbersTrond Myklebust2007-10-091-1/+26
| | | | | | | | This boot parameter will allow legacy 32-bit applications which call stat() to continue to function even if the NFSv3/v4 server uses 64-bit inode numbers. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: nfs_refresh_inode should clear cache_validity flags on successTrond Myklebust2007-10-091-18/+17
| | | | | | | If the cached attributes match the ones supplied in the fattr, then assume we've revalidated the inode. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix a connectathon regression in NFSv3 and NFSv4Trond Myklebust2007-10-091-1/+9
| | | | | | | | | We're failing basic test6 against Linux servers because they lack a correct change attribute. The fix is to assume that we always want to invalidate the readdir caches when we call update_changeattr and/or nfs_post_op_update_inode on a directory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Get rid of some obsolete macrosTrond Myklebust2007-10-091-2/+2
| | | | | | | - NFS_READTIME, NFS_CHANGE_ATTR are completely unused. - Inline the few remaining uses of NFS_ATTRTIMEO, and remove. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Reset nfsi->last_updated only if the attribute changedTrond Myklebust2007-10-091-5/+12
| | | | | | | Otherwise set it to nfsi->read_cache_jiffies in order to prevent jiffy wraparound issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Remove nfs_begin_data_update/nfs_end_data_updateTrond Myklebust2007-10-091-19/+0
| | | | | | | The lower level routines in fs/nfs/proc.c, fs/nfs/nfs3proc.c and fs/nfs/nfs4proc.c should already be dealing with the revalidation issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Remove NFS_I(inode)->data_updatesTrond Myklebust2007-10-091-20/+1
| | | | | | We have no more users... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: nfs_post_op_update_inode don't update cache_change_attributeTrond Myklebust2007-10-091-11/+7
| | | | | | | | If nfs_post_op_update_inode fails because the server didn't return any attributes, then we let the subsequent inode revalidation update cache_change_attribute. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Don't revalidate dentries on directory size or ctime changesTrond Myklebust2007-10-091-4/+1
| | | | | | We only need to look at the mtime changes... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Don't set cache_change_attribute in nfs_revalidate_mappingTrond Myklebust2007-10-091-4/+1
| | | | | | | The attribute revalidation code will already have taken care of resetting nfsi->cache_change_attribute. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fake up 'wcc' attributes to prevent cache invalidation after writeTrond Myklebust2007-10-091-0/+34
| | | | | | | | | NFSv2 and v4 don't offer weak cache consistency attributes on WRITE calls. In NFSv3, returning wcc data is optional. In all cases, we want to prevent the client from invalidating our cached data whenever ->write_done() attempts to update the inode attributes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Remove bogus check of cache_change_attribute in nfs_update_inodeTrond Myklebust2007-10-091-12/+3
| | | | | | | | Remove the bogus 'data_stable' check in nfs_update_inode. The cache_change_attribute tells you if the directory changed on the server, and should have nothing to do with the file length. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix the ESTALE "revalidation" in _nfs_revalidate_inode()Trond Myklebust2007-10-091-10/+4
| | | | | | | | | | | | | For one thing, the test NFS_ATTRTIMEO() == 0 makes no sense: we're testing whether or not the cache timeout length is zero, which is totally unrelated to the issue of whether or not we trust the file staleness. Secondly, we do not want to retry the GETATTR once a file has been declared stale by the server: we rather want to discard that inode as soon as possible, since there are broken servers still in use out there that reuse filehandles on new files. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix atime revalidation in readdir()Trond Myklebust2007-10-091-0/+7
| | | | | | | | NFSv3 will correctly update atime on a readdir call, so there is no need to set the NFS_INO_INVALID_ATIME flag unless the call to nfs_refresh_inode() fails. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Don't use ctime/mtime for determining when to invalidate the cachesTrond Myklebust2007-10-091-23/+24
| | | | | | In NFSv4 we should only be looking at the change attribute. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Don't force a dcache revalidation if nfs_wcc_update_inode succeedsTrond Myklebust2007-10-091-8/+3
| | | | | | | | The reason is that if the weak cache consistency update was successful, then we know that our client must be the only one that changed the directory, and we've already updated the dcache to reflect the change. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
OpenPOWER on IntegriCloud