summaryrefslogtreecommitdiffstats
path: root/sys/nfsclient
Commit message (Collapse)AuthorAgeFilesLines
* MFC r275897:kib2015-01-011-2/+1
| | | | | Set NOCACHE flag for CREATE namei() calls, do not specially handle MAKEENTRY in VOP_LOOKUP().
* MFC: r259084rmacklem2013-12-301-0/+1
| | | | | | | | | | | | | | | | | | For software builds, the NFS client does many small synchronous (with FILE_SYNC) writes because non-contiguous byte ranges in the same buffer cache block are being written. This patch adds a new mount option "noncontigwr" which allows the non-contiguous byte ranges to be combined, with the dirty byte range becoming the superset of the bytes that are dirty, if the file has not been file locked. This reduces the number of writes significantly for software builds. The only case where this change might break existing applications is where an application is writing non-overlapping byte ranges within the same buffer cache block of a file from multiple clients concurrently. Since such an application would normally do file locking on the file, avoiding the byte range merge for files that have been file locked should be sufficient for most (maybe all?) cases.
* A problem with the old NFS client where large writes to large filesrmacklem2013-07-041-2/+13
| | | | | | | | | | | | | | | | | | | | | | would sometimes result in a corrupted file was reported via email. This problem appears to have been caused by r251719 (reverting r251719 fixed the problem). Although I have not been able to reproduce this problem, I suspect it is caused by another thread increasing np->n_size after the mtx_unlock(&np->n_mtx) but before the vnode_pager_setsize() call. Since the np->n_mtx mutex serializes updates to np->n_size, doing the vnode_pager_setsize() with the mutex locked appears to avoid the problem. Unfortunately, vnode_pager_setsize() where the new size is smaller, cannot be called with a mutex held. This patch returns the semantics to be close to pre-r251719 such that the call to the vnode_pager_setsize() is only delayed until after the mutex is unlocked when np->n_size is shrinking. Since the file is growing when being written, I believe this will fix the corruption. Reported by: David G. Lawrence (dg@dglawrence.com) Tested by: David G. Lawrence (pending, to happen soon) Reviewed by: kib MFC after: 1 week
* A recent version of the oldnfs NFS client in head/currentrmacklem2013-07-011-2/+2
| | | | | | | | | | will crash when doing a large write, since m_get2() would return NULL. This patch fixes the problem, since nfsm_uiotombuf() will allocate additional mbufs, as required. Reported by: sbruno Tested by: sbruno Discussed with: glebius
* - Convert the bufobj lock to rwlock.jeff2013-05-312-1/+2
| | | | | | | | | | - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
* Add a patch analygous to r248567, r248581, r251079 to thermacklem2013-05-291-1/+8
| | | | | | | | old NFS client to avoid the panic reported in the PR by doing the vnode_pager_setsize() call after unlocking the mutex. PR: 177335 MFC after: 2 weeks
* When an NFS unmount occurs, once vflush() writes the last dirtyrmacklem2013-04-182-1/+17
| | | | | | | | | | | | | | | | buffer for the last vnode on the mount back to the server, it returns. At that point, the code continues with the unmount, including freeing up the nfs specific part of the mount structure. It is possible that an nfsiod thread will try to check for an empty I/O queue in the nfs specific part of the mount structure after it has been free'd by the unmount. This patch avoids this problem by setting the iodmount entries for the mount back to NULL while holding the mutex in the unmount and checking the appropriate entry is non-NULL after acquiring the mutex in the nfsiod thread. Reported and tested by: pho Reviewed by: kib MFC after: 2 weeks
* Both NFS clients can deadlock when using the "rdirplus" mountrmacklem2013-04-181-2/+10
| | | | | | | | | | | | | | | | | | | option. This can occur when an nfsiod thread that already holds a buffer lock attempts to acquire a vnode lock on an entry in the directory (a LOR) when another thread holding the vnode lock is waiting on an nfsiod thread. This patch avoids the deadlock by disabling readahead for this case, so the nfsiod threads never do readdirplus. Since readaheads for directories need the directory offset cookie from the previous read, they cannot normally happen in parallel. As such, testing by jhb@ and myself didn't find any performance degredation when this patch is applied. If there is a case where this results in a significant performance degradation, mounting without the "rdirplus" option can be done to re-enable readahead for directories. Reported and tested by: jhb Reviewed by: jhb MFC after: 2 weeks
* Fix remainder calculation when biosize is not a power of 2emaste2013-03-191-3/+3
| | | | | | | | In common configurations biosize is a power of two, but is not required to be so. Thanks to markj@ for spotting an additional case beyond my original patch. Reviewed by: rmacklem@
* Revert 195703 and 195821 as this special stop handling in NFS is nowjhb2013-03-133-9/+7
| | | | | implemented via VFCF_SBDRY rather than passing PBDRY to individual sleep calls.
* Functions m_getm2() and m_get2() have different order of arguments,glebius2013-03-122-35/+36
| | | | | | | and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2(). Sorry for churn, but better now than later.
* - Use m_get2() instead of nfsm_reqhead().glebius2013-03-124-88/+45
| | | | | | - Use m_get(), m_getcl() instead of historic macros. Sponsored by: Nginx, Inc.
* MFCattilio2013-02-212-3/+1
|
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() toattilio2013-02-202-11/+11
| | | | | | their "write" versions. Sponsored by: EMC / Isilon storage division
* Switch vm_object lock to be a rwlock.attilio2013-02-201-0/+1
| | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* Rework the handling of stop signals in the NFS client. The changes injhb2013-02-061-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks. Reviewed by: kib MFC after: 1 month
* Further cleanups to use of timestamps in NFS:jhb2013-01-251-8/+5
| | | | | | | | | | | | | | | - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime(). Submitted by: bde (3) Reviewed by: bde, rmacklem MFC after: 2 weeks
* Use vfs_timestamp() to set file timestamps rather than invokingjhb2013-01-181-3/+3
| | | | | | | getmicrotime() or getnanotime() directly in NFS. Reviewed by: rmacklem, bde MFC after: 1 week
* Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes()jhb2013-01-161-2/+2
| | | | | | | | instead of comparing the desired time against the current time as a heuristic. Reviewed by: rmacklem MFC after: 1 week
* - More properly handle interrupted NFS requests on an interruptible mountjhb2013-01-151-4/+12
| | | | | | | | | by returning an error of EINTR rather than EACCES. - While here, bring back some (but not all) of the NFS RPC statistics lost when krpc was committed. Reviewed by: rmacklem MFC after: 1 week
* Move the NFSv4.1 client patches over from projects/nfsv4.1-clientrmacklem2012-12-081-0/+1
| | | | | | | | | | to head. I don't think the NFS client behaviour will change unless the new "minorversion=1" mount option is used. It includes basic NFSv4.1 support plus support for pNFS using the Files Layout only. All problems detecting during an NFSv4.1 Bakeathon testing event in June 2012 have been resolved in this code and it has been tested against the NFSv4.1 server available to me. Although not reviewed, I believe that kib@ has looked at it.
* Mechanically substitute flags from historic mbuf allocator withglebius2012-12-052-9/+9
| | | | | | | | | malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
* r16312 is not any longer real since many years (likely since when VFSattilio2012-11-191-6/+0
| | | | | | | | | | received granular locking) but the comment present in UFS has been copied all over other filesystems code incorrectly for several times. Removes comments that makes no sense now. Reviewed by: kib MFC after: 3 days
* Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.attilio2012-11-091-1/+1
| | | | | Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.
* Do not leave invalid pages in the object after the short read for akib2012-08-141-4/+8
| | | | | | | | | | | | | | network file systems (not only NFS proper). Short reads cause pages other then the requested one, which were not filled by read response, to stay invalid. Change the vm_page_readahead_finish() interface to not take the error code, but instead to make a decision to free or to (de)activate the page only by its validity. As result, not requested invalid pages are freed even if the read RPC indicated success. Noted and reviewed by: alc MFC after: 1 week
* After the PHYS_TO_VM_PAGE() function was de-inlined, the main reasonkib2012-08-051-0/+1
| | | | | | | | | | | | | to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks
* Reduce code duplication and exposure of direct access to structkib2012-08-041-30/+2
| | | | | | | | | vm_page oflags by providing helper function vm_page_readahead_finish(), which handles completed reads for pages with indexes other then the requested one, for VOP_GETPAGES(). Reviewed by: alc MFC after: 1 week
* PR# 165923 reported intermittent write failures for dirtyrmacklem2012-05-124-1/+25
| | | | | | | | | | | | | | | | | | | memory mapped pages being written back on an NFS mount. Since any thread can call VOP_PUTPAGES() to write back a dirty page, the credentials of that thread may not have write access to the file on an NFS server. (Often the uid is 0, which may be mapped to "nobody" in the NFS server.) Although there is no completely correct fix for this (NFS servers check access on every write RPC instead of at open/mmap time), this patch avoids the common cases by holding onto a credential that recently opened the file for writing and uses that credential for the write RPCs being done by VOP_PUTPAGES() for both NFS clients. Tested by: Joel Ray Holveck (joelh at juniper.net) PR: kern/165923 Reviewed by: kib MFC after: 2 weeks
* Fix mount mutex handling missed in r234386.mckusick2012-05-101-1/+0
|
* Fix mount mutex handling missed in r234386.pluknet2012-05-051-0/+1
|
* Remove unused thread argument from vtruncbuf().trasz2012-04-231-1/+1
| | | | Reviewed by: kib
* Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.mckusick2012-04-172-18/+3
| | | | | | | | | | | | | | | | | | | | | The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
* Remove fifo.h. The only used function declaration from the header iskib2012-03-111-2/+0
| | | | | | migrated to sys/vnode.h. Submitted by: gianni
* Post r230394, the Lookup RPC counts for both NFS clients increasedrmacklem2012-03-031-11/+15
| | | | | | | | | | | | | | | | | | | | | significantly. Upon investigation this was caused by name cache misses for lookups of "..". For name cache entries for non-".." directories, the cache entry serves double duty. It maps both the named directory plus ".." for the parent of the directory. As such, two ctime values (one for each of the directory and its parent) need to be saved in the name cache entry. This patch adds an entry for ctime of the parent directory to the name cache. It also adds an additional uma zone for large entries with this time value, in order to minimize memory wastage. As well, it fixes a couple of cases where the mtime of the parent directory was being saved instead of ctime for positive name cache entries. With this patch, Lookup RPC counts return to values similar to pre-r230394 kernels. Reported by: bde Discussed with: kib Reviewed by: jhb MFC after: 2 weeks
* Fix the NFS clients so that they use copyin() instead of bcopy(),rmacklem2012-03-011-1/+16
| | | | | | | | when doing direct I/O. This direct I/O code is not enabled by default. Submitted by: kib (earlier version) Reviewed by: kib MFC after: 1 week
* Adjust the nfs_skip_wcc_data_onerr setting so that it does not blockjhb2012-02-241-5/+7
| | | | | | | post-op attributes for ENOENT errors now that the name caching logic depends on working post-op attributes. MFC after: 2 weeks
* Fix found places where uio_resid is truncated to int.kib2012-02-211-7/+7
| | | | | | | | | Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
* Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:bz2012-02-171-2/+2
| | | | | | | | | | | | Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity. This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat. Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days
* Rename cache_lookup_times() to cache_lookup() and retire the old API andjhb2012-02-061-1/+1
| | | | ABI stub for cache_lookup().
* Current implementations of sync(2) and syncer vnode fsync() VOP useskib2012-02-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of MNTK_ASYNC option is harmful because not only i/o started from corresponding thread becomes synchronous, but all i/o is synchronous on the filesystem which is initiated during sync(2) or syncer activity. Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks
* When a "mount -u" switches an NFS mount point from TCP to UDP,rmacklem2012-01-311-0/+14
| | | | | | | | | | any thread doing an I/O RPC with a transfer size greater than NFS_UDPMAXDATA will be hung indefinitely, retrying the RPC. After a discussion on freebsd-fs@, I decided to add a warning message for this case, as suggested by Jeremy Chadwick. Suggested by: freebsd at jdc.parodius.com (Jeremy Chadwick) MFC after: 2 weeks
* A problem with respect to data read through the buffer cache for bothrmacklem2012-01-271-9/+5
| | | | | | | | | | | | | | | NFS clients was reported to freebsd-fs@ under the subject "NFS corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when a TCP mounted root fs was changed to using UDP. I believe that this problem was caused by the change in mnt_stat.f_iosize that occurred because rsize was decreased to the maximum supported by UDP. This patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize, since the latter is set to f_iosize when the vnode is allocated, but does not change for a given vnode when f_iosize changes. Reported by: pjd Reviewed by: kib MFC after: 2 weeks
* Revert r230516, since it doesn't really fix the problem.rmacklem2012-01-261-17/+0
|
* Fix remaining calls to cache_enter() in both NFS clients to providekib2012-01-251-4/+0
| | | | | | | | appropriate timestamps. Restore the assertions which verify that NCF_TS is set when timestamp is asked for. Reviewed by: jhb (previous version) MFC after: 2 weeks
* Add a timeout on positive name cache entries in the NFS client. That is,jhb2012-01-253-7/+26
| | | | | | | | | | | we will only trust a positive name cache entry for a specified amount of time before falling back to a LOOKUP RPC, even if the ctime for the file handle matches the cached copy in the name cache entry. The timeout is configured via a new 'nametimeo' mount option and defaults to 60 seconds. It may be set to zero to disable positive name caching entirely. Reviewed by: rmacklem MFC after: 1 week
* If a mount -u is done to either NFS client that switches itrmacklem2012-01-251-0/+17
| | | | | | | | | | | | | | | | from TCP to UDP and the rsize/wsize/readdirsize is greater than NFS_MAXDGRAMDATA, it is possible for a thread doing an I/O RPC to get stuck repeatedly doing retries. This happens because the RPC will use a resize/wsize/readdirsize that won't work for UDP and, as such, it will keep failing indefinitely. This patch returns an error for this case, to avoid the problem. A discussion on freebsd-fs@ seemed to indicate that returning an error was preferable to silently ignoring the "udp"/"mntudp" option. This problem was discovered while investigating a problem reported by pjd@ via email. MFC after: 2 weeks
* Close a race in NFS lookup processing that could result in stale name cachejhb2012-01-204-73/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently. Some more details: - Add a variant of nfsm_postop_attr() to the old NFS client that can return a vattr structure with a copy of the post-op attributes. - Handle lookups of "." as a special case in the NFS clients since the name cache does not store name cache entries for ".", so we cannot get a useful timestamp. It didn't really make much sense to recheck the attributes on the the directory to validate the namecache hit for "." anyway. - ABI compat shims for the name cache routines are present in this commit so that it is safe to MFC. MFC after: 2 weeks
* Make sure all intermediate variables holding mount flags (mnt_flag)mckusick2012-01-171-1/+1
| | | | | | | and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC after: 2 weeks
* jwd@ reported a problem via email where the old NFS client wouldrmacklem2011-12-211-6/+56
| | | | | | | | | | | | | | | | | | | | | | | get a reply of EEXIST from an NFS server when a Mkdir RPC was retried, for an NFS over UDP mount. Upon investigation, it was found that the client was retransmitting the Mkdir RPC request over UDP, but with a different xid. As such, the retransmitted message would miss the Duplicate Request Cache in the server, causing it to reply EEXIST. The kernel client side UDP rpc code has two timers. The first one causes a retransmit using the same xid and socket and was set to a fixed value of 3seconds. (The default can be overridden via CLSET_RETRY_TIMEOUT.) The second one creates a new socket and xid and should be larger than the first. However, both NFS clients were setting the second timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to 1second, so the first timer would never time out. This patch fixes both NFS clients so that they set the first timer using nm_timeo and makes the second timer larger than the first one. Reported by: jwd Tested by: jwd Reviewed by: jhb MFC after: 2 weeks
* Rename vm_page_set_valid() to vm_page_set_valid_range().kib2011-11-301-1/+1
| | | | | | | The vm_page_set_valid() is the most reasonable name for the m->valid accessor. Reviewed by: attilio, alc
OpenPOWER on IntegriCloud