path: root/sys/modules/nfsserver
Commit message | Author | Age | Files | Lines
* Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to | ken | 2013-04-17 | 1 | -1/+1
  sys/nfs, since it is now shared by the two NFS servers.

  Suggested by: rmacklem
  Sponsored by: Spectra Logic
  MFC after: 2 weeks
* Revamp the old NFS server's File Handle Affinity (FHA) code so that | ken | 2013-04-17 | 1 | -1/+1
  it will work with either the old or new server.

  The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support.

  This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching.

  The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed.

  This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order.

  In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once.
  ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code.

  This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time.

  I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS.

  The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables.

  The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another.

  sys/conf/files:
    Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built.

  sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c:
    Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode.

  sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h:
    Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs.

  sys/fs/nfs/nfsm_subs.h:
    Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro.

  sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c:
    Add the malloc wait flag to a newnfs_realign() call.

  sys/fs/nfsserver/nfs_nfsdkrpc.c:
    Set up the new NFS server's RPC thread pool so that it will call the FHA code.

    Add the malloc flag argument to newnfs_realign().

    Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code.

  sys/fs/nfsserver/nfs_nfsdsocket.c:
    In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type.
  sys/fs/nfsserver/nfs_nfsdport.c:
    In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

  sys/nfsserver/nfs_fha.c:
    Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc.

    There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha; the old NFS server's tunables are under vfs.nfsrv.fha as before.

    In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module.

    In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread.

    Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc.

    Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code.
    Add the current offset to the list of things printed out about each active thread.

    Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage.

    Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons.

  nfs_fha.h:
    Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them.

    Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread.

  sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h:
    Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the

    The shims contain all of the code and definitions that are specific to the NFS servers.

    They set up the server-specific callbacks and set the server name for the sysctl and loader tunable variables.

  sys/nfsserver/nfs_srvkrpc.c:
    Configure the RPC code to call fhaold_assign() instead of fha_assign().

  sys/modules/nfsd/Makefile:
    Add nfs_fha.c and nfs_fha_new.c.

  sys/modules/nfsserver/Makefile:
    Add nfs_fha_old.c.

  Reviewed by: rmacklem
  Sponsored by: Spectra Logic
  MFC after: 2 weeks
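  The change to the read proximity test described in the FHA commit above can be illustrated with a small sketch. This is not the code from nfs_fha.c: the function names same_bucket() and within_window() are hypothetical, and the strict less-than comparison is an assumption about how "within 1 << bin_shift of each other" is evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the old scheme as the commit message describes it.
 * Two offsets are "near" only if they fall in the same power-of-2 bucket.
 */
static bool
same_bucket(uint64_t off1, uint64_t off2, unsigned int bin_shift)
{
	assert(bin_shift < 64);
	return ((off1 >> bin_shift) == (off2 >> bin_shift));
}

/*
 * Hypothetical helper: the new scheme as the commit message describes it.
 * Two offsets are "near" if their distance is less than 1 << bin_shift,
 * regardless of where bucket boundaries fall. The strict "<" is an
 * assumption made for this sketch.
 */
static bool
within_window(uint64_t off1, uint64_t off2, unsigned int bin_shift)
{
	uint64_t dist;

	assert(bin_shift < 64);
	dist = (off1 > off2) ? off1 - off2 : off2 - off1;
	return (dist < ((uint64_t)1 << bin_shift));
}
```

  With bin_shift set to 22, a read at offset 4MB - 64K followed by a read at 4MB lands in different buckets under the old test, which would force a thread transition, while the window test treats the two reads as adjacent.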
* Factor out the code shared between NFS client and server into its own | marius | 2010-02-16 | 1 | -2/+2
  module. With r203732 it became apparent that creating the sysctl nodes twice causes at least a warning, however the whole code shouldn't be present twice in the first place.

  Discussed with: rmacklem
* Remove the old kernel RPC implementation and the NFS_LEGACYRPC option. | dfr | 2009-06-30 | 1 | -2/+2

  Approved by: re
* Remove opt_mac.h generation for various kernel modules that no longer | rwatson | 2009-06-06 | 1 | -1/+0
  require it.

  Submitted by: pjd
* Fix standalone module build by generating opt_kgssapi.h. | dfr | 2008-11-25 | 1 | -0/+1

  Submitted by: n_hibma
* Unbreak NFS. | des | 2008-11-06 | 1 | -2/+2

  Pointy hat to: dfr
* Implement support for RPCSEC_GSS authentication to both the NFS client | dfr | 2008-11-03 | 1 | -2/+2
  and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation.

  The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code.

  To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

  As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks.

  Supporting RPCSEC_GSS on a server is nearly as simple.
  You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd.

  The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option.

  Sponsored by: Isilon Systems
  MFC after: 1 month
* Let modules use the kernel's opt_*.h files if built along with | yar | 2005-10-14 | 1 | -0/+3
  the kernel by wrapping all targets for fake opt_*.h files in .if defined(KERNBUILDDIR). Thus, such fake files won't be created at all if modules are built with the kernel.

  Some modules undergo cleanup like removing unused or unneeded options or .h files, without which they wouldn't build this way or the other.

  Reviewed by: ru
  Tested by: no binary changes in modules built alone
  Tested on: i386 sparc64 amd64
* Permit MAC policies to instrument the access control decisions for | rwatson | 2002-11-04 | 1 | -0/+1
  system accounting configuration and for nfsd server thread attach. Policies might use this to protect the integrity or confidentiality of accounting data, limit the ability to turn on or off accounting, as well as to prevent inappropriately labeled threads from becoming nfs server threads.

  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
* Add IPv6 support. | alfred | 2002-07-15 | 1 | -0/+7

  Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
* Drop <bsd.man.mk> support from <bsd.kmod.mk>. | ru | 2002-01-11 | 1 | -2/+0

  Not objected to by: -current
* Cleanup and split of nfs client and server code. | peter | 2001-09-18 | 1 | -12/+6
  This builds on the top of several repo-copies.
* NFS module now requires nfs_lock.c | alfred | 2001-04-18 | 1 | -1/+1
* Zap some bad examples: | peter | 2001-02-04 | 1 | -2/+1
  opt_foo.h:
          touch opt_foo.h
  .. is unnecessary - kmod.mk does this for us.
* Use a consistent style and one much closer to the rest of /usr/src | obrien | 2001-01-06 | 1 | -0/+1
* Use .include <bsd.kmod.mk> to get to ../../*/conf/kmod.mk instead of | peter | 2000-05-27 | 1 | -1/+1
  encoding the relative path.
* Pull in sys/conf/kmod.mk, rather than /usr/share/mk/bsd.kmod.mk. | peter | 2000-05-04 | 1 | -1/+1
  This means that the kernel can be totally self contained now and is not dependent on the last buildworld to update /usr/share/mk. This might also make it easier to build 5.x kernels on 4.0 boxes etc, assuming gensetdefs and config(8) are updated.
* Remove a whole bunch of "CFLAGS+= -DFSNAME" cruft. It hasn't been | peter | 1999-12-12 | 1 | -1/+0
  needed for ages, but keeps getting cut/pasted into new Makefiles. (Once upon a time it was used to activate mount arguments in <sys/mount.h>, but that was killed with extreme prejudice long ago)
* Bring these more into line with other modules that have .h files generated | peter | 1999-12-12 | 1 | -2/+2
  on the fly.
* Removed special rules for building and cleaning device interface files | bde | 1999-11-28 | 1 | -4/+0
  and empty options files. The rules are now generated automatically in bsd.kmod.mk. Cleaned up related things ($S and ${CLEANFILES}).
* Unbreak this build. | green | 1999-11-02 | 1 | -3/+3
* $Id$ -> $FreeBSD$ | peter | 1999-08-28 | 1 | -1/+1
* Sample initial set of kld-ified modules. Not all have been completely | peter | 1998-10-16 | 1 | -6/+6
  converted yet. These are more of a starting point.

  This is NOT connected to the parent Makefile.

  OK'ed by jkh (who is ever so patiently waiting)
* Finished previous fix - don't forget to add one dummy options header | bde | 1998-07-07 | 1 | -13/+10
  to CLEANFILES.

  Fixed lots of style bugs.
* Fix the N'th occurrence of missed bits due to opt_???? mucking. | sos | 1998-07-02 | 1 | -2/+5
  Doesn't anybody TEST code before committing.... This is the N+1'th time these last couple of days...
* add new opt_nfs.h to cleanfiles... | jmg | 1998-06-30 | 1 | -2/+2
* fix buildworld hopefully before anyone complains... | jmg | 1998-06-30 | 1 | -1/+3
  NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion), but in some cases it looks like nfs keeps a copy of the value in a struct

  hash sizes are already ifdef'd KERNEL, so there is no userland impact from them...
* Back out opt_diagnostic.h changes. | eivind | 1998-02-06 | 1 | -6/+3
* Make the LKMs handle DIAGNOSTIC as a new-style option. | eivind | 1998-02-04 | 1 | -3/+6
* Minor fixups after INET option change. | eivind | 1998-01-09 | 1 | -3/+13
* Revert $FreeBSD$ back to $Id$ | peter | 1997-02-22 | 1 | -1/+1
* Make the long-awaited change from $Id$ to $FreeBSD$ | jkh | 1997-01-14 | 1 | -1/+1
  This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.

  Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
* Create NFS LKM. | wollman | 1994-09-22 | 1 | -0/+11