summaryrefslogtreecommitdiffstats
path: root/sys/nfsclient/nfs_socket.c
Commit message (Collapse)AuthorAgeFilesLines
* Implement support for RPCSEC_GSS authentication to both the NFS clientdfr2008-11-031-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month
* Document a few sysctls in the NFS client and server code.trhodes2008-11-021-6/+10
| | | | | | Minor style(9) where applicable. Approved by: alfred (slightly older version)
* Improve VFS locking:attilio2008-11-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).des2008-10-231-2/+2
| | | | MFC after: 3 months
* Move the NFS/RPC code away from lbolt.ed2008-07-221-5/+6
| | | | | | | | | | | | | | The kernel has a special wchan called `lbolt', which is triggered each second. It doesn't seem to be used a lot and it seems pretty redundant, because we can specify a timeout value to the *sleep() routines. In an attempt to eventually remove lbolt, make the NFS/RPC code use a timeout of `hz' when trying to reconnect. Only the TTY code (not MPSAFE TTY) and the VFS syncer seem to use lbolt now. Reviewed by: attilio, jhb Approved by: philip (mentor), alfred, dfr
* Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.ru2008-03-251-2/+2
| | | | | | | | | | Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
* Consolidate the code to generate a new XID for a NFS request into ajhb2008-02-131-8/+1
| | | | | | | | nfs_xid_gen() function instead of duplicating the logic in both nfsm_rpchead() and the NFS3ERR_JUKEBOX handling in nfs_request(). MFC after: 1 week Submitted by: mohans (a long while ago)
* The previous revision broke the case of reconnecting to a TCP NFS serverjhb2008-01-111-1/+22
| | | | | | | | | | | | | | | via a new socket during an NFS operation as that reconnect takes place in the context of an arbitrary thread with an arbitrary credential. Ideally we would like to use the mount point's credential for the entire process of setting up the socket to connect to the NFS server. Since some of the APIs (sobind(), etc.) only take a thread pointer and infer the credential from that instead of a direct credential, work around the problem by temporarily changing the current thread's credential to that of the mount point while connecting the socket and then reverting back to the original credential when we are done. Reviewed by: rwatson Tested on: UDP, TCP, TCP with forced reconnect
* Pass curthread to various socket routines (socreate(), sobind(), andjhb2008-01-101-1/+1
| | | | | | | | | soconnect()) instead of &thread0 when establishing a connection to the NFS server. Otherwise inconsistent credentials may be used when setting up the NFS socket. MFC after: 1 week Reviewed by: rwatson
* NFS MP scaling changes.mohans2007-10-121-71/+123
| | | | | | | | | | | | | | - Eliminate the hideous nfs_sndlock that serialized NFS/TCP request senders thru the sndlock. - Institute a new nfs_connectlock that serializes NFS/TCP reconnects. Add logic to wait for pending request senders to finish sending before reconnecting. Dial down the sb_timeo for NFS/TCP sockets to 1 sec. - Break out the nfs xid manipulation under a new nfs xid lock, rather than over loading the nfs request lock for this purpose. - Fix some of the locking in nfs_request. Many thanks to Kris Kennaway for his help with this and for initiating the MP scaling analysis and work. Kris also tested this patch thorougly. Approved by: re@ (Ken Smith)
* Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, whichrwatson2007-08-061-42/+14
| | | | | | | | | | | | | | | previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)
* In nfs_down(), if rep can be NULL, which we test for, then we shouldrwatson2007-05-181-3/+4
| | | | | | | | | lock and unlock conditionally, not just set the flag on it conditionally. In practice, this bug couldn't manifest, as in the current revision of the code, no callers pass a NULL rep. CID: 1416 Found with: Coverity Prevent(tm)
* Back out a chance to nfs_timer() that inadvertantly crept in the last checkin :(mohans2007-03-091-1/+1
|
* Over NFS, an open() call could result in multiple over-the-wiremohans2007-03-091-1/+1
| | | | | | | | | | | | GETATTRs being generated - one from lookup()/namei() and the other from nfs_open() (for cto consistency). This change eliminates the GETATTR in nfs_open() if an otw GETATTR was done from the namei() path. Instead of extending the vop interface, we timestamp each attr load, and use this to detect whether a GETATTR was done from namei() for this syscall. Introduces a thread-local variable that counts the syscalls made by the thread and uses <pid, tid, thread syscalls> as the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on thread state that could be used as the timestamp with minimal overhead.
* Backing out an earlier change. It seems harmless for NFS to miss the "forcemohans2007-02-161-6/+0
| | | | | unmount" flag, making the acquisition of the MNT_ILOCK in nfs_request() and nfs_sigintr() unnecessary. Pointed out by tegge@.
* Add missing MNT_ILOCK around some mnt_kern_flag accesses.mohans2007-02-111-0/+6
|
* NetApp filers return corrupt post op attrs in the wcc on NFS error responses.mohans2006-12-111-1/+8
| | | | | | | This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt for other NFS error responses. For now, disabling wcc pre-op attr checks and post-op attr loads on NFS errors (sysctl'ed). Reported by: Kris Kennaway
* bde@ pointed out that tprintf() acquires Giant so callers of tprintf() don'tmohans2006-11-271-6/+4
| | | | | | have to explicitly acquire Giant (although they need to be aware of this and not hold any locks at that point). Remove the acquisitions of Giant in the NFS client wrapping tprintf().
* 1) Fix up locking in nfs_up() and nfs_down.mohans2006-11-201-30/+39
| | | | | | | | | | | | | | | 2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly. - We don't need to acquire Giant before tsleeping on lbolt anymore, since jhb specialcased lbolt handling in msleep. - nfs_up() needs to acquire Giant only if printing the "server up" message. - nfs_timer() held Giant for the duration of the NFS timer processing, just because the printing of the message in nfs_down() needed it (and we acquire other locks in nfs_timer()). The acquisition of Giant is moved down into nfs_down() now, reducing the time Giant is held in that path. Reported by: Kris Kennaway
* Make EWOULDBLOCK a recoverable error so that the request is retransmitted.mohans2006-10-311-2/+2
| | | | | | | This bug results in data corruption with NFS/TCP. Writes are silently dropped on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires). Reviewed by: ups@
* Fix for a deadlock triggered by a 'umount -f' causing a NFS request to nevermohans2006-08-291-2/+14
| | | | | | retransmit (or return). Thanks to John Baldwin for helping nail this one. Found by : Kris Kennaway
* soreceive_generic(), and sopoll_generic(). Add new functions sosend(),rwatson2006-07-241-11/+6
| | | | | | | | | | | | | | | | soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used univerally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman
* Signals may be delivered to process as well as to the thread. Check thekib2006-07-081-1/+3
| | | | | | | | thread-delivered signals in addition to the process one. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)
* Refactor the NFS over UDP retransmit timeout estimation logic to allowcel2006-05-231-60/+131
| | | | | | | | | | | | | the estimator to be more easily tuned and maintained. There should be no functional change except there is now a lower limit on the retransmit timeout to prevent the client from retransmitting faster than the server's disks can fill requests, and an upper limit to prevent the estimator from taking to long to retransmit during a server outage. Reviewed by: mohan, kris, silby Sponsored by: Network Appliance, Incorporated
* Changes to make the NFS client MP safe.mohans2006-05-191-171/+272
| | | | Thanks to Kris Kennaway for testing and sending lots of bugs my way.
* Fix a snafu caused while patching the previous fix from another branch.mohans2006-05-051-1/+0
|
* Fix for a NFS/TCP client bug which would cause the NFS/TCP stream to getmohans2006-05-051-0/+31
| | | | | out of sync under heavy loads, forcing frequent reconnets, causing EBADRPC errors etc.
* Fix a bug in the NFS/TCP retransmission path.kris2006-03-231-0/+1
| | | | | | | | | | | | | | | | | The bug was that earlier, if a request was retransmitted, we would do subsequent retransmits every 10 msecs. This can cause data corruption under moderate loads by reordering operations as seen by the client NFS attribute cache, and on the server side when the retransmission occurs after the original request has left the duplicate cache, since the operation will be committed for a second time. Further work on retransmission handling is needed (e.g. they are still being done sent too often since they are scaled by HZ, and the size of the dup cache is too small and easily overwhelmed on busy servers). Submitted by: mohans
* If an NFS server returns more than a few EJUKEBOX errors for a given RPCcel2006-03-171-8/+4
| | | | | | | | | | | | | | | | request, the FreeBSD NFS client will quickly back off to a excessively long wait (days, then weeks) before retrying the request. Change the behavior of the FreeBSD NFS client to match the behavior of the reference NFS client implementation (Solaris). This provides a fixed delay of 10 seconds between each retry by default. A sysctl, called nfs3_jukebox_delay, is now available to tune the delay. Unlike Solaris, the sysctl value on FreeBSD is in seconds, rather than in HZ. Sponsored by: Network Appliance, Incorporated Reviewed by: rick Approved by: silby MFC after: 3 days
* Don't log an error on tcp connection reset, even if we don't get ECONNRESET.rees2006-01-201-2/+2
| | | | Submitted by: cel@citi.umich.edu
* Improve upon rev 1.133 where NFS/TCP would not reconnect.ps2005-12-121-13/+2
| | | | Submitted by: Mohan Srinivasan
* Fix for a bug where NFS/TCP would not reconnect (in the case whereps2005-11-211-1/+12
| | | | | | | the server FIN'ed). Seen with Solaris NFS servers. Reported by: TOMITA Yoshinori <yoshint@flab.fujitsu.co.jp> Submitted by: Mohan Strinivasan
* fix a problem with XID re-use when a server returns NFSERR_JUKEBOX.rees2005-11-211-3/+8
| | | | | | | Submitted by: cel@citi.umich.edu Fixed by: rick@snowhite.cis.uoguelph.ca Approved by: alfred MFC after: 3 weeks
* Fix for a race between the thread transmitting the request and theps2005-11-031-1/+5
| | | | | | thread processing the reply. Submitted by: Mohan Srinivasan
* Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),rwatson2005-09-191-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week
* FIx for a bug in the change that made nfs_timer() MPSAFE. We need tops2005-07-271-0/+2
| | | | | | | grab Giant before calling pru_send() (if running with mpsafenet = 0). Found by: Jeremie Le Hen. Fixed by: Maxime Henrion
* Make nfs_timer() MPSAFE. With this change, the bottom half of the NFSps2005-07-191-11/+21
| | | | | | | client (the interface with the protocol stack and callouts) is Giant-free. Submitted by: Mohan Srinivasan.
* Fix for a NFS soft mounts bug where if the number of retries exceedsps2005-07-181-1/+2
| | | | | | | | the max rexmits, the request was not being bounced back with a ETIMEDOUT error. Reported by: Oliver Lehmann Submitted by: Mohan Srinivasan
* Fixes for NFS crashes on architectures that require strict alignment.ps2005-07-141-1/+2
| | | | | | | | | | - Fix nfsm_disct() so that after pulling up data, the remaining data is aligned if necessary. - Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the mbuf (instead of casting m_data to a uint32). Submitted by: Pyun YongHyeon Reviewed by: Mohan Srinivasan
* set R_MUSTRESEND flag in mark_for_reconnect so re-connected requests getrees2005-05-101-12/+6
| | | | | | | | | | | re-sent instead of timing out. don't log an error message on reconnection, which is not an error. remove unused nfs_mrep_before_tsleep. Reviewed by: Mohan Srinivasan Approved by: alfred
* Fix a bug in NFS/TCP where retransmissions would not reliably happenps2005-05-041-3/+11
| | | | | | | if the server rebooted or tore down the connection for any reason. Found by: Jonathan Noack. Submitted by: Mohan Srinivasan.
* TCP reconnect is not an error.rees2005-04-181-3/+3
| | | | | | Change the message from LOG_ERR to LOG_INFO. Approved by: alfred
* - The NFS client was incorrectly masking SIGSTOP (which isps2005-03-231-19/+6
| | | | | | | | | | | non-maskable). - The NFS client needs to guard against spurious wakeups while waiting for the response. ltrace causes the process under question to wakeup (possibly from ptrace()), which causes NFS to wakeup from tsleep without the response being delivered. Submitted by: Mohan Srinivasan
* Minor cleanup in nfs_request() and removal of a comment that doesn'tps2005-02-261-10/+1
| | | | | | reflect reality. Submitted by: Mohan Srinivasan
* Fix for a potential NFS client race where shared data is updated fromps2005-02-181-0/+4
| | | | | | base context as well as the socket callback. Submitted by: Mohan Srinivasan
* /* -> /*- for license, minor formatting changesimp2005-01-071-1/+1
|
* If the NFS/TCP stream is out of sync between the client and server,ps2005-01-051-1/+1
| | | | | | | | and if the client (erroneously) reads the RPC length as 0 bytes, the client can loop around in the socket callback. Explicitly check for the length being 0 case and teardown/re-connect. Submitted by: Mohan Srinivasan
* Always issue wakeups() to the NFS requestors under the mutexps2004-12-071-7/+17
| | | | | | to close all potential cases of missed wakeups. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
* Rewrite of the NFS client's reply handling. We now have NFS socketps2004-12-061-401/+530
| | | | | | | | upcalls which do RPC header parsing and match up the reply with the request. NFS calls now sleep on the nfsreq structure. This enables us to eliminate the NFS recvlock. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
* In nfs_timer(), pass curthread rather than &thread0 into the protocolrwatson2004-08-251-4/+2
| | | | | | | | | | | send routine. In IPv6 UDP, the thread will be passed to suser(), which asserts that if a thread is used for a super user check, it be curthread. Many of these protocol entry points probably need to accept credentials instead of threads. MT5 candidate. Noticed/tested by: kuriyama
OpenPOWER on IntegriCloud