path: root/sys/rpc
Commit message | Author | Age | Files | Lines
* MFC r319369: | delphij | 2017-06-06 | 3 | -7/+21
  * limit size of buffers to RPC_MAXDATASIZE
  * don't leak memory
  * be more picky about bad parameters

  From:
  https://raw.githubusercontent.com/guidovranken/rpcbomb/master/libtirpc_patch.txt
  https://github.com/guidovranken/rpcbomb/blob/master/rpcbind_patch.txt
  via NetBSD.
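The buffer-size limit described in r319369 can be sketched as a length check applied before any allocation. This is a minimal illustration, not the committed patch: the helper names are hypothetical, and the value 9000 for the cap is an assumption that should be checked against the RPC headers in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed cap for illustration; the real constant lives in the RPC headers. */
#define SKETCH_RPC_MAXDATASIZE 9000U

/* Hypothetical helper: accept a decoded length only if it fits the cap. */
static bool
sketch_len_ok(size_t decoded_len)
{
    return (decoded_len <= SKETCH_RPC_MAXDATASIZE);
}

/* Hypothetical decoder step: refuse oversized requests before allocating. */
static bool
sketch_reserve(size_t decoded_len, size_t *reserved)
{
    assert(reserved != NULL);
    if (!sketch_len_ok(decoded_len))
        return (false);     /* bad parameter: nothing is allocated */
    *reserved = decoded_len;
    return (true);
}
```

Rejecting the length up front means a peer cannot make the decoder allocate an arbitrarily large buffer from a single length field.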
* MFC: r317906 | rmacklem | 2017-05-22 | 1 | -2/+20
  Fix the client side krpc from doing TCP reconnects for ERESTART from sosend().

  When sosend() replies ERESTART in the client side krpc, it indicates that the RPC message hasn't yet been sent and that the send queue is full or locked while a signal is posted for the process. Without this patch, this would result in a RPC_CANTSEND reply from clnt_vc_call(), which would cause clnt_reconnect_call() to create a new TCP transport connection. For most NFS servers, this wasn't a serious problem, although it did imply retries of outstanding RPCs, which could possibly have missed the DRC. For an NFSv4.1 mount to AmazonEFS, this caused a serious problem, since AmazonEFS often didn't retain the NFSv4.1 session and would reply with NFS4ERR_BAD_SESSION. This implies to the client a crash/reboot which requires open/lock state recovery.

  Three options were considered to fix this:
  - Return the ERESTART all the way up to the system call boundary and then have the system call redone. This is fraught with risk, due to convoluted code paths, asynchronous I/O RPCs etc. cperciva@ worked on this, but it is still a work in progress and may not be feasible.
  - Set SB_NOINTR for the socket buffer. This fixes the problem, but makes the sosend() completely non-interruptible, which kib@ considered inappropriate. It also would break forced dismount when a thread was blocked in sosend().
  - Modify the retry loop in clnt_vc_call(), so that it loops for this case for up to 15sec. Testing showed that the sosend() usually succeeded by the 2nd retry. The extreme case observed was 111 loop iterations, or about 100msec of delay.

  This third alternative is what is implemented in this patch, since the change is:
  - localized
  - straightforward
  - forced dismount is not broken by it.

  This patch has been tested by cperciva@ extensively against AmazonEFS.
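The third option in r317906, a bounded retry on ERESTART, can be sketched in userland C as follows. This is an illustration under stated assumptions, not the code in clnt_vc_call(): the function and type names are hypothetical, the error constant is a stand-in for the kernel's ERESTART, and elapsed time is simulated by charging a fixed step per retry instead of reading a clock.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's ERESTART; the value here is only illustrative. */
#define SKETCH_ERESTART (-1)

typedef int (*sketch_send_fn)(void *arg);

/* Fake sender used for demonstration: fails a fixed number of times. */
struct sketch_fake {
    int failures_left;
};

static int
sketch_fake_send(void *arg)
{
    struct sketch_fake *f = arg;

    if (f->failures_left > 0) {
        f->failures_left--;
        return (SKETCH_ERESTART);
    }
    return (0);
}

/*
 * Retry the send while it reports SKETCH_ERESTART and the time budget has
 * not been used up. Each retry is charged step_ms of simulated time.
 */
static int
sketch_send_with_retry(sketch_send_fn send, void *arg, int budget_ms,
    int step_ms, int *retries_out)
{
    int elapsed = 0, retries = 0, error;

    assert(step_ms > 0);
    for (;;) {
        error = send(arg);
        if (error != SKETCH_ERESTART || elapsed >= budget_ms)
            break;          /* success, other error, or budget spent */
        elapsed += step_ms;
        retries++;
    }
    if (retries_out != NULL)
        *retries_out = retries;
    return (error);
}
```

Only when the budget is exhausted does the error reach the caller, so in this sketch the reconnect path is a last resort instead of the first reaction.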
* MFC: r316694 | rmacklem | 2017-04-26 | 2 | -6/+1
  Fix a crash during unmount of an NFSv4.1 mount.

  Larry Rosenman reported a crash on freebsd-current@ which was caused by a premature release of the krpc backchannel socket structure. I believe this was caused by a race between the SVC_RELEASE() in clnt_vc.c and the xprt_unregister() in the higher layer (clnt_rc.c), which tried to lock the mutex in the xprt structure and crashed. This patch fixes this by removing the xprt_unregister() in the clnt_vc layer and allowing this to always be done by the clnt_rc (higher reconnect layer).
* MFC r313735: add svcpool_close to handle killed nfsd threads | avg | 2017-02-21 | 2 | -2/+46
  PR: 204340
  Reported by: Panzura
  Reviewed by: rmacklem
  Approved by: rmacklem
* MFC r297975: | ngie | 2016-12-03 | 1 | -1/+1
  r297975 (by pfg):

  RPC: for pointers replace 0 with NULL.

  These are mostly cosmetic, no functional change.

  Found with devel/coccinelle.
* MFC r301734: | ngie | 2016-12-03 | 1 | -1/+1
  r301734 (by kevlo):

  Fix the rpcb_getaddr() definition to match its declaration.
* MFstable/11 r303691: | ngie | 2016-08-03 | 4 | -17/+11
  MFC r302550,r302551,r302552,r302553:

  r302550:
  Deobfuscate cleanup path in clnt_dg_create(..)

  Similar to r300836 and r301800, cl and cu will always be non-NULL as they're allocated using the mem_alloc routines, which always use `malloc(..., M_WAITOK)`.

  Deobfuscating the cleanup path fixes a leak where if cl was NULL and cu was not, cu would not be free'd, and also removes a duplicate test for cl not being NULL.

  CID: 1007033, 1007344

  r302551:
  Deobfuscate cleanup path in clnt_vc_create(..)

  Similar to r300836, r301800, and r302550, cl and ct will always be non-NULL as they're allocated using the mem_alloc routines, which always use `malloc(..., M_WAITOK)`.

  CID: 1007342

  r302552:
  Convert `svc_xprt_alloc(..)` and `svc_xprt_free(..)`'s prototypes to ANSI C style prototypes

  r302553:
  Don't test for xpt not being NULL before calling svc_xprt_free(..)

  svc_xprt_alloc(..) will always return initialized memory as it uses mem_alloc(..) under the covers, which uses malloc(.., M_WAITOK, ..).

  CID: 1007341
* MFC r301800: | ngie | 2016-07-08 | 1 | -8/+3
  Deobfuscate cleanup path in clnt_bck_create(..)

  Similar to r300836, cl and ct will always be non-NULL as they're allocated using the mem_alloc routines, which always use `malloc(..., M_WAITOK)`.

  Deobfuscating the cleanup path fixes a leak where if cl was NULL and ct was not, ct would not be free'd, and also removes a duplicate test for cl not being NULL.

  CID: 1229999
* MFC r297391: | bdrewery | 2016-06-27 | 1 | -8/+0
  Remove some NULL checks for M_WAITOK allocations.
* MFC r300836: | ngie | 2016-06-10 | 1 | -12/+10
  Quell false positives in svc_vc_create and svc_vc_create_conn with cd and xprt

  Both cd and xprt will be non-NULL after their respective malloc(9) wrappers are called (mem_alloc and svc_xprt_alloc, which calls mem_alloc) as mem_alloc always gets called with M_WAITOK|M_ZERO today. Thus, testing for them being non-NULL is incorrect -- it misleads Coverity and it misleads the reader.

  Remove some unnecessary NULL initializations as a follow up to help solidify the fact that these pointers will be initialized properly in sys/rpc/.. with the interfaces the way they are currently.

  CID: 1007338, 1007339, 1007340
* MFC r300625: | ngie | 2016-06-08 | 1 | -2/+0
  Remove unnecessary memset(.., 0, ..)'s

  The mem_alloc macro calls calloc (userspace) / malloc(.., M_WAITOK|M_ZERO) under the covers, so zeroing out memory is already handled by the underlying calls
* MFC r298336: | ngie | 2016-05-13 | 1 | -1/+1
  r298336 (by cem):

  kgssapi(4): Fix string overrun in Kerberos principal construction

  'buf.value' was previously treated as a nul-terminated string, but only allocated with strlen() space. Rectify this.

  CID: 1007639
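The class of bug fixed in r298336 can be illustrated with a small userland sketch: a buffer that will be read as a nul-terminated string needs strlen() plus one byte. The function name and the "service@host" layout are hypothetical and chosen only for illustration; this is not the kgssapi code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "service@host" in a freshly allocated buffer.
 * The buggy pattern would allocate only `len` bytes and leave no room for
 * the terminating NUL.
 */
static char *
sketch_make_principal(const char *service, const char *host)
{
    size_t len;
    char *buf;

    assert(service != NULL && host != NULL);
    len = strlen(service) + 1 + strlen(host);   /* text plus '@' */
    buf = malloc(len + 1);                      /* +1 for the NUL */
    if (buf == NULL)
        return (NULL);
    snprintf(buf, len + 1, "%s@%s", service, host);
    return (buf);
}
```

The caller owns the returned buffer and releases it with free().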
* MFC r297051: Fix incorrect (fortunately bigger) malloc size. | mav | 2016-03-28 | 1 | -1/+1
* MFC r291061: Improve locking of sg_threadcount. | mav | 2015-11-27 | 1 | -1/+3
* Long-overdue MFC of r280930: | wollman | 2015-10-30 | 2 | -26/+26
  Fix overflow bugs in and remove obsolete limit from kernel RPC implementation.

  The kernel RPC code, which is responsible for the low-level scheduling of incoming NFS requests, contains a throttling mechanism that prevents too much kernel memory from being tied up by NFS requests that are being serviced. When the throttle is engaged, the RPC layer stops servicing incoming NFS sockets, resulting ultimately in backpressure on the clients (if they're using TCP). However, this is a very heavy-handed mechanism as it prevents all clients from making any requests, regardless of how heavy or light they are. (Thus, when engaged, the throttle often prevents clients from even mounting the filesystem.) The throttle mechanism applies specifically to requests that have been received by the RPC layer (from a TCP or UDP socket) and are queued waiting to be serviced by one of the nfsd threads; it does not limit the amount of backlog in the socket buffers.

  The original implementation limited the total bytes of queued requests to the minimum of a quarter of (nmbclusters * MCLBYTES) and 45 MiB. The former limit seems reasonable, since requests queued in the socket buffers and replies being constructed to the requests in progress will all require some amount of network memory, but the 45 MiB limit is plainly ridiculous for modern memory sizes: when running 256 service threads on a busy server, 45 MiB would result in just a single maximum-sized NFS3PROC_WRITE queued per thread before throttling.

  Removing this limit exposed integer-overflow bugs in the original computation, and related bugs in the routines that actually account for the amount of traffic enqueued for service threads. The old implementation also attempted to reduce accounting overhead by batching updates until each queue is fully drained, but this is prone to livelock, resulting in repeated accumulate-throttle-drain cycles on a busy server. Various data types are changed to long or unsigned long; explicit 64-bit types are not used due to the unavailability of 64-bit atomics on many 32-bit platforms, but those platforms also cannot support nmbclusters large enough to cause overflow.

  This code (in a 10.1 kernel) is presently running on production NFS servers at CSAIL.

  Summary of this revision:
  * Removes 45 MiB limit on requests queued for nfsd service threads
  * Fixes integer-overflow and signedness bugs
  * Avoids unnecessary throttling by not deferring accounting for completed requests

  Differential Revision: https://reviews.freebsd.org/D2165
  Reviewed by: rmacklem, mav
  Relnotes: yes
  Sponsored by: MIT Computer Science & Artificial Intelligence Laboratory
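The limit arithmetic described in r280930 can be sketched as below. This is only an illustration of the computation in the message, with hypothetical function names; the sample inputs in the note after the block are made up to show where a 32-bit signed product would no longer fit.

```c
#include <assert.h>

/* The 45 MiB cap that the commit removes. */
#define SKETCH_OLD_CAP (45UL * 1024UL * 1024UL)

/* A quarter of the mbuf cluster memory, computed in unsigned long. */
static unsigned long
sketch_quarter_of_clusters(unsigned long nmbclusters, unsigned long mclbytes)
{
    assert(mclbytes > 0);
    return (nmbclusters * mclbytes / 4);
}

/* Old behavior per the message: the minimum of the quarter and 45 MiB. */
static unsigned long
sketch_old_limit(unsigned long nmbclusters, unsigned long mclbytes)
{
    unsigned long q = sketch_quarter_of_clusters(nmbclusters, mclbytes);

    return (q < SKETCH_OLD_CAP ? q : SKETCH_OLD_CAP);
}
```

With the illustrative inputs nmbclusters = 1048576 and MCLBYTES = 2048, the product is 2^31, one more than the largest 32-bit signed int, which is the kind of value that motivates the wider types. The quarter is 512 MiB, while the old cap would have held the limit at 45 MiB.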
* MFC 288272 | jpaetzel | 2015-10-05 | 1 | -5/+2
  Increase group limit for kerberized NFSv4

  PR: 202659
  Submitted by: matthew.l.dailey@dartmouth.edu
  Reviewed by: rmacklem dfr
  Sponsored by: iXsystems
* MFC r286894: | delphij | 2015-09-01 | 2 | -0/+10
  Set curvnet context inside the RPC code in more places.

  Reviewed by: melifaro
* MFC r281199: Remove hard limits on number of accepting NFS connections. | mav | 2015-04-14 | 2 | -3/+3
  The limit of 5 connections, set long ago, creates problems for the SPEC benchmark. Make NFS follow the system-wide maximum.
* MFC r275745: | kib | 2014-12-27 | 1 | -1/+2
  Add facility to stop all userspace processes.

  MFC r275753:
  Fix gcc build.

  MFC r275820:
  Add missed break.
* MFC r275618: | kib | 2014-12-15 | 1 | -4/+16
  Check for stop condition in nfsd threads.
* MFC: r268115 | rmacklem | 2014-08-01 | 4 | -1/+632
  Merge the NFSv4.1 server code in projects/nfsv4.1-server over into head. The code is not believed to have any effect on the semantics of non-NFSv4.1 server behaviour. It is a rather large merge, but I am hoping that there will not be any regressions for the NFS server.
* MFC r267228: | mav | 2014-06-22 | 3 | -182/+250
  Split RPC pool threads into number of smaller semi-isolated groups.

  Old design with unified thread pool was good from the point of thread utilization. But single pool-wide mutex became huge congestion point for systems with many CPUs. To reduce the congestion create several thread groups within a pool (one group for every 6 CPUs and 12 threads), each group with own mutex. Each connection during its registration is assigned to one of the groups in round-robin fashion. File affinity code may still move requests between the groups, but otherwise groups are self-contained.
* MFC r267223: | mav | 2014-06-22 | 2 | -6/+1
  Remove st_idle variable, duplicating st_xprt.
* MFC r267221, r267278: | mav | 2014-06-22 | 2 | -78/+56
  Introduce new per-thread lock to protect the list of requests.

  This slightly simplifies the svc_run_internal() code: if we processed all the requests in a queue, then we know that a new one will not appear.
* MFC: r265238, r265240 | brueffer | 2014-05-16 | 1 | -7/+5
  Properly free resources in case of error.

  CID: 1007032
  Found with: Coverity Prevent(tm)
* MFC r261449: | mav | 2014-02-07 | 1 | -1/+1
  Fix lock acquisition in case no request space available, missed in r260097.
* MFC r260229, r260258, r260367, r260390, r260459, r260648: | mav | 2014-01-22 | 4 | -22/+146
  Rework NFS Duplicate Request Cache cleanup logic.

  - Introduce additional hash to group requests by hash of sockref. This allows processing TCP acknowledgements without looping through all the cache, and as a result allows doing it every time.
  - Introduce additional callbacks to notify application layer about sockets disconnection. Without this, the last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours.
  - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would break some KBIs, while I don't know other consumers for it aside NFS.
  - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time.

  Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network.

  As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%.

  Sponsored by: iXsystems, Inc.
* MFC r260097: | mav | 2014-01-22 | 2 | -53/+54
  Move most of NFS file handle affinity code out of the heavily congested global RPC thread pool lock and protect it with own set of locks.

  On synthetic benchmarks this improves peak NFS request rate by 40%.
* MFC r260036: | mav | 2014-01-22 | 4 | -11/+25
  Introduce xprt_inactive_self() -- variant for use when sure that port is assigned to thread, for example, within receive handlers. In that case the function reduces to single assignment and can avoid locking.
* MFC r260031: | mav | 2014-01-22 | 1 | -1/+9
  In addition to r259632 completely block receive upcalls if we have more data than we need. This reduces lock pressure from xprt_active() side.
* MFC r259828: | mav | 2014-01-22 | 1 | -4/+7
  Fix a bug introduced at r259632, triggering infinite loop in some cases.
* MFC r259659, r259662: | mav | 2014-01-22 | 2 | -41/+46
  Remove several linear list traversals per request from RPC server code.

  Do not insert active ports into pool->sp_active list if they are successfully assigned to some thread. This makes that list include only ports that really require attention, and so traversal can be reduced to simply taking the first one.

  Remove idle thread from pool->sp_idlethreads list when assigning some work (port of requests) to it. That again makes it possible to replace list traversals with simply taking the first element.
* MFC r259632: | mav | 2014-01-22 | 1 | -128/+115
  Rework flow control for connection-oriented (TCP) RPC server.

  When processing receive buffer, write the amount of data, expected in present request record, into socket's so_rcv.sb_lowat to make stack aware about our needs. When processing following upcalls, ignore them until socket collects enough data to be read and processed in one turn. This change reduces number of context switches and other operations in RPC stack during large NFS writes (especially via non-Jumbo networks) by an order of magnitude.

  After processing current packet, take another look into the pending buffer to find out whether the next packet had been already received. If not, deactivate this port right there without making RPC code push this port to another thread just to find that there is nothing. If the next packet is received partially, also deactivate the port, but also update socket's so_rcv.sb_lowat to not be woken up prematurely. This change additionally reduces number of context switches per NFS request by about half.
* MFC r258578, r258580, r258581 (by hrs): | mav | 2014-01-22 | 30 | -812/+748
  Replace Sun RPC license in TI-RPC library with a 3-clause BSD license with the explicit permissions.
* MFC r258132: | mav | 2014-01-22 | 1 | -21/+27
  Some minor tuning to rpc/svc.c:
  - close cosmetic race in svc_exit();
  - do not set wait timeout for idle threads if we have no use for wakeups;
  - create new requested thread sooner, not only after another thread's wakeup, which may happen later under constant load.
* MFC r259842: | dim | 2013-12-28 | 3 | -8/+1
  Remove some unused static const strings under sys/rpc, which have never been used since the initial commit (r177633).

  MFC r259843:
  Move a static const variable to the #if 0 part where it is only used. (Note the #if 0 part has been inactive since the initial commit, r177633, so maybe it should be removed altogether).
* It was reported via email that the cu_sent field used by the | rmacklem | 2013-09-06 | 1 | -0/+2
  krpc client side UDP was observed as way out of range and caused the rpc.lockd daemon to hang trying to do an RPC. Inspection of the code found two places where the RPC request is re-queued, but the value of cu_sent was not incremented. Since cu_sent is always decremented when the RPC request is dequeued, I think this could have caused cu_sent to go out of range. This patch adds lines to increment cu_sent for these two cases.

  Reported by: dwhite@ixsystems.com
  Discussed with: dwhite@ixsystems.com
  MFC after: 2 weeks
* Add support for host-based (Kerberos 5 service principal) initiator | rmacklem | 2013-07-09 | 2 | -18/+127
  credentials to the kernel rpc. Modify the NFSv4 client to add support for the gssname and allgssname mount options to use this capability. Requires the gssd daemon to be running with the "-h" option.

  Reviewed by: jhb
* Fix a potential socket leak in the NFS server. If a client closes its | jhb | 2013-04-08 | 1 | -1/+4
  connection after it was accepted by the userland nfsd process but before it was handled off to svc_vc_create() in the kernel, then svc_vc_create() would see it as a new listen socket and try to listen on it leaving a dangling reference to the socket. Instead, check for disconnected sockets and treat them like a connected socket. The call to pru_getaddr() should fail and cause svc_vc_create() to fail. Note that we need to lock the socket to get a consistent snapshot of so_state since there is a window in soisdisconnected() where both flags are clear.

  Reviewed by: dfr, rmacklem
  MFC after: 1 week
* Improve error handling when unwrapping received data. | gnn | 2013-04-04 | 1 | -1/+16
  Submitted by: Rick Macklem
  MFC after: 1 week
* Revert 195703 and 195821 as this special stop handling in NFS is now | jhb | 2013-03-13 | 2 | -5/+3
  implemented via VFCF_SBDRY rather than passing PBDRY to individual sleep calls.
* Use m_get(), m_gethdr() and m_getcl() instead of historic macros. | glebius | 2013-03-12 | 7 | -15/+8
  Sponsored by: Nginx, Inc.
* Add support for backchannels to the kernel RPC. Backchannels | rmacklem | 2012-12-08 | 6 | -98/+405
  are used by NFSv4.1 for callbacks. A backchannel is a connection established by the client, but used for RPCs done by the server on the client (callbacks). As a result, this patch mixes some client side calls in the server side and vice versa. Some definitions in the .c files were extracted out into a file called krpc.h, so that they could be included in multiple .c files. This code has been in projects/nfsv4.1-client for some time. Although no one has given it a formal review, I believe kib@ has taken a look at it.
* Mechanically substitute flags from historic mbuf allocator with | glebius | 2012-12-05 | 8 | -14/+14
  malloc(9) flags within sys.

  Exceptions:
  - sys/contrib not touched
  - sys/mbuf.h edited manually
* Modify the comment to take out the names and URL. | rmacklem | 2012-10-25 | 1 | -6/+3
  Requested by: kib
  MFC after: 3 days
* Add a comment describing why r241097 was done. | rmacklem | 2012-10-15 | 1 | -0/+11
  Suggested by: rwatson
  MFC after: 1 week
* rpc: convert all uid and gid variables to u_int. | pfg | 2012-10-04 | 1 | -4/+4
  After further discussion, instead of pretending to use uid_t and gid_t as upstream Solaris and linux try to, we are better using u_int, which is in fact what the code can handle and best approaches the range of values used by uid and gid.

  Discussed with: bde
  Reviewed by: bde
* libtirpc: be sure to free cl_netid and cl_tp | pfg | 2012-10-02 | 1 | -0/+4
  When creating a client with clnt_tli_create, it uses strdup to copy strings for these fields if nconf is passed in. clnt_dg_destroy frees these strings already. Make sure clnt_vc_destroy frees them in the same way.

  This change matches the reference (OpenSolaris) implementation.

  Tested by: David Wolfskill
  Obtained from: Bull GNU/Linux NFSv4 Project (libtirpc)
  MFC after: 2 weeks
* RPC: Convert all uid and gid variables of the type uid_t and gid_t. | pfg | 2012-10-02 | 1 | -5/+4
  This matches what upstream (OpenSolaris) does.

  Tested by: David Wolfskill
  Obtained from: Bull GNU/Linux NFSv4 project (libtirpc)
  MFC after: 3 days
* Attila Bogar and Herbert Poeckl both reported similar problems | rmacklem | 2012-10-01 | 1 | -3/+4
  w.r.t. a Linux NFS client doing a krb5 NFS mount against the FreeBSD server. We determined this was a Linux bug: http://www.spinics.net/lists/linux-nfs/msg32466.html, however the mount failed to work, because the Destroy operation with a bogus encrypted checksum destroyed the authenticator handle. This patch changes the rpcsec_gss code so that it doesn't Destroy the authenticator handle for this case and, as such, the Linux mount will work.

  Tested by: Attila Bogar and Herbert Poeckl
  MFC after: 2 weeks