path: root/sys/kern/uipc_socket.c
* In the current world order, solisten() implements the state transition of
  a socket from a regular socket to a listening socket able to accept new
  connections. As part of this state transition, solisten() calls into the
  protocol to update protocol-layer state. There were several bugs in this
  implementation that could result in a race wherein a TCP SYN received in
  the interval between the protocol state transition and the shortly
  following socket layer transition would result in a panic in the TCP code,
  as the socket would be in the TCPS_LISTEN state but would not have the
  SO_ACCEPTCONN flag set.

  This change does the following:

  - Pushes the socket state transition from the socket layer solisten() to
    socket "library" routines called from the protocol. This permits the
    socket routines to be called while holding the protocol mutexes,
    preventing a race exposing the incomplete socket state transition to TCP
    after the TCP state transition has completed. The check for a socket
    layer state transition is performed by solisten_proto_check(), and the
    actual transition is performed by solisten_proto().

  - Holds the socket lock for the duration of the socket state test and set,
    and over the protocol layer state transition, which is now possible as
    the socket lock is acquired by the protocol layer, rather than vice
    versa. This prevents additional state-related races in the socket layer.

  This permits the dual transition of socket layer and protocol layer state
  to occur while holding locks for both layers, making the two changes
  atomic with respect to one another. Similar changes are likely required
  elsewhere in the socket/protocol code. A sketch of the resulting call
  pattern follows this entry.

  Reported by: Peter Holm <peter@holm.cc>
  Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net>
  Philosophical head nod: gnn
  (rwatson, 2005-02-21, 1 file, -14/+42)
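  A minimal sketch of the resulting call pattern. Only solisten_proto_check()
  and solisten_proto() are real names from this change; proto_listen(),
  PROTO_LOCK()/PROTO_UNLOCK(), and proto_set_listening() are hypothetical
  stand-ins for a protocol's pru_listen routine, mutex, and state change:

      static int
      proto_listen(struct socket *so, struct thread *td)
      {
              int error;

              PROTO_LOCK();           /* protocol mutex first... */
              SOCK_LOCK(so);          /* ...then the socket lock */
              error = solisten_proto_check(so);  /* socket layer: may we listen? */
              if (error == 0) {
                      proto_set_listening();  /* protocol transition (e.g. TCPS_LISTEN)... */
                      solisten_proto(so);     /* ...and socket transition, under both locks */
              }
              SOCK_UNLOCK(so);
              PROTO_UNLOCK();
              return (error);
      }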
* In soreceive(), when considering delivery to a socket in SS_ISCONFIRMING,
  only call the protocol's pru_rcvd() if the protocol has the flag
  PR_WANTRCVD set. This brings that instance of pru_rcvd() into line with
  the rest, which do check the flag; the guarded call is sketched after
  this entry.
  MFC after: 3 days
  (rwatson, 2005-02-20, 1 file, -1/+2)
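  The guarded call, sketched; the surrounding soreceive() context is
  abbreviated and reconstructed from memory, not quoted from the commit:

      if ((so->so_state & SS_ISCONFIRMING) && uio->uio_resid) {
              /* Only notify protocols that asked for receive events. */
              if (pr->pr_flags & PR_WANTRCVD)
                      (*pr->pr_usrreqs->pru_rcvd)(so, 0);
      }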
* Correct a typo in the comment describing soreceive_rcvoob().
  MFC after: 3 days
  (rwatson, 2005-02-18, 1 file, -1/+1)
* In soconnect(), when resetting so->so_error, the socket lock is not
  required due to a straight integer write in which minor races are not a
  problem.
  (rwatson, 2005-02-18, 1 file, -2/+0)
* Move do_setopt_accept_filter() from uipc_socket.c to uipc_accf.c, where
  the rest of the accept filter code currently lives.
  MFC after: 3 days
  (rwatson, 2005-02-18, 1 file, -126/+0)
* Re-order checks in socheckuid() so that we check all deny cases before
  returning success, as sketched below.
  MFC after: 3 days
  (rwatson, 2005-02-18, 1 file, -3/+3)
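  A sketch of the resulting shape; this closely mirrors the real
  socheckuid(), but is reproduced from memory rather than the commit:

      int
      socheckuid(struct socket *so, uid_t uid)
      {
              /* Every deny case first... */
              if (so == NULL)
                      return (EPERM);
              if (so->so_cred->cr_uid != uid)
                      return (EPERM);
              /* ...then the single accept return. */
              return (0);
      }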
* In solisten(), unconditionally set the SO_ACCEPTCONN option in
  so->so_options when solisten() will succeed, rather than setting it
  conditionally based on there not being queued sockets in the completed
  socket queue. Otherwise, if the protocol exposes new sockets via the
  completed queue before solisten() completes, the listen() system call
  will succeed, but the socket and protocol state will be out of sync. For
  TCP, this didn't happen in practice, as the TCP code will panic if a new
  connection comes in after the tcpcb has been transitioned to a listening
  state but the socket doesn't have SO_ACCEPTCONN set.

  This is historical behavior resulting from bitrot since 4.3BSD, in which
  that line of code was associated with the conditional NULL'ing of the
  connection queue pointers (one-time initialization to be performed during
  the transition to a listening socket), which are now initialized
  separately.

  Discussed with: fenner, gnn
  MFC after: 3 days
  (rwatson, 2005-02-18, 1 file, -6/+4)
* - Convert the so_qlen, so_incqlen, and so_qlimit fields of struct socket
    from short to unsigned short.
  - Add a SYSCTL_PROC() around somaxconn, not accepting values < 1 or
    > USHRT_MAX.

  Before this change, setting somaxconn to something above 32767 and
  calling listen(fd, -1) led to a socket which doesn't accept connections
  at all. A sketch of the clamping handler follows this entry.

  Reviewed by: rwatson
  Reported by: Igor Sysoev
  (glebius, 2005-01-24, 1 file, -2/+23)
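  The handler pattern, sketched; names and the exact declaration are
  approximate, not the verbatim commit:

      static int
      sysctl_somaxconn(SYSCTL_HANDLER_ARGS)
      {
              int error, val;

              val = somaxconn;
              error = sysctl_handle_int(oidp, &val, 0, req);
              if (error || !req->newptr)
                      return (error);
              /* Reject values the unsigned short queue fields can't hold. */
              if (val < 1 || val > USHRT_MAX)
                      return (EINVAL);
              somaxconn = val;
              return (0);
      }
      SYSCTL_PROC(_kern_ipc, KIPC_SOMAXCONN, somaxconn,
          CTLTYPE_UINT | CTLFLAG_RW, 0, sizeof(int), sysctl_somaxconn,
          "I", "Maximum pending socket connection queue size");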
* When re-connecting an already connected datagram socket, make sure to
  clean up its pending error state, which may be set in some rare
  conditions, resulting in the connect() syscall returning that bogus error
  and making the application believe that the attempt to change the
  association has failed, while in fact it has not. The fix is sketched
  below.

  There is a sockets/reconnect regression test which exercises this bug.

  MFC after: 2 weeks
  (sobomax, 2005-01-12, 1 file, -2/+11)
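  The fix, sketched in abbreviated soconnect() context; the surrounding
  logic is approximate:

      /*
       * A datagram socket may be re-connected; clear any error left over
       * from the previous association so the new connect() does not
       * falsely report it.
       */
      so->so_error = 0;
      error = (*so->so_proto->pr_usrreqs->pru_connect)(so, nam, td);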
* /* -> /*- for copyright notices, minor format tweaks as necessary.
  (imp, 2005-01-06, 1 file, -1/+1)
* Remove an XXXRW indicating atomic operations might be used as a
  substitute for a global mutex protecting the socket count and generation
  number. The observation that soreceive_rcvoob() can't return an mbuf
  chain is a property, not a bug, so remove the XXXRW. In sorflush,
  s/existing/previous/ when describing prior behavior. For SO_LINGER socket
  option retrieval, remove an XXXRW about why we hold the mutex: this is
  correct and not dubious.
  MFC after: 2 weeks
  (rwatson, 2004-12-23, 1 file, -12/+4)
* In soalloc(), simplify the mac_init_socket() handling to remove
  unnecessary use of a global variable and simplify the return case. While
  here, use ()'s around return values.

  In sodealloc(), remove a comment about why we bump the gencnt and
  decrement the socket count separately. It doesn't add substantially to
  the reading, and clutters the function.

  MFC after: 2 weeks
  (rwatson, 2004-12-23, 1 file, -14/+3)
* Remove unneeded code from the zero-copy receive path.
  Discussed with: gallatin@
  Tested by: ken@
  (alc, 2004-12-10, 1 file, -12/+0)
* Tidy up the zero-copy receive path: remove an unneeded argument to
  uiomoveco() and userspaceco().
  (alc, 2004-12-08, 1 file, -3/+2)
* If soreceive() is called from a socket callback, there's no reason to do
  a window update to the peer (through an ACK) from soreceive() itself:
  TCP will do that upon return from the socket callback. Sending a window
  update from soreceive() results in a lock reversal.
  Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
  Reviewed by: rwatson
  (ps, 2004-11-29, 1 file, -1/+7)
* Make soreceive(MSG_DONTWAIT) nonblocking. If MSG_DONTWAIT is passed into
  soreceive(), then pass M_DONTWAIT to m_copym(). Also fix up error
  handling for the case where m_copym() returns failure; see the sketch
  after this entry.
  Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
  Reviewed by: rwatson
  (ps, 2004-11-29, 1 file, -3/+21)
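  The flag translation, sketched; the surrounding soreceive() copy path
  (m, len, flags) is abbreviated:

      /*
       * Don't sleep for mbufs if the caller asked for a nonblocking
       * receive; report ENOBUFS instead.
       */
      m = m_copym(m, 0, len,
          (flags & MSG_DONTWAIT) ? M_DONTWAIT : M_TRYWAIT);
      if (m == NULL)
              error = ENOBUFS;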
* Since the sb_timeo type was increased to int, use INT_MAX instead of
  SHRT_MAX. This also gives us the ability to close a PR.
  PR: kern/42352
  Approved by: julian (mentor)
  MFC after: 1 week
  (glebius, 2004-11-09, 1 file, -3/+3)
* Acquire the accept mutex in soabort() before calling sotryfree(), as
  that is now required.
  RELENG_5_3 candidate.
  Foot provided by: Dikshie <dikshie at ppk dot itb dot ac dot id>
  (rwatson, 2004-11-02, 1 file, -0/+1)
* socreate() does an early abort if either the protocol cannot be found or
  pru_attach is NULL. With loadable protocols, the SPACER dummy protocols
  have valid function pointers for all methods, pointing to functions that
  just return EOPNOTSUPP. Thus the early abort check would not immediately
  detect that attach is not supported for such a protocol; instead it would
  correctly get the EOPNOTSUPP error later on, when it calls the
  protocol-specific attach function. Add a test against the
  pru_attach_notsupp() function pointer to the early abort check as well.
  (andre, 2004-10-23, 1 file, -1/+2)
* Push acquisition of the accept mutex out of sofree() into the callers
  (sorele()/sotryfree()):

  - This permits the caller to acquire the accept mutex before the socket
    mutex, avoiding sofree() having to drop the socket mutex and re-order,
    which could lead to races permitting more than one thread to enter
    sofree() after a socket is ready to be free'd.

  - This also covers clearing of the so_pcb weak socket reference from the
    protocol to the socket, preventing races in clearing and evaluation of
    the reference such that sofree() might be called more than once on the
    same socket.

  This appears to close a race I was able to easily trigger by repeatedly
  opening and resetting TCP connections to a host, in which the tcp_close()
  code called as a result of the RST raced with the close() of the accepted
  socket in the user process, resulting in simultaneous attempts to
  de-allocate the same socket. The new locking increases the overhead for
  operations that may potentially free the socket, so we will want to
  revise the synchronization strategy here as we normalize the reference
  counting model for sockets.

  The use of the accept mutex in freeing of sockets that are not listen
  sockets is primarily motivated by the potential need to remove the socket
  from the incomplete connection queue on its parent (listen) socket, so
  cleaning up the reference model here may allow us to substantially weaken
  the synchronization requirements. The resulting lock ordering is sketched
  after this entry.

  RELENG_5_3 candidate.
  MFC after: 3 days
  Reviewed by: dwhite
  Discussed with: gnn, dwhite, green
  Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
  Reported by: Vlad <marchenko at gmail dot com>
  (rwatson, 2004-10-18, 1 file, -3/+4)
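  The lock ordering after this change, sketched; macro shapes approximate
  the FreeBSD 5.x sorele() logic rather than quoting it:

      /* Accept mutex strictly before the socket mutex. */
      ACCEPT_LOCK();
      SOCK_LOCK(so);
      if (--so->so_count == 0)
              sofree(so);             /* consumes both locks */
      else {
              SOCK_UNLOCK(so);
              ACCEPT_UNLOCK();
      }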
* Rework sofree() logic to take into account a possible race with accept().
  Sockets in the listen queues have reference counts of 0, so if the
  protocol decides to disconnect the pcb and try to free the socket, this
  triggered a race with accept() wherein accept() would bump the reference
  count before sofree() had removed the socket from the listen queues,
  resulting in a panic in sofree() when it discovered it was freeing a
  referenced socket. This might happen if a RST came in prior to accept()
  on a TCP connection.

  The fix is two-fold: to expand the coverage of the accept mutex earlier
  in sofree() to prevent accept() from grabbing the socket after the "is it
  really safe to free" tests, and to expand the logic of the "is it really
  safe to free" tests to check that the refcount is still 0 (i.e., we
  didn't race).

  RELENG_5 candidate.
  Much discussion with and work by: green
  Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
  Reported by: Vlad <marchenko at gmail dot com>
  (rwatson, 2004-10-11, 1 file, -5/+19)
* Expand the scope of the socket buffer locks in sopoll() to include the
  state test as well as the set, or we risk a race between a socket wakeup
  and registering for select() or poll() on the socket. This does increase
  the cost of the poll operation, but can probably be optimized some in the
  future. This appears to correct poll() "wedges" experienced with X11 on
  SMP systems with highly interactive applications, and might affect a
  plethora of other select()-driven applications. See the sketch after
  this entry.

  RELENG_5 candidate.
  Problem reported by: Maxim Maximov <mcsi at mcsi dot pp dot ru>
  Debugged with help of: dwhite
  (rwatson, 2004-09-05, 1 file, -4/+4)
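  The resulting shape, sketched; sopoll()'s full event-mask handling is
  omitted and names approximate the 5.x code:

      SOCKBUF_LOCK(&so->so_rcv);
      if (soreadable(so)) {
              /* Already readable: report it. */
              revents |= events & (POLLIN | POLLRDNORM);
      } else {
              /*
               * Register for a wakeup under the same lock hold, so no
               * wakeup can slip in between the test and the record.
               */
              selrecord(td, &so->so_rcv.sb_sel);
              so->so_rcv.sb_flags |= SB_SEL;
      }
      SOCKBUF_UNLOCK(&so->so_rcv);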
* Conditional acquisition of socket buffer mutexes when testing socket
  buffers with kqueue filters is no longer required: the kqueue framework
  will guarantee that the mutex is held on entering the filter, either due
  to a call from the socket code already holding the mutex, or by
  explicitly acquiring it. This removes the last of the conditional socket
  locking.
  (rwatson, 2004-08-24, 1 file, -35/+16)
* Back out uipc_socket.c:1.208, as it incorrectly assumes that all sockets
  are connection-oriented for the purposes of kqueue registration. Since
  UDP sockets aren't connection-oriented, this appeared to break a great
  many things, such as RPC-based applications and services (i.e., NFS).
  Since jmg isn't around, I'm backing this out before too many more feet
  are shot, but intend to investigate the right solution with him once
  he's available.
  Apologies to: jmg
  Discussed with: imp, scottl
  (rwatson, 2004-08-20, 1 file, -3/+1)
* Make sure that the socket is either accepting connections or is connected
  when attaching a knote to it; otherwise return EINVAL.
  Pointed out by: benno
  (jmg, 2004-08-20, 1 file, -1/+3)
* Add locking to the kqueue subsystem. This also makes the kqueue subsystem
  a more complete subsystem, and removes the knowledge of how things are
  implemented from the drivers. Include locking around filter ops, so a
  module like aio will know when not to be unloaded if there are
  outstanding knotes using its filter ops.

  Currently, it uses MTX_DUPOK even though it is not always safe to acquire
  duplicate locks. Witness currently doesn't support the ability to
  discover if a dup lock is ok (in some cases).

  Reviewed by: green, rwatson (both earlier versions)
  (jmg, 2004-08-15, 1 file, -6/+10)
* Replace a reference to splnet() with a reference to locking in a comment.
  (rwatson, 2004-08-11, 1 file, -1/+1)
* Do some initial locking on accept filter registration and attach. While
  here, close some races that existed in the pre-locking world during low
  memory conditions. This locking isn't perfect, but it's closer than
  before.
  (rwatson, 2004-07-25, 1 file, -29/+76)
* The recent changes to control message passing broke some things that get
  certain types of control messages (ping6 and rtsol are examples). This
  gets the new code closer to working:

  1) Collect control mbufs for processing in the controlp == NULL case, so
     that they can be freed by externalize.
  2) Loop over the list of control mbufs, as the externalize function may
     not know how to deal with chains.
  3) In the case where there is no externalize function, remember to add
     the control mbuf to the controlp list so that it will be returned.
  4) After adding stuff to the controlp list, walk to the end of the list
     of stuff that was added, in case we added a chain.

  This code can be further improved, but this is enough to get most things
  working again.
  Reviewed by: rwatson
  (dwmalone, 2004-07-18, 1 file, -12/+16)
* When entering soclose(), assert that SS_NOFDREF is not already set.
  (rwatson, 2004-07-16, 1 file, -0/+2)
* Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a better
  name. I have a kern_[sg]etsockopt which I plan to commit shortly, but the
  arguments to these functions will be quite different from so_setsockopt.
  Approved by: alfred
  (dwmalone, 2004-07-12, 1 file, -1/+1)
* Use SO_REUSEADDR and SO_REUSEPORT when reconnecting NFS mounts. Tune the
  timeout from 5 seconds to 12 seconds. Provide a sysctl to show how many
  reconnects the NFS client has done.
  Seems to fix IPv6 from: kuriyama
  (alfred, 2004-07-12, 1 file, -0/+19)
* Use sockbuf_pushsync() to synchronize stack and socket buffer state in
  soreceive() after removing an MT_SONAME mbuf from the head of the socket
  buffer.

  When processing MT_CONTROL mbufs in soreceive(), first remove all of the
  MT_CONTROL mbufs from the head of the socket buffer to a local mbuf
  chain, then feed them into dom_externalize() as a set, which both avoids
  thrashing the socket buffer lock when handling multiple control mbufs,
  and also avoids races with other threads acting on the socket buffer when
  the socket buffer mutex is released to enter the externalize code.
  Existing races that might occur if the protocol externalize method
  blocked during processing have also been closed.

  Now that we synchronize socket buffer and stack state following
  modifications to the socket buffer, replace the manual synchronization
  that previously followed control mbuf processing with a set of
  assertions. These can eventually be removed.

  The soreceive() code is now substantially more MPSAFE.
  (rwatson, 2004-07-11, 1 file, -34/+47)
* Add sockbuf_pushsync(), an inline function that, following a change to
  the head of the mbuf chains in a socket buffer, re-synchronizes the cache
  pointers used to optimize socket buffer appends. This will be used by
  soreceive() before dropping socket buffer mutexes to make sure a
  consistent version of the socket buffer is visible to other threads; a
  sketch follows this entry.

  While here, update the copyright to account for the substantial rewrite
  of much socket code required for fine-grained locking.
  (rwatson, 2004-07-11, 1 file, -0/+38)
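  Sketched from the description above; close to, but not necessarily
  verbatim from, the committed function:

      static __inline void
      sockbuf_pushsync(struct sockbuf *sb, struct mbuf *nextrecord)
      {
              SOCKBUF_LOCK_ASSERT(sb);

              /* Link the new (possibly NULL) next record into place. */
              if (sb->sb_mb != NULL)
                      sb->sb_mb->m_nextpkt = nextrecord;
              else
                      sb->sb_mb = nextrecord;

              /* Re-derive the append-optimization cache pointers. */
              if (sb->sb_mb == NULL) {
                      sb->sb_mbtail = NULL;
                      sb->sb_lastrecord = NULL;
              } else if (sb->sb_mb->m_nextpkt == NULL)
                      sb->sb_lastrecord = sb->sb_mb;
      }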
* Add additional annotations to soreceive(), documenting the effects of
  locking on 'nextrecord' and concerns regarding potentially inconsistent
  or stale use of socket buffer or stack fields if they aren't carefully
  synchronized whenever the socket buffer mutex is released. Document that
  the high-level sblock() prevents races against other readers on the
  socket.

  Also document the 'type' logic as to how soreceive() guarantees that it
  will only return one of normal data or inline out-of-band data.
  (rwatson, 2004-07-11, 1 file, -1/+35)
* In the 'dontblock' section of soreceive(), assert that the mbuf on hand
  ('m') is in fact the first mbuf in the receive socket buffer.
  (rwatson, 2004-07-11, 1 file, -0/+1)
* Break out non-inline out-of-band data receive code from soreceive() and
  put it in its own helper function, soreceive_rcvoob().
  (rwatson, 2004-07-11, 1 file, -38/+63)
* Assign pointers values of NULL rather than 0 in soreceive().
  (rwatson, 2004-07-11, 1 file, -2/+2)
* When the MT_SONAME mbuf is popped off of a receive socket buffer
  associated with a PR_ADDR protocol, make sure to update the m_nextpkt
  pointer of the new head mbuf on the chain to point to the next record.
  Otherwise, when we release the socket buffer mutex, the socket buffer
  mbuf chain may be in an inconsistent state.
  (rwatson, 2004-07-10, 1 file, -0/+2)
* Now that socket buffer locks are asserted at higher code blocks in
  soreceive(), remove some leaf assertions that are redundant.
  (rwatson, 2004-07-10, 1 file, -4/+1)
* Assert the socket buffer lock at strategic points between sections of
  code in soreceive() to confirm we've moved from block to block while
  properly maintaining locking invariants.
  (rwatson, 2004-07-10, 1 file, -0/+5)
* Drop the socket buffer lock around a call to m_copym() with M_TRYWAIT.
  A subset of locking changes to soreceive() in the queue for merging.
  Bumped into by: Willem Jan Withagen <wjw@withagen.nl>
  (rwatson, 2004-07-05, 1 file, -1/+4)
* Add a new global mutex, so_global_mtx, which protects the global
  variables so_gencnt, numopensockets, and the per-socket field so_gencnt.
  Annotate that this might be better done with atomic operations.

  Annotate what accept_mtx protects.
  (rwatson, 2004-06-27, 1 file, -2/+26)
* Replace a comment on spl state when calling soabort() with a comment on
  locking state. No socket locks should be held when calling soabort(), as
  it will call into protocol code that may acquire socket locks.
  (rwatson, 2004-06-26, 1 file, -1/+4)
* Lock socket buffers when processing the setting of socket options
  SO_SNDLOWAT or SO_RCVLOWAT, for read-modify-write; see the sketch after
  this entry.
  (rwatson, 2004-06-24, 1 file, -0/+4)
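  The SO_RCVLOWAT case, sketched; the SO_SNDLOWAT case is symmetric on
  so_snd, and the clamping detail is approximate:

      SOCKBUF_LOCK(&so->so_rcv);
      /*
       * Clamp the low-water mark to the high-water mark, atomically with
       * respect to other socket buffer users.
       */
      so->so_rcv.sb_lowat = (optval > so->so_rcv.sb_hiwat) ?
          so->so_rcv.sb_hiwat : optval;
      SOCKBUF_UNLOCK(&so->so_rcv);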
* Slide the socket buffer lock earlier in sopoll() to cover the call into
  selrecord(), setting up select and flagging the socket buffers as SB_SEL
  under the lock.
  (rwatson, 2004-06-24, 1 file, -2/+2)
* Remove spl's from uipc_socket to ease merging.
  (rwatson, 2004-06-22, 1 file, -40/+8)
* Merge next step in socket buffer locking:

  - sowakeup() now asserts the socket buffer lock on entry. Move the call
    to KNOTE higher in sowakeup() so that it is made with the socket buffer
    lock held for consistency with other calls. Release the socket buffer
    lock prior to calling into pgsigio(), so_upcall(), or aio_swake().
    Locking for this event management will need revisiting in the future,
    but this model avoids lock order reversals when upcalls into other
    subsystems result in socket/socket buffer operations. Assert that the
    socket buffer lock is not held at the end of the function.

  - Wrapper macros for sowakeup(), sorwakeup() and sowwakeup() now have
    _locked versions which assert the socket buffer lock on entry. If a
    wakeup is required by sb_notify(), invoke sowakeup(); otherwise,
    unconditionally release the socket buffer lock. This results in the
    socket buffer lock being released whether a wakeup is required or not.

  - Break out socantsendmore() into socantsendmore_locked(), which asserts
    the socket buffer lock. socantsendmore() unconditionally locks the
    socket buffer before calling socantsendmore_locked(). Note that both
    functions return with the socket buffer unlocked, as
    socantsendmore_locked() calls sowwakeup_locked(), which has the same
    properties. Assert that the socket buffer is unlocked on return.

  - Break out socantrcvmore() into socantrcvmore_locked(), which asserts
    the socket buffer lock. socantrcvmore() unconditionally locks the
    socket buffer before calling socantrcvmore_locked(). Note that both
    functions return with the socket buffer unlocked, as
    socantrcvmore_locked() calls sorwakeup_locked(), which has similar
    properties. Assert that the socket buffer is unlocked on return.

  - Break out sbrelease() into sbrelease_locked(), which asserts the
    socket buffer lock. sbrelease() unconditionally locks the socket
    buffer before calling sbrelease_locked(). sbrelease_locked() now
    invokes sbflush_locked() instead of sbflush().

  - Assert the socket buffer lock in the socket buffer sanity check
    functions sblastrecordchk() and sblastmbufchk().

  - Assert the socket buffer lock in SBLINKRECORD().

  - Break out various sbappend() functions into sbappend_locked() (and
    variations on that name), which assert the socket buffer lock. The
    !_locked() variations unconditionally lock the socket buffer before
    calling their _locked counterparts. Internally, make sure to call
    _locked() support routines, etc, if already holding the socket buffer
    lock.

  - Break out sbinsertoob() into sbinsertoob_locked(), which asserts the
    socket buffer lock. sbinsertoob() unconditionally locks the socket
    buffer before calling sbinsertoob_locked().

  - Break out sbflush() into sbflush_locked(), which asserts the socket
    buffer lock. sbflush() unconditionally locks the socket buffer before
    calling sbflush_locked(). Update panic strings for the new function
    names.

  - Break out sbdrop() into sbdrop_locked(), which asserts the socket
    buffer lock. sbdrop() unconditionally locks the socket buffer before
    calling sbdrop_locked().

  - Break out sbdroprecord() into sbdroprecord_locked(), which asserts the
    socket buffer lock. sbdroprecord() unconditionally locks the socket
    buffer before calling sbdroprecord_locked().

  - sofree() now calls socantsendmore_locked() and re-acquires the socket
    buffer lock on return. It also now calls sbrelease_locked().

  - sorflush() now calls socantrcvmore_locked() and re-acquires the socket
    buffer lock on return. Clean up/mess up other behavior in sorflush()
    relating to the temporary stack copy of the socket buffer used with
    dom_dispose by more properly initializing the temporary copy, and
    selectively bzeroing/copying more carefully to prevent WITNESS from
    getting confused by improperly initialized mutexes. Annotate why that's
    necessary, or at least, needed.

  - soisconnected() now calls sbdrop_locked() before unlocking the socket
    buffer to avoid locking overhead.

  The recurring _locked wrapper pattern is sketched after this entry.

  Some parts of this change were:
  Submitted by: sam
  Sponsored by: FreeBSD Foundation
  Obtained from: BSD/OS
  (rwatson, 2004-06-21, 1 file, -7/+31)
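  The wrapper pattern, sketched with sbflush(); the flush body is elided
  and the shapes are approximate:

      void
      sbflush_locked(struct sockbuf *sb)
      {
              SOCKBUF_LOCK_ASSERT(sb);
              /* ... the actual flush work, run under the lock ... */
      }

      void
      sbflush(struct sockbuf *sb)
      {
              /* Unlocked entry point: take and release the lock around
               * the _locked worker. */
              SOCKBUF_LOCK(sb);
              sbflush_locked(sb);
              SOCKBUF_UNLOCK(sb);
      }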
* When retrieving the SO_LINGER socket option for user space, hold the
  socket lock over pulling so_options and so_linger out of the socket
  structure in order to retrieve a consistent snapshot. This may be
  overkill if user space doesn't require a consistent snapshot.
  (rwatson, 2004-06-20, 1 file, -0/+7)
* Convert an if->panic in soclose() into a call to KASSERT(); see the
  sketch after this entry.
  (rwatson, 2004-06-20, 1 file, -2/+1)
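  The general shape of such a conversion; the condition here is
  illustrative, not the one from the commit:

      /* Before: an unconditional runtime check. */
      if (so->so_pcb != NULL)
              panic("soclose: so_pcb not NULL");

      /* After: compiled away in kernels built without INVARIANTS. */
      KASSERT(so->so_pcb == NULL, ("soclose: so_pcb not NULL"));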