path: root/sys/kern/uipc_usrreq.c
Commit message | Author | Age | Files | Lines
* In the current world order, solisten() implements the state transition of | rwatson | 2005-02-21 | 1 | -9/+14
  a socket from a regular socket to a listening socket able to accept new connections. As part of this state transition, solisten() calls into the protocol to update protocol-layer state. There were several bugs in this implementation that could result in a race wherein a TCP SYN received in the interval between the protocol state transition and the shortly following socket layer transition would result in a panic in the TCP code, as the socket would be in the TCPS_LISTEN state, but the socket would not have the SO_ACCEPTCONN flag set.

  This change does the following:

  - Pushes the socket state transition from the socket layer solisten() to socket "library" routines called from the protocol. This permits the socket routines to be called while holding the protocol mutexes, preventing a race exposing the incomplete socket state transition to TCP after the TCP state transition has completed. The check for a socket layer state transition is performed by solisten_proto_check(), and the actual transition is performed by solisten_proto().
  - Holds the socket lock for the duration of the socket state test and set, and over the protocol layer state transition, which is now possible as the socket lock is acquired by the protocol layer, rather than vice versa. This prevents additional state related races in the socket layer.

  This permits the dual transition of socket layer and protocol layer state to occur while holding locks for both layers, making the two changes atomic with respect to one another. Similar changes are likely required elsewhere in the socket/protocol code.

  Reported by: Peter Holm <peter@holm.cc>
  Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net>
  Philosophical head nod: gnn
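The check/commit split described above can be sketched in userspace with pthread mutexes. This is only an illustration under stated assumptions: every `toy_` name is hypothetical, and the real solisten_proto_check() and solisten_proto() operate on kernel structures that are not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a socket and its protocol control block. */
struct toy_socket {
    pthread_mutex_t so_lock;        /* models the socket lock */
    bool            so_connected;
    bool            so_acceptconn;  /* models SO_ACCEPTCONN */
};

struct toy_pcb {
    pthread_mutex_t    pcb_lock;    /* models the protocol mutex */
    bool               listening;   /* models TCPS_LISTEN */
    struct toy_socket *so;
};

/* Check step: may this socket become a listener? Caller holds so_lock. */
static int toy_listen_check(const struct toy_socket *so)
{
    return so->so_connected ? EINVAL : 0;
}

/* Commit step: flip the socket-layer flag. Caller holds so_lock. */
static void toy_listen_commit(struct toy_socket *so)
{
    so->so_acceptconn = true;
}

/* Protocol-level listen: both layers change while both locks are held. */
static int toy_proto_listen(struct toy_pcb *pcb)
{
    int error;

    assert(pcb != NULL && pcb->so != NULL);
    pthread_mutex_lock(&pcb->pcb_lock);
    pthread_mutex_lock(&pcb->so->so_lock);
    error = toy_listen_check(pcb->so);
    if (error == 0) {
        pcb->listening = true;          /* protocol-layer state */
        toy_listen_commit(pcb->so);     /* socket-layer state */
    }
    pthread_mutex_unlock(&pcb->so->so_lock);
    pthread_mutex_unlock(&pcb->pcb_lock);
    return error;
}
```

Because no other thread can take either lock between the two assignments, an observer that acquires the locks never sees a listening pcb without the accept flag set, which is the property the commit message is after.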
* When aborting a UNIX domain socket bind() because VOP_CREATE() failed, | rwatson | 2005-02-21 | 1 | -1/+3
  make sure to call vn_finished_write(mp) before returning.

  MFC after: 3 days
* style(9)-ize function headers, remove use of 'register'. | rwatson | 2005-02-20 | 1 | -59/+30
  MFC after: 3 days
* In unp_attach(), allow uma_zalloc to zero the new unpcb rather than | rwatson | 2005-02-20 | 1 | -3/+2
  explicitly using bzero(). Update copyright.

  MFC after: 3 days
* Move assignment of UNIX domain socket pcb during unp_attach() outside | rwatson | 2005-02-20 | 1 | -1/+1
  of the global UNIX domain socket mutex: no protection is needed that early in the setup of the UNIX domain socket and socket structures.

  MFC after: 3 days
* /* -> /*- for copyright notices, minor format tweaks as necessary | imp | 2005-01-06 | 1 | -1/+1
* Remove temporary debugging printf that was used to detect the presence | rwatson | 2004-12-23 | 1 | -4/+0
  of a race that had previously caused a panic in order to determine if the fix was for the right problem. It was.

  MFC after: 2 weeks
* Add send buffer locking to uipc_send(). Without this locking a race can | alc | 2004-12-22 | 1 | -0/+3
  occur between a reader and a writer that results in a panic upon close, e.g., "panic: sbflush_locked: cc 4 || mb 0xffffff0052afa400 || mbcnt 0"

  Reviewed by: rwatson@
  MFC after: 2 weeks
* "nfiles" is a bad name for a global variable. Call it "openfiles" instead | phk | 2004-12-01 | 1 | -2/+2
  as this is more correct and matches the sysctl variable.
* Initialize struct pr_userreqs in new/sparse style and fill in common | phk | 2004-11-08 | 1 | -5/+18
  default elements in net_init_domain().

  This makes it possible to grep these structures and see any bogosities.
* Push acquisition of the accept mutex out of sofree() into the caller | rwatson | 2004-10-18 | 1 | -0/+1
  (sorele()/sotryfree()):

  - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd.
  - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket.

  This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket.

  The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements.

  RELENG_5_3 candidate.

  MFC after: 3 days
  Reviewed by: dwhite
  Discussed with: gnn, dwhite, green
  Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
  Reported by: Vlad <marchenko at gmail dot com>
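A rough userspace sketch of the lock ordering described above follows. The `toy_` names are hypothetical, and the choice to have the free routine release both locks is an assumption of this sketch rather than a statement about the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Models the global accept mutex; always taken before any socket lock. */
static pthread_mutex_t toy_accept_mtx = PTHREAD_MUTEX_INITIALIZER;

struct toy_socket {
    pthread_mutex_t so_lock;
    int             so_count;   /* reference count */
    bool            so_freed;   /* set instead of really freeing memory */
};

/*
 * Caller holds toy_accept_mtx and so_lock, in that order.
 * Both are released on return, whether or not the socket was freed.
 */
static void toy_sofree(struct toy_socket *so)
{
    if (so->so_count == 0)
        so->so_freed = true;
    pthread_mutex_unlock(&so->so_lock);
    pthread_mutex_unlock(&toy_accept_mtx);
}

/* Drop one reference; the caller, not toy_sofree(), takes the locks. */
static void toy_sorele(struct toy_socket *so)
{
    pthread_mutex_lock(&toy_accept_mtx);    /* first: accept mutex */
    pthread_mutex_lock(&so->so_lock);       /* second: socket lock */
    assert(so->so_count > 0);
    so->so_count--;
    toy_sofree(so);
}
```

Since the free routine never has to drop the socket lock to pick up the accept mutex, there is no window in which a second thread could observe a zero count and enter the free path as well.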
* Don't hold the UNIX domain socket subsystem lock over the body of the | rwatson | 2004-08-25 | 1 | -8/+15
  UNIX domain socket garbage collection implementation, as that risks holding the mutex over potentially sleeping operations (as well as introducing some nasty lock order issues, etc). unp_gc() will hold the lock long enough to do necessary deferral checks and set that it's running, but then release it until it needs to reset the gc state.

  RELENG_5 candidate.

  Discussed with: alfred
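The "mark as running, drop the lock, do the work, retake the lock to reset" shape described above might look like this in userspace. All names are hypothetical, and the collection work itself is reduced to a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t toy_unp_mtx = PTHREAD_MUTEX_INITIALIZER;
static bool toy_gc_running;     /* protected by toy_unp_mtx */
static int  toy_gc_passes;      /* touched only by the thread that is running */

/* Returns true if this call performed a pass, false if one was in progress. */
static bool toy_gc(void)
{
    pthread_mutex_lock(&toy_unp_mtx);
    if (toy_gc_running) {
        /* Another pass is active: defer instead of running concurrently. */
        pthread_mutex_unlock(&toy_unp_mtx);
        return false;
    }
    toy_gc_running = true;
    pthread_mutex_unlock(&toy_unp_mtx);

    /* Long-running, possibly sleeping work happens with the lock dropped. */
    toy_gc_passes++;

    pthread_mutex_lock(&toy_unp_mtx);
    toy_gc_running = false;
    pthread_mutex_unlock(&toy_unp_mtx);
    assert(toy_gc_passes > 0);
    return true;
}
```

The flag, not the mutex, is what serializes collection passes, so the mutex is only ever held for a few instructions.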
* Add UNP_UNLOCK_ASSERT() to assert that the UNIX domain socket subsystem | rwatson | 2004-08-19 | 1 | -2/+10
  lock is not held. Rather than annotating that the lock is released after calls to unp_detach() with a comment, annotate with an assertion.

  Assert that the UNIX domain socket subsystem lock is not held when unp_externalize() and unp_internalize() are called.
* Always acquire the UNIX domain socket subsystem lock (UNP lock) | rwatson | 2004-08-16 | 1 | -46/+107
  before dereferencing sotounpcb() and checking its value, as so_pcb is protected by protocol locking, not subsystem locking. This prevents races during close() by one thread and use of the socket in another.

  unp_bind() now asserts the UNP lock, and uipc_bind() now acquires the lock around calls to unp_bind().
* Annotate the current UNIX domain socket locking strategies, order, | rwatson | 2004-08-16 | 1 | -0/+21
  strengths, and weaknesses in a comment. Assert a copyright over the changes made as part of the locking work.
* After completing a name lookup for a target UNIX domain socket to | rwatson | 2004-08-14 | 1 | -5/+18
  connect to, re-check that the local UNIX domain socket hasn't been closed while we slept, and if so, return EINVAL. This affects the system running both with and without Giant over the network stack, and recent ULE changes appear to cause it to trigger more frequently than previously under load. While here, improve catching of possibly closed UNIX domain sockets in one or two additional circumstances. I have a much larger set of related changes in Perforce, but they require more testing before they can be merged.

  One debugging printf is left in place to indicate when such a race takes place: this is typically triggered by a buggy application that simultaneously connect()'s and close()'s a UNIX domain socket file descriptor. I'll remove this at some point in the future, but am interested in seeing how frequently this is reported. In the case of Martin's reported problem, it appears to be a result of a non-thread safe syslog() implementation in the C library, which does not synchronize access to its logging file descriptor.

  Reported by: mbr
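The re-check after a sleeping lookup can be illustrated with a small userspace sketch. The `toy_` names and the callback used to simulate a concurrent close() are assumptions made for the example only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t toy_unp_mtx = PTHREAD_MUTEX_INITIALIZER;

struct toy_socket {
    void *so_pcb;   /* becomes NULL once the socket has been closed */
};

/* A lookup during which nothing else happens. */
static void toy_lookup_ok(struct toy_socket *so)
{
    (void)so;
}

/* A lookup during which another thread closes the socket (simulated). */
static void toy_lookup_racing_close(struct toy_socket *so)
{
    pthread_mutex_lock(&toy_unp_mtx);
    so->so_pcb = NULL;
    pthread_mutex_unlock(&toy_unp_mtx);
}

/* Connect path: the lookup may sleep, so local state is re-validated after. */
static int toy_connect(struct toy_socket *so,
    void (*slow_lookup)(struct toy_socket *))
{
    int error = 0;

    assert(so != NULL && slow_lookup != NULL);
    slow_lookup(so);                    /* no lock held: may sleep */
    pthread_mutex_lock(&toy_unp_mtx);
    if (so->so_pcb == NULL)             /* closed while we slept */
        error = EINVAL;
    pthread_mutex_unlock(&toy_unp_mtx);
    return error;
}
```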
* In uipc_connect(), assert that the passed thread is curthread, and pass | rwatson | 2004-07-25 | 1 | -1/+3
  td into unp_connect() instead of reading curthread.
* Drop Giant and acquire the UNIX domain socket subsystem lock a bit | rwatson | 2004-07-18 | 1 | -4/+4
  earlier in unp_connect() so that vp->v_socket can't change between our copying its value to a local variable and later use of that variable. This may have been responsible for a panic during shutdown that I experienced, in which rpcbind was closing a listen socket while mountd was simultaneously making a new connection to rpcbind.
* We allocate an array of pointers to the global file table while | alfred | 2004-07-02 | 1 | -1/+12
  not holding the filelist_lock. This means the filelist can change size while allocating. Detect this race and retry the allocation.
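A minimal sketch of the detect-and-retry idea, assuming a hypothetical list whose element count is protected by a lock and an allocator that must be called with the lock dropped:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t toy_list_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t toy_nitems = 4;   /* may change whenever the lock is not held */

/*
 * Return an array holding one slot per list item, filled with the item
 * indices, and store the item count in *np. Returns NULL on allocation
 * failure. The caller frees the result.
 */
static int *toy_snapshot(size_t *np)
{
    int *buf = NULL;
    size_t want, i;

    assert(np != NULL);
    pthread_mutex_lock(&toy_list_lock);
    for (;;) {
        want = toy_nitems;
        pthread_mutex_unlock(&toy_list_lock);
        free(buf);
        buf = malloc((want != 0 ? want : 1) * sizeof(*buf)); /* may block */
        if (buf == NULL)
            return NULL;
        pthread_mutex_lock(&toy_list_lock);
        if (toy_nitems <= want)
            break;      /* list still fits: keep the lock and proceed */
        /* The list grew while we were allocating: retry with the new size. */
    }
    *np = toy_nitems;
    for (i = 0; i < toy_nitems; i++)
        buf[i] = (int)i;
    pthread_mutex_unlock(&toy_list_lock);
    return buf;
}
```

The size is sampled under the lock, the allocation happens outside it, and the sample is compared again once the lock is retaken; only a buffer known to be large enough is ever filled.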
* Acquire the socket buffer lock when calling unp_scan() on | rwatson | 2004-06-27 | 1 | -0/+2
  so->so_rcv.sb_mb to prevent the mbuf chain from changing during the scan.
* Reduce the number of unnecessary unlock-relocks on socket buffer mutexes | rwatson | 2004-06-26 | 1 | -6/+3
  associated with performing a wakeup on the socket buffer:

  - When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append().
  - When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/receive flags and dropping data, and in uipc_rcvd() when adjusting back-pressure on the sockets.

  For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduced mutex costs.
* Release UNIX domain socket subsystem lock earlier -- don't need to | rwatson | 2004-06-25 | 1 | -1/+1
  hold it over free of unp_addr if we've already removed all references to unp.
* Merge next step in socket buffer locking: | rwatson | 2004-06-21 | 1 | -3/+8
  - sowakeup() now asserts the socket buffer lock on entry. Move the call to KNOTE higher in sowakeup() so that it is made with the socket buffer lock held for consistency with other calls. Release the socket buffer lock prior to calling into pgsigio(), so_upcall(), or aio_swake(). Locking for this event management will need revisiting in the future, but this model avoids lock order reversals when upcalls into other subsystems result in socket/socket buffer operations. Assert that the socket buffer lock is not held at the end of the function.
  - Wrapper macros for sowakeup(), sorwakeup() and sowwakeup(), now have _locked versions which assert the socket buffer lock on entry. If a wakeup is required by sb_notify(), invoke sowakeup(); otherwise, unconditionally release the socket buffer lock. This results in the socket buffer lock being released whether a wakeup is required or not.
  - Break out socantsendmore() into socantsendmore_locked() that asserts the socket buffer lock. socantsendmore() unconditionally locks the socket buffer before calling socantsendmore_locked(). Note that both functions return with the socket buffer unlocked as socantsendmore_locked() calls sowwakeup_locked() which has the same properties. Assert that the socket buffer is unlocked on return.
  - Break out socantrcvmore() into socantrcvmore_locked() that asserts the socket buffer lock. socantrcvmore() unconditionally locks the socket buffer before calling socantrcvmore_locked(). Note that both functions return with the socket buffer unlocked as socantrcvmore_locked() calls sorwakeup_locked() which has similar properties. Assert that the socket buffer is unlocked on return.
  - Break out sbrelease() into a sbrelease_locked() that asserts the socket buffer lock. sbrelease() unconditionally locks the socket buffer before calling sbrelease_locked(). sbrelease_locked() now invokes sbflush_locked() instead of sbflush().
  - Assert the socket buffer lock in socket buffer sanity check functions sblastrecordchk(), sblastmbufchk().
  - Assert the socket buffer lock in SBLINKRECORD().
  - Break out various sbappend() functions into sbappend_locked() (and variations on that name) that assert the socket buffer lock. The !_locked() variations unconditionally lock the socket buffer before calling their _locked counterparts. Internally, make sure to call _locked() support routines, etc, if already holding the socket buffer lock.
  - Break out sbinsertoob() into sbinsertoob_locked() that asserts the socket buffer lock. sbinsertoob() unconditionally locks the socket buffer before calling sbinsertoob_locked().
  - Break out sbflush() into sbflush_locked() that asserts the socket buffer lock. sbflush() unconditionally locks the socket buffer before calling sbflush_locked(). Update panic strings for new function names.
  - Break out sbdrop() into sbdrop_locked() that asserts the socket buffer lock. sbdrop() unconditionally locks the socket buffer before calling sbdrop_locked().
  - Break out sbdroprecord() into sbdroprecord_locked() that asserts the socket buffer lock. sbdroprecord() unconditionally locks the socket buffer before calling sbdroprecord_locked().
  - sofree() now calls socantsendmore_locked() and re-acquires the socket buffer lock on return. It also now calls sbrelease_locked().
  - sorflush() now calls socantrcvmore_locked() and re-acquires the socket buffer lock on return. Clean up/mess up other behavior in sorflush() relating to the temporary stack copy of the socket buffer used with dom_dispose by more properly initializing the temporary copy, and selectively bzeroing/copying more carefully to prevent WITNESS from getting confused by improperly initialized mutexes. Annotate why that's necessary, or at least, needed.
  - soisconnected() now calls sbdrop_locked() before unlocking the socket buffer to avoid locking overhead.

  Some parts of this change were:

  Submitted by: sam
  Sponsored by: FreeBSD Foundation
  Obtained from: BSD/OS
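The locked/unlocked function pairs listed above follow one pattern, sketched here in userspace. The `toy_` names are hypothetical, and the trylock-based assertion is only a crude stand-in for a kernel lock assertion: it checks that the mutex is held by someone, not that the caller is the owner.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct toy_sockbuf {
    pthread_mutex_t sb_mtx;
    size_t          sb_cc;       /* bytes queued */
    unsigned        sb_wakeups;  /* wakeups delivered */
};

/* Crude "lock is held" check; see the caveat in the lead-in. */
static void toy_sb_lock_assert(struct toy_sockbuf *sb)
{
    assert(pthread_mutex_trylock(&sb->sb_mtx) == EBUSY);
}

/* _locked variant: the caller already holds sb_mtx. */
static void toy_sbappend_locked(struct toy_sockbuf *sb, size_t len)
{
    toy_sb_lock_assert(sb);
    sb->sb_cc += len;
}

/* Plain variant: lock unconditionally, then call the _locked variant. */
static void toy_sbappend(struct toy_sockbuf *sb, size_t len)
{
    pthread_mutex_lock(&sb->sb_mtx);
    toy_sbappend_locked(sb, len);
    pthread_mutex_unlock(&sb->sb_mtx);
}

/* Wakeup _locked variant: asserts the lock on entry, releases it on return. */
static void toy_sowakeup_locked(struct toy_sockbuf *sb)
{
    toy_sb_lock_assert(sb);
    sb->sb_wakeups++;
    pthread_mutex_unlock(&sb->sb_mtx);
}

/* Send path: one lock acquisition covers both the append and the wakeup. */
static void toy_send(struct toy_sockbuf *sb, size_t len)
{
    pthread_mutex_lock(&sb->sb_mtx);
    toy_sbappend_locked(sb, len);
    toy_sowakeup_locked(sb);    /* returns with sb_mtx released */
}
```

Compared with calling the plain append followed by a plain wakeup, toy_send() takes the mutex once instead of twice, which is the saving the later unlock-relock reduction relies on.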
* In uipc_rcvd(), lock the socket buffers at either end of the UNIX | rwatson | 2004-06-20 | 1 | -0/+4
  domain socket when updating fields at both ends.

  Submitted by: sam
  Sponsored by: FreeBSD Foundation
* Hold SOCK_LOCK(so) when frobbing so_state when disconnecting a | rwatson | 2004-06-20 | 1 | -1/+5
  connected UNIX domain datagram socket.
* Second half of the dev_t cleanup. | phk | 2004-06-17 | 1 | -1/+1
  The big lines are:

      NODEV -> NULL
      NOUDEV -> NODEV
      udev_t -> dev_t
      udev2dev() -> findcdev()

  Various minor adjustments including handling of userland access to kernel space struct cdev etc.
* The socket field so_state is used to hold a variety of socket related | rwatson | 2004-06-14 | 1 | -1/+1
  flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state:

      SS_CANTRCVMORE (so_state)
      SS_CANTSENDMORE (so_state)
      SS_RCVATMARK (so_state)

  Rename respectively to:

      SBS_CANTRCVMORE (so_rcv.sb_state)
      SBS_CANTSENDMORE (so_snd.sb_state)
      SBS_RCVATMARK (so_rcv.sb_state)

  This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coalesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.
* Socket MAC labels so_label and so_peerlabel are now protected by | rwatson | 2004-06-13 | 1 | -0/+2
  SOCK_LOCK(so):

  - Hold socket lock over calls to MAC entry points reading or manipulating socket labels.
  - Assert socket lock in MAC entry point implementations.
  - When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.
* Extend coverage of SOCK_LOCK(so) to include so_count, the socket | rwatson | 2004-06-12 | 1 | -0/+1
  reference count:

  - Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele().
  - Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree().
  - Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers.
  - In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket.
  - Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket.

  Submitted by: sam
  Sponsored by: FreeBSD Foundation
  Obtained from: BSD/OS
* Introduce a subsystem lock around UNIX domain sockets in order to protect | rwatson | 2004-06-10 | 1 | -58/+193
  global and allocated variables. This strategy is derived from work originally developed by BSDi for BSD/OS, and applied to FreeBSD by Sam Leffler:

  - Add unp_mtx, a global mutex which will protect all UNIX domain socket related variables, structures, etc.
  - Add UNP_LOCK(), UNP_UNLOCK(), UNP_LOCK_ASSERT() macros.
  - Acquire unp_mtx on entering most UNIX domain socket code, drop/re-acquire around calls into VFS, and release it on return.
  - Avoid performing sodupsockaddr() while holding the mutex, so in general move to allocating storage before acquiring the mutex to copy the data.
  - Make a stack copy of the xucred rather than copying out while holding unp_mtx. Copy the peer credential out after releasing the mutex.
  - Add additional assertions of vnode locks following VOP_CREATE().

  A few notes:

  - Use of an sx lock for the file list mutex may cause problems with regard to unp_mtx when garbage collecting passed file descriptors.
  - The locking in unp_pcblist() for sysctl monitoring is correct subject to the unpcb zone not returning memory for reuse by other subsystems (consistent with similar existing concerns).
  - Sam's version of this change, as with the BSD/OS version, made use of both a global lock and per-unpcb locks. However, in practice, the global lock covered all accesses, so I have simplified out the unpcb locks in the interest of getting this merged faster (reducing the overhead but not sacrificing granularity in most cases). We will want to explore possibilities for improving lock granularity in this code in the future.

  Submitted by: sam
  Sponsored by: FreeBSD Foundation
  Obtained from: BSD/OS 5 snapshot provided by BSDi
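The "stack copy under the mutex, copy out after releasing it" step can be sketched as follows. The structure, the credential values, and the toy_copyout() helper are all hypothetical; the helper merely stands in for an operation that may fault or sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for a peer credential record. */
struct toy_cred {
    unsigned uid;
    unsigned gid;
};

static pthread_mutex_t toy_unp_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct toy_cred toy_peercred = { 1001, 1001 }; /* protected by toy_unp_mtx */

/* Stand-in for a copy to user memory; must not be called under the mutex. */
static int toy_copyout(const void *src, void *dst, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

static int toy_get_peercred(struct toy_cred *user_dst)
{
    struct toy_cred tmp;

    assert(user_dst != NULL);
    pthread_mutex_lock(&toy_unp_mtx);
    tmp = toy_peercred;                 /* short copy while locked */
    pthread_mutex_unlock(&toy_unp_mtx);
    /* The slow, possibly sleeping part runs with the mutex released. */
    return toy_copyout(&tmp, user_dst, sizeof(tmp));
}
```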
* Mark sun_noname as const since it's immutable. Update definitions | rwatson | 2004-06-04 | 1 | -5/+5
  of functions that potentially accept &sun_noname (sbappendaddr(), et al) to accept a const sockaddr pointer.
* Remove advertising clause from University of California Regent's license, | imp | 2004-04-05 | 1 | -4/+0
  per letter dated July 22, 1999.

  Approved by: core
* Export uipc_connect2() from uipc_usrreq.c instead of unp_connect2(), | rwatson | 2004-03-31 | 1 | -2/+3
  and consume that interface in portalfs and fifofs instead. In the new world order, unp_connect2() assumes that the unpcb mutex is held, whereas uipc_connect2() validates that the passed sockets are UNIX domain sockets, then grabs the mutex.

  NB: the portalfs and fifofs code gets down and dirty with UNIX domain sockets. Maybe this is a bad thing.
* Prefer NULL to 0 when testing and assigning pointer values. | rwatson | 2004-03-30 | 1 | -56/+57
* Rename dup_sockaddr() to sodupsockaddr() for consistency with other | rwatson | 2004-03-01 | 1 | -11/+15
  functions in kern_socket.c.

  Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT in from the caller context rather than "1" or "0".

  Correct mflags pass into mac_init_socket() from previous commit to not include M_ZERO.

  Submitted by: sam
* If we're going to panic(), do it before dereferencing a NULL pointer. | cperciva | 2004-02-22 | 1 | -1/+1
  Reported by: "Ted Unangst" <tedu@coverity.com>
  Approved by: rwatson (mentor)
* Restore correct semantics for F_DUPFD fcntl. This should fix the errors | des | 2004-01-17 | 1 | -1/+1
  people have been getting with configure scripts.
* New file descriptor allocation code, derived from similar code introduced | des | 2004-01-15 | 1 | -1/+1
  in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated file descriptors which is used to locate available descriptors when a new one is needed. It also moves the task of growing the file descriptor table out of fdalloc(), reducing complexity in both fdalloc() and do_dup().

  Debts of gratitude are owed to tjr@ (who provided the original patch on which this work is based), grog@ (for the gdb(4) man page) and rwatson@ (for assistance with pxeboot(8)).
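A bitmap-based descriptor allocator of the kind described above can be sketched in a few lines. This is a fixed-size toy with hypothetical names, not the kernel's fdalloc(); in particular it never grows the table.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_NFDS   128
#define TOY_WBITS  (sizeof(unsigned long) * CHAR_BIT)
#define TOY_NWORDS ((TOY_NFDS + TOY_WBITS - 1) / TOY_WBITS)

/* One bit per descriptor: set means allocated. */
struct toy_fdmap {
    unsigned long map[TOY_NWORDS];
};

/* Return the lowest free descriptor >= minfd and mark it used, or -1. */
static int toy_fdalloc(struct toy_fdmap *m, int minfd)
{
    size_t fd;

    assert(m != NULL && minfd >= 0);
    for (fd = (size_t)minfd; fd < TOY_NFDS; fd++) {
        size_t w = fd / TOY_WBITS;
        unsigned long bit = 1UL << (fd % TOY_WBITS);

        if (m->map[w] == ~0UL) {
            /* Whole word in use: jump to its last bit, loop moves on. */
            fd = (w + 1) * TOY_WBITS - 1;
            continue;
        }
        if ((m->map[w] & bit) == 0) {
            m->map[w] |= bit;
            return (int)fd;
        }
    }
    return -1;
}

/* Mark a descriptor as free again. */
static void toy_fdfree(struct toy_fdmap *m, int fd)
{
    assert(m != NULL && fd >= 0 && fd < TOY_NFDS);
    m->map[fd / TOY_WBITS] &= ~(1UL << (fd % TOY_WBITS));
}
```

The minfd argument mirrors the way a request such as F_DUPFD asks for the lowest free descriptor at or above a given number; fully allocated words are skipped in one step rather than bit by bit.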
* Mechanical whitespace cleanup; parenthesize return values; other minor | des | 2004-01-11 | 1 | -56/+59
  style nits.
* Introduce a MAC label reference in 'struct inpcb', which caches | rwatson | 2003-11-18 | 1 | -1/+1
  the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer.

  This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check.

  For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update.

  Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy.

  Reviewed by: sam, bms
  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
* Use __FBSDID(). | obrien | 2003-06-11 | 1 | -1/+3
* s/discriptors/descriptors/ | cognet | 2003-03-23 | 1 | -1/+1
* Back out M_* changes, per decision of the TRB. | imp | 2003-02-19 | 1 | -6/+6
  Approved by: trb
* Do not allow kqueues to be passed via unix domain sockets. | alfred | 2003-02-15 | 1 | -0/+7
* Remove vestiges of no longer needed unp_rvnode field. | hsu | 2003-02-06 | 1 | -1/+0
  Approved by: phk (who originally added it in rev 1.8 of unpcb.h)
* Catch more uses of MIN(). | alfred | 2003-02-02 | 1 | -4/+0
* Remove extraneous FILEDESC_LOCKs around atomic reads. | hsu | 2003-01-24 | 1 | -4/+0
  Reviewed by: jhb
* Added comment explaining why this workaround is required. | ume | 2003-01-22 | 1 | -1/+7
  Suggested by: sam
  MFC after: 1 week
* getpeername() returns with no error but doesn't fill struct sockaddr | ume | 2003-01-22 | 1 | -0/+2
  correctly against PF_LOCAL. It seems that the test always fails, and then sockaddr is not filled. So, I added an else clause as a workaround. I doubt that it is the right fix. However, it is better than nothing. I found that NetBSD has the same potential problem. But, fortunately, NetBSD has an equivalent else clause.

  MFC after: 1 week
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. | alfred | 2003-01-21 | 1 | -6/+6
  Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.