path: root/sys/kern/uipc_socket.c
...
* Annotate some ordering-related issues in solisten() which are not yet
  resolved by socket locking: in particular, that we test the connection
  state at the socket layer without locking, request that the protocol
  begin listening, and then set the listen state on the socket
  non-atomically, resulting in a non-atomic cross-layer test-and-set.
  (rwatson, 2004-06-20, 1 file changed, +5/-0)

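  The race described above can be sketched as follows; the function body
  is an illustrative reconstruction of the solisten() flow, not the
  committed source, and field/flag names follow the surrounding socket
  code.

      /*
       * Hedged sketch of the non-atomic cross-layer test-and-set in
       * solisten(): the state test (1), the protocol call (2), and the
       * socket-layer state update (3) are not performed atomically.
       */
      static int
      solisten_sketch(struct socket *so, int backlog, struct thread *td)
      {
          int error;

          /* (1) Socket-layer state test, performed without SOCK_LOCK(so). */
          if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING))
              return (EINVAL);

          /* (2) Ask the protocol to begin listening; this may sleep. */
          error = (*so->so_proto->pr_usrreqs->pru_listen)(so, td);
          if (error)
              return (error);

          /* (3) Only now mark the socket as accepting connections; another
           * thread may have connected the socket between (1) and (3). */
          so->so_options |= SO_ACCEPTCONN;
          so->so_qlimit = backlog;
          return (0);
      }
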
* Assert socket buffer lock in sb_lock() to protect socket buffer sleep
  lock state. Convert tsleep() into msleep() with the socket buffer mutex
  as argument. Hold the socket buffer lock over sbunlock() to protect
  sleep lock state.

  Assert socket buffer lock in sbwait() to protect the socket buffer wait
  state. Convert tsleep() into msleep() with the socket buffer mutex as
  argument.

  Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK() in
  order to call into these functions with the lock, as well as to start
  protecting other socket buffer use in their implementation. Drop the
  socket buffer mutexes around calls into the protocol layer, around
  potentially blocking operations, for copying to/from user space, and
  around VM operations relating to zero-copy. Assert the socket buffer
  mutex strategically after code sections or at the beginning of loops.
  In some cases, modify return handling to ensure locks are properly
  dropped.

  Convert the potentially blocking allocation of storage for the remote
  address in soreceive() into a non-blocking allocation; we may wish to
  move the allocation earlier so that it can block prior to acquisition of
  the socket buffer lock.

  Drop some spl use.

  NOTE: Some races exist in the current structuring of sosend() and
  soreceive(). This commit only merges basic socket locking in this code;
  follow-up commits will close additional races. As merged, these changes
  are not sufficient to run without Giant safely.

  Reviewed by: juli, tjr
  (rwatson, 2004-06-19, 1 file changed, +60/-25)

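  The tsleep()-to-msleep() conversion described above follows the pattern
  below, shown for the sbwait() wait path; this is a hedged
  reconstruction, not a verbatim copy of the committed code.

      /*
       * Before: sleep with no interlock, relying on Giant, e.g.
       *
       *     return (tsleep(&sb->sb_cc, (sb->sb_flags & SB_NOINTR) ?
       *         PSOCK : PSOCK | PCATCH, "sbwait", sb->sb_timeo));
       *
       * After: assert the socket buffer mutex and pass it to msleep() as
       * the interlock, so the wait flag and the sleep are covered by the
       * same lock.
       */
      int
      sbwait_sketch(struct sockbuf *sb)
      {
          SOCKBUF_LOCK_ASSERT(sb);

          sb->sb_flags |= SB_WAIT;
          return (msleep(&sb->sb_cc, &sb->sb_mtx,
              (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH,
              "sbwait", sb->sb_timeo));
      }
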
* Hold SOCK_LOCK(so) while frobbing so_options. Note that while the
  local race is corrected, there's still a global race in sosend()
  relating to so_options and the SO_DONTROUTE flag.
  (rwatson, 2004-06-18, 1 file changed, +4/-1)

* Merge some additional leaf node socket buffer locking from
  rwatson_netperf:

  Introduce conditional locking of the socket buffer in fifofs kqueue
  filters; KNOTE() will be called holding the socket buffer locks in
  fifofs, but sometimes the kqueue() system call will poll using the same
  entry point without holding the socket buffer lock.

  Introduce conditional locking of the socket buffer in the socket kqueue
  filters; KNOTE() will be called holding the socket buffer locks in the
  socket code, but sometimes the kqueue() system call will poll using the
  same entry points without holding the socket buffer lock.

  Simplify the logic in sodisconnect() since we no longer need spls.

  NOTE: To remove conditional locking in the kqueue filters, it would make
  sense to use a separate kqueue API entry into the socket/fifo code when
  calling from the kqueue() system call.
  (rwatson, 2004-06-18, 1 file changed, +26/-13)

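  The conditional-locking pattern described above looks roughly like the
  sketch below for the socket read filter; distinguishing locked from
  unlocked callers via mtx_owned() is an assumption for illustration, and
  the function name is hypothetical.

      static int
      filt_soread_sketch(struct knote *kn, long hint)
      {
          struct socket *so = (struct socket *)kn->kn_fp->f_data;
          int need_lock, result;

          /* KNOTE() from the socket code already holds the lock;
           * kqueue() polling calls in without it. */
          need_lock = !mtx_owned(SOCKBUF_MTX(&so->so_rcv));
          if (need_lock)
              SOCKBUF_LOCK(&so->so_rcv);

          kn->kn_data = so->so_rcv.sb_cc;
          result = (kn->kn_data >= so->so_rcv.sb_lowat);

          if (need_lock)
              SOCKBUF_UNLOCK(&so->so_rcv);
          return (result);
      }
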
* Merge additional socket buffer locking from rwatson_netperf:

  - Lock down low-hanging fruit use of sb_flags with the socket buffer
    lock.
  - Lock down low-hanging fruit use of so_state with the socket lock.
  - Lock down low-hanging fruit use of so_options.
  - Lock down low-hanging fruit use of sb_lowat and sb_hiwat with the
    socket buffer lock.
  - Annotate situations in which we unlock the socket lock and then grab
    the receive socket buffer lock, which are currently actually the same
    lock. Depending on how we want to play our cards, we may want to
    coalesce these lock uses to reduce overhead.
  - Convert an if()->panic() into a KASSERT relating to so_state in
    soaccept().
  - Remove a number of splnet()/splx() references.

  More complex merging of socket and socket buffer locking to follow.
  (rwatson, 2004-06-17, 1 file changed, +25/-14)

* The socket field so_state is used to hold a variety of socket-related
  flags relating to several aspects of socket functionality. This change
  breaks out several bits relating to send and receive operation into a
  new per-socket-buffer field, sb_state, in order to facilitate locking.
  This is required because, in order to provide more granular locking of
  sockets, different state fields have different locking properties.

  The following fields are moved to sb_state:

    SS_CANTRCVMORE   (so_state)
    SS_CANTSENDMORE  (so_state)
    SS_RCVATMARK     (so_state)

  Renamed, respectively, to:

    SBS_CANTRCVMORE  (so_rcv.sb_state)
    SBS_CANTSENDMORE (so_snd.sb_state)
    SBS_RCVATMARK    (so_rcv.sb_state)

  This facilitates locking by isolating fields to be located with other
  identically locked fields, and permits greater granularity in socket
  locking by avoiding storing fields with different locking semantics in
  the same short (avoiding locking conflicts). In the future, we may wish
  to coalesce sb_state and sb_flags; for the time being I leave them
  separate and there is no additional memory overhead due to the
  packing/alignment of shorts in the socket buffer structure.
  (rwatson, 2004-06-14, 1 file changed, +10/-10)

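  A before/after sketch of how a "can't receive more" test changes under
  the new layout; the wrapper functions are hypothetical and exist only to
  show the field move.

      /* Before: the shutdown bit lived in so_state on the socket. */
      static int
      rcv_shut_before(struct socket *so)
      {
          return ((so->so_state & SS_CANTRCVMORE) != 0);
      }

      /* After: the bit lives in sb_state of the receive buffer, so it can
       * be protected by the socket buffer lock rather than SOCK_LOCK(so). */
      static int
      rcv_shut_after(struct socket *so)
      {
          int shut;

          SOCKBUF_LOCK(&so->so_rcv);
          shut = (so->so_rcv.sb_state & SBS_CANTRCVMORE) != 0;
          SOCKBUF_UNLOCK(&so->so_rcv);
          return (shut);
      }
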
* Extend coverage of SOCK_LOCK(so) to include so_count, the socket
  reference count:

  - Assert SOCK_LOCK(so) in macros that directly manipulate so_count:
    soref(), sorele().
  - Assert SOCK_LOCK(so) in macros/functions that rely on the state of
    so_count: sofree(), sotryfree().
  - Acquire SOCK_LOCK(so) before calling these functions or macros in
    various contexts in the stack, both at the socket and protocol layers.
  - In some cases, perform soisdisconnected() before sotryfree(), as this
    could result in frobbing of a non-present socket if sotryfree()
    actually frees the socket.
  - Note that sofree()/sotryfree() will release the socket lock even if
    they don't free the socket.

  Submitted by: sam
  Sponsored by: FreeBSD Foundation
  Obtained from: BSD/OS
  (rwatson, 2004-06-12, 1 file changed, +10/-1)

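  The reference-count discipline this implies, sketched under stated
  assumptions (the helper is hypothetical; soref()/sorele()/sotryfree()
  behave as described in the entry above):

      static void
      socket_ref_example(struct socket *so)
      {
          SOCK_LOCK(so);
          soref(so);              /* so_count++; lock asserted inside */
          SOCK_UNLOCK(so);

          /* ... hand the socket to another subsystem, then release ... */

          SOCK_LOCK(so);
          sorele(so);             /* so_count--; may free the socket and
                                     always drops the socket lock */
      }
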
* Introduce a mutex into struct sockbuf, sb_mtx, which will be used to
  protect fields in the socket buffer. Add accessor macros to use the
  mutex (SOCKBUF_*()). Initialize the mutex in soalloc(), and destroy it
  in sodealloc(). In addition, add SOCK_*() access macros which will
  protect most remaining fields in the socket; for the time being, use
  the receive socket buffer mutex to implement socket-level locking, to
  reduce memory overhead.

  Submitted by: sam
  Sponsored by: FreeBSD Foundation
  Obtained from: BSD/OS
  (rwatson, 2004-06-12, 1 file changed, +4/-0)

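  The accessor macros described above follow the pattern sketched below;
  the exact definitions are reproduced from memory of sys/sys/socketvar.h
  and should be treated as an approximation.

      /* Socket buffer lock accessors built around the new sb_mtx field. */
      #define SOCKBUF_MTX(_sb)            (&(_sb)->sb_mtx)
      #define SOCKBUF_LOCK(_sb)           mtx_lock(SOCKBUF_MTX(_sb))
      #define SOCKBUF_UNLOCK(_sb)         mtx_unlock(SOCKBUF_MTX(_sb))
      #define SOCKBUF_LOCK_ASSERT(_sb)    mtx_assert(SOCKBUF_MTX(_sb), MA_OWNED)

      /* Socket-level locking reuses the receive buffer mutex for now,
       * trading granularity for memory overhead. */
      #define SOCK_MTX(_so)               SOCKBUF_MTX(&(_so)->so_rcv)
      #define SOCK_LOCK(_so)              SOCKBUF_LOCK(&(_so)->so_rcv)
      #define SOCK_UNLOCK(_so)            SOCKBUF_UNLOCK(&(_so)->so_rcv)
      #define SOCK_LOCK_ASSERT(_so)       SOCKBUF_LOCK_ASSERT(&(_so)->so_rcv)
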
* Avoid assignments to cast expressions.

  Reviewed by: md5
  Approved by: das (mentor)
  (stefanf, 2004-06-08, 1 file changed, +2/-2)

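  For context, a cast expression is not an lvalue in ISO C, so assigning
  to one is invalid even though some older compilers accepted it. A
  hypothetical example of the kind of rewrite implied:

      struct example { int field; };      /* hypothetical type */

      void
      cast_assignment_example(void *src)
      {
          struct example *ep;

          /* Invalid ISO C: assigning to the result of a cast, e.g.
           *     (struct example *)src = ep;
           * Instead, assign to the properly typed object and cast the
           * value being assigned. */
          ep = (struct example *)src;
          ep->field = 0;
      }
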
* Integrate accept locking from rwatson_netperf, introducing a new
  global mutex, accept_mtx, which serializes access to the following
  fields across all sockets:

    so_qlen          so_incqlen          so_qstate
    so_comp          so_incomp           so_list
    so_head

  While providing only coarse granularity, this approach avoids lock order
  issues between sockets by avoiding ownership of the fields by a specific
  socket and its per-socket mutexes.

  While here, rewrite soclose(), sofree(), soaccept(), and sonewconn() to
  add assertions, close additional races, and address lock order concerns.
  In particular:

  - Reorganize the optimistic concurrency behavior in accept1() to always
    allocate a file descriptor with falloc(), so that if we do find a
    socket we don't encounter the "oh, there wasn't a socket" race that
    can occur if falloc() sleeps in the current code, which broke inbound
    accept() ordering, not to mention requiring backing out socket state
    changes in a way that raced with the protocol level. We may want to
    add a lockless read of the queue state if polling of empty queues
    proves to be important to optimize.

  - In accept1(), soref() the socket while holding the accept lock so that
    the socket cannot be freed in a race with the protocol layer. Likewise
    in netgraph equivalents of the accept1() code.

  - In sonewconn(), loop waiting for the queue to be small enough to
    insert our new socket once we've committed to inserting it, or races
    can occur that cause the incomplete socket queue to overfill. In the
    previous implementation, it was sufficient to test only once, since
    calling soabort() didn't release synchronization permitting another
    thread to insert a socket as we discard a previous one.

  - In soclose()/sofree()/et al., it is the responsibility of the caller
    to remove a socket from the incomplete connection queue before calling
    soabort(), which prevents soabort() from having to walk into the
    accept socket to release the socket from its queue, and avoids races
    when releasing the accept mutex to enter soabort(), permitting
    soabort() to avoid lock ordering issues with the caller.

  - Generally cluster accept queue related operations together throughout
    these functions in order to facilitate locking.

  Annotate new locking in socketvar.h.
  (rwatson, 2004-06-02, 1 file changed, +49/-22)

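  A hedged sketch of the accept-queue pattern described above: pull one
  completed connection off so_comp under accept_mtx and take a reference
  before dropping the lock. This illustrates the locking pattern only; it
  is not the committed accept1()/sonewconn() code, and error handling and
  sleeping on an empty queue are omitted.

      static struct socket *
      accept_dequeue_sketch(struct socket *head)
      {
          struct socket *so;

          mtx_lock(&accept_mtx);
          so = TAILQ_FIRST(&head->so_comp);
          if (so == NULL) {
              mtx_unlock(&accept_mtx);
              return (NULL);
          }
          TAILQ_REMOVE(&head->so_comp, so, so_list);
          head->so_qlen--;
          so->so_qstate &= ~SQ_COMP;
          so->so_head = NULL;
          SOCK_LOCK(so);
          soref(so);              /* hold a reference before unlocking */
          SOCK_UNLOCK(so);
          mtx_unlock(&accept_mtx);
          return (so);
      }
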
* The SS_COMP and SS_INCOMP flags in the so_state field indicate whether
  the socket is on an accept queue of a listen socket. This change renames
  the flags to SQ_COMP and SQ_INCOMP, and moves them to a new state field
  on the socket, so_qstate, as the locking for these flags is
  substantially different from the locking on the remainder of the flags
  in so_state.
  (rwatson, 2004-06-01, 1 file changed, +4/-4)

* Add MSG_NBIO flag option to soreceive() and sosend() that causes
  them to behave the same as if the SS_NBIO socket flag had been set for
  this call. (The SS_NBIO flag for ordinary sockets is set by
  fcntl(fd, F_SETFL, O_NONBLOCK).)

  Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
  fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag on
  the underlying socket for each I/O operation. The O_NONBLOCK flag is a
  property of the descriptor, and unlike ordinary sockets, fifos may be
  referenced by multiple descriptors.
  (truckman, 2004-06-01, 1 file changed, +3/-2)

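  The fifofs side of this change follows the pattern below: the
  per-descriptor non-blocking property (IO_NDELAY) is translated into
  MSG_NBIO for the single call, leaving the shared socket's SS_NBIO
  untouched. Names and structure are approximate, not the committed
  fifo_read().

      static int
      fifo_read_sketch(struct socket *rso, struct uio *uio, int ioflag)
      {
          int error, flags;

          flags = 0;
          if (ioflag & IO_NDELAY)
              flags |= MSG_NBIO;      /* non-blocking for this call only */
          error = soreceive(rso, NULL, uio, NULL, NULL, &flags);
          return (error);
      }
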
* Bring in mbuma to replace mballoc.

  mbuma is an mbuf & cluster allocator built on top of a number of
  extensions to the UMA framework, all included herein.

  Extensions to UMA worth noting:

  - Better layering between slab <-> zone caches; introduce a Keg
    structure which splits off the slab cache from the zone structure and
    allows multiple zones to be stacked on top of a single Keg (single
    type of slab cache); perhaps we should look into defining a subset
    API on top of the Keg for special use by malloc(9), for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference counters are
    automagically allocated for them within the end of the associated
    slab structures. uma_find_refcnt() does a kextract to fetch the slab
    struct reference from the underlying page, and looks up the
    corresponding refcnt.

  mbuma things worth noting:

  - Integrates mbuf & cluster allocations with extended UMA and provides
    caches for commonly-allocated items; defines several zones (two
    primary, one secondary) and two kegs.
  - Changes certain code paths that always used to do m_get() + m_clget()
    to instead just use m_getcl() and try to take advantage of the newly
    defined secondary Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic stat
    reporting, but additional stats work needs to be done once some other
    details within UMA have been taken care of and it becomes clearer how
    stats will work within the modified framework.

  From the user perspective, one implication is that the NMBCLUSTERS
  compile-time option is no longer used. The maximum number of clusters
  is still capped off according to maxusers, but it can be made unlimited
  by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work
  should be done to write an appropriate sysctl handler allowing dynamic
  tuning of kern.ipc.nmbclusters at runtime.

  Additional things worth noting/known issues (READ):

  - One report of the 'ips' (ServeRAID) driver acting really slow in
    conjunction with mbuma. Need more data. Latest report is that ips is
    equally sucking with and without mbuma.
  - A Giant leak in NFS code sometimes occurs, can't reproduce but
    currently analyzing; brueffer is able to reproduce, but THIS IS NOT
    an mbuma-specific problem and currently occurs even WITHOUT mbuma.
  - Issues in network locking: there is at least one code path in the rip
    code where one or more locks are acquired and we end up in
    m_prepend() with M_WAITOK, which causes WITNESS to whine from within
    UMA. Current temporary solution: force all UMA allocations to be
    M_NOWAIT from within UMA for now to avoid deadlocks, unless WITNESS
    is defined and we can determine with certainty that we're not holding
    any locks when we're M_WAITOK.
  - I've seen at least one weird socketbuffer empty-but-mbuf-still-attached
    panic. I don't believe this to be related to mbuma, but please keep
    your eyes open, turn on debugging, and capture crash dumps.

  This change removes more code than it adds.

  A paper is available detailing the change and considering various
  performance issues; it was presented at BSDCan2004:
  http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
  Please read the paper for Future Work and implementation details, as
  well as credits.

  Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ...
  Reviewed by: Lots of people (for different parts)
  (bmilekic, 2004-05-31, 1 file changed, +55/-38)

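  The m_get() + m_clget() to m_getcl() conversion mentioned above follows
  the pattern sketched here; the wrapper functions are hypothetical.

      /* Before: allocate an mbuf, then attach a cluster in a second step. */
      static struct mbuf *
      alloc_packet_old(void)
      {
          struct mbuf *m;

          m = m_gethdr(M_DONTWAIT, MT_DATA);
          if (m == NULL)
              return (NULL);
          m_clget(m, M_DONTWAIT);
          if ((m->m_flags & M_EXT) == 0) {
              m_freem(m);
              return (NULL);
          }
          return (m);
      }

      /* After: one call that can be satisfied from the cached Packet
       * (mbuf + cluster) secondary zone. */
      static struct mbuf *
      alloc_packet_new(void)
      {
          return (m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR));
      }
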
* Compare pointers with NULL rather than using pointers as booleans in
  if/for statements. Assign pointers to NULL rather than typecast 0.
  Compare pointers with NULL rather than 0.
  (rwatson, 2004-04-09, 1 file changed, +53/-51)

* Remove advertising clause from University of California Regent's
  license, per letter dated July 22, 1999.

  Approved by: core
  (imp, 2004-04-05, 1 file changed, +0/-4)

* In sofree(), avoid nested declaration and initialization in
  declaration. Observe that initialization in declaration is frequently
  incompatible with locking, not just a bad idea due to style(9).

  Submitted by: bde
  (rwatson, 2004-03-31, 1 file changed, +2/-1)

* Use a common return path for filt_soread() and filt_sowrite() to
  simplify the impact of locking on these functions.

  Submitted by: sam
  Sponsored by: FreeBSD Foundation
  (rwatson, 2004-03-29, 1 file changed, +20/-16)

* In sofree(), move caching of 'head' from 'so->so_head' to later in
  the function, once it has been determined to be non-NULL, to simplify
  locking on an earlier return.
  (rwatson, 2004-03-29, 1 file changed, +2/-2)

* Rename dup_sockaddr() to sodupsockaddr() for consistency with other
  functions in kern_socket.c. Rename the "canwait" field to "mflags" and
  pass M_WAITOK or M_NOWAIT in from the caller context rather than "1" or
  "0". Correct the mflags passed into mac_init_socket() from the previous
  commit to not include M_ZERO.

  Submitted by: sam
  (rwatson, 2004-03-01, 1 file changed, +3/-3)

* Convert the other use of flags to mflags in soalloc().
  (scottl, 2004-03-01, 1 file changed, +1/-1)

* Modify the soalloc() API so that it accepts a malloc flags argument
  rather than a "waitok" argument. Callers now pass M_WAITOK or M_NOWAIT
  rather than 0 or 1. This simplifies the soalloc() logic, and also makes
  the waiting behavior of soalloc() clearer in the calling context.

  Submitted by: sam
  (rwatson, 2004-02-29, 1 file changed, +3/-10)

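  A before/after sketch of the soalloc() interface change; the bodies are
  reduced to the allocation call and are not the full committed function.

      /* Before: an int "waitok" selects the blocking behavior indirectly. */
      static struct socket *
      soalloc_old(int waitok)
      {
          return (uma_zalloc(socket_zone,
              (waitok ? M_WAITOK : M_NOWAIT) | M_ZERO));
      }

      /* After: callers pass malloc(9)/uma(9) flags directly. */
      static struct socket *
      soalloc_new(int mflags)
      {
          return (uma_zalloc(socket_zone, mflags | M_ZERO));
      }
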
* Always socantsendmore() before deallocating a socket. This, in turn,
  calls selwakeup() if necessary (which it is, if you don't want freed
  memory hanging around on your td->td_selq).

  Props to: alfred
  (green, 2004-02-12, 1 file changed, +7/-0)

* Introduce the SO_BINTIME option which takes a high-resolution timestamp
  at packet arrival.

  For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL since
  it has higher resolution and lower overhead. Simultaneous use of the
  two options is possible and they will return consistent timestamps.

  This introduces an extra test and a function call for SO_TIMEVAL, but I
  have not been able to measure that.
  (phk, 2004-01-31, 1 file changed, +2/-0)

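  From user land, the option is used like any other timestamp option: set
  SO_BINTIME with setsockopt(2) and read the arrival time from the
  SCM_BINTIME control message on each recvmsg(2). The program below is a
  hedged FreeBSD-only sketch with minimal error handling, not an excerpt
  from the commit.

      #include <sys/types.h>
      #include <sys/socket.h>
      #include <sys/time.h>
      #include <sys/uio.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      static void
      recv_with_bintime(int s)
      {
          char data[2048], ctl[CMSG_SPACE(sizeof(struct bintime))];
          struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
          struct msghdr msg = {
              .msg_iov = &iov, .msg_iovlen = 1,
              .msg_control = ctl, .msg_controllen = sizeof(ctl),
          };
          struct cmsghdr *cm;
          struct bintime bt;
          int on = 1;

          setsockopt(s, SOL_SOCKET, SO_BINTIME, &on, sizeof(on));
          if (recvmsg(s, &msg, 0) < 0)
              return;
          for (cm = CMSG_FIRSTHDR(&msg); cm != NULL;
              cm = CMSG_NXTHDR(&msg, cm)) {
              if (cm->cmsg_level == SOL_SOCKET &&
                  cm->cmsg_type == SCM_BINTIME) {
                  memcpy(&bt, CMSG_DATA(cm), sizeof(bt));
                  printf("arrival: %jd s + %ju frac\n",
                      (intmax_t)bt.sec, (uintmax_t)bt.frac);
              }
          }
      }
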
* Since "m" is not part of the "mp" chain, need to free() it.

  Reported by: Stanford Metacompilation research group
  (ru, 2004-01-18, 1 file changed, +1/-0)

* Reduce gratuitous redundancy and length in function names:

    mac_setsockopt_label_set()     -> mac_setsockopt_label()
    mac_getsockopt_label_get()     -> mac_getsockopt_label()
    mac_getsockopt_peerlabel_get() -> mac_getsockopt_peerlabel()

  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-11-16, 1 file changed, +5/-7)

* When implementing getsockopt() for SO_LABEL and SO_PEERLABEL, make
  sure to sooptcopyin() the (struct mac) so that the MAC Framework knows
  which label types are being requested. This fixes process queries of
  socket labels.

  Obtained from: TrustedBSD Project
  Sponsored by: DARPA, Network Associates Laboratories
  (rwatson, 2003-11-16, 1 file changed, +8/-0)

* - Implement selwakeuppri(), which allows raising the priority of a
    thread being woken up. The thread woken up can run at a priority as
    high as after tsleep().
  - Replace selwakeup()s with selwakeuppri()s and pass appropriate
    priorities.
  - Add cv_broadcastpri(), which raises the priority of the broadcast
    threads. Used by selwakeuppri() if a collision occurs.

  Not objected in: -arch, -current
  (tanimura, 2003-11-09, 1 file changed, +1/-1)

* Speed up stream socket recv handling by tracking the tail of
  the mbuf chain instead of walking the list for each append.

  Submitted by: ps/jayanth
  Obtained from: netbsd (jason thorpe)
  (sam, 2003-10-28, 1 file changed, +52/-3)

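  The speedup is the classic O(1) tail-pointer append: keep a pointer to
  the last mbuf in the buffer so each append does not re-walk the chain.
  A simplified, hedged sketch (sb_mb and sb_mbtail follow the socket
  buffer layout; record boundaries and byte accounting are omitted):

      static void
      sbappend_tail_sketch(struct sockbuf *sb, struct mbuf *m)
      {
          if (sb->sb_mb == NULL)
              sb->sb_mb = m;                  /* empty buffer: new head */
          else
              sb->sb_mbtail->m_next = m;      /* O(1): no list walk */
          while (m->m_next != NULL)           /* advance to end of chain */
              m = m->m_next;
          sb->sb_mbtail = m;
      }
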
* Change all SYSCTLs which are read-only and have a related TUNABLE
  from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more
  useful error messages.
  (silby, 2003-10-21, 1 file changed, +2/-1)

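  A hedged illustration of the flag change, using a hypothetical read-only
  sysctl that mirrors a loader tunable (the variable below is an example,
  not necessarily one touched by this commit):

      /* A boot-time tunable exposed read-only at runtime. */
      static int example_maxthings = 128;
      TUNABLE_INT("kern.ipc.example_maxthings", &example_maxthings);

      /*
       * Before the change this would have been declared with CTLFLAG_RD:
       *
       *     SYSCTL_INT(_kern_ipc, OID_AUTO, example_maxthings, CTLFLAG_RD,
       *         &example_maxthings, 0, "Example limit");
       *
       * CTLFLAG_RDTUN additionally marks the value as a loader tunable, so
       * sysctl(8) can explain that it must be set from the loader rather
       * than failing with a generic error.
       */
      SYSCTL_INT(_kern_ipc, OID_AUTO, example_maxthings, CTLFLAG_RDTUN,
          &example_maxthings, 0, "Example limit");
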
* Make the second argument to sooptcopyout() constant in order to
  simplify the upcoming PIM patches.

  Submitted by: Pavlin Radoslavov <pavlin@icir.org>
  (hsu, 2003-08-05, 1 file changed, +1/-4)

* To avoid a kernel panic provoked by a NULL pointer dereference,
  do not clear the `sb_sel' member of the sockbuf structure while
  invalidating the receive sockbuf in sorflush(), called from
  soshutdown().

  The panic was reproducible from user land by attaching a knote with
  EVFILT_READ filters to a socket, disabling further reads from it using
  shutdown(2), and then closing it. knote_remove() was called to remove
  all knotes from the socket file descriptor by detaching each using its
  associated filterops' detach callback function, sordetach() in this
  case, which tried to remove itself from the invalidated sockbuf's klist
  (sb_sel.si_note).

  PR: kern/54331
  (robert, 2003-07-17, 1 file changed, +7/-1)

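  The user-land trigger described above amounts to the following sequence;
  this is a hedged reconstruction of the reproduction, not a test case
  taken from the PR.

      #include <sys/types.h>
      #include <sys/event.h>
      #include <sys/socket.h>
      #include <unistd.h>

      int
      main(void)
      {
          struct kevent kev;
          int kq, s;

          s = socket(AF_INET, SOCK_STREAM, 0);
          kq = kqueue();

          /* Attach an EVFILT_READ knote to the socket. */
          EV_SET(&kev, s, EVFILT_READ, EV_ADD, 0, 0, NULL);
          kevent(kq, &kev, 1, NULL, 0, NULL);

          shutdown(s, SHUT_RD);   /* invalidates the receive sockbuf */
          close(s);               /* knote detach walked the cleared
                                     sb_sel klist on unpatched kernels */
          close(kq);
          return (0);
      }
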
* Rev 1.121 meant to pass the value 1 to soalloc() to indicate waitok.

  Reported by: arr
  (hsu, 2003-07-14, 1 file changed, +1/-1)

* Use __FBSDID().
  (obrien, 2003-06-11, 1 file changed, +3/-1)

* Deprecate machine/limits.h in favor of new sys/limits.h.
  Change all in-tree consumers to include <sys/limits.h>.

  Discussed on: standards@
  Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
  (kan, 2003-04-29, 1 file changed, +1/-1)

* Use while (*controlp != NULL) instead of do ... while (*controlp != NULL).
  There are valid cases where *controlp will be NULL at this point.

  Discussed with: dwmalone
  (cognet, 2003-04-14, 1 file changed, +1/-2)

* Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.
  (des, 2003-03-02, 1 file changed, +33/-33)

* uiomove-related caddr_t -> void * (just the low-hanging fruit).
  (des, 2003-03-02, 1 file changed, +5/-5)

* Remove duplicate includes.

  Submitted by: Cyril Nguyen-Huu <cyril@ci0.org>
  (cognet, 2003-02-20, 1 file changed, +0/-1)

* Back out M_* changes, per decision of the TRB.

  Approved by: trb
  (imp, 2003-02-19, 1 file changed, +16/-16)

* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
  Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
  (alfred, 2003-01-21, 1 file changed, +16/-16)

* Disallow listen() on sockets which are in the SS_ISCONNECTED or
  SS_ISCONNECTING state, returning EINVAL (which is what POSIX mandates
  in this case). listen() on connected or connecting sockets would cause
  them to enter a bad state; in the TCP case, this could cause sockets to
  go catatonic or cause panics, depending on how the socket was
  connected.

  Reviewed by: -net
  MFC after: 2 weeks
  (tmm, 2003-01-17, 1 file changed, +4/-0)

* Bow to the whining masses and change a union back into void *. Retain
  removal of unnecessary casts and throw in some minor cleanups to see if
  anyone complains, just for the hell of it.
  (dillon, 2003-01-13, 1 file changed, +6/-6)

* Change struct file f_data to un_data, a union of the correct struct
  pointer types, and remove a huge number of casts from code using it.
  Change struct xfile xf_data to xun_data (ABI is still compatible). If
  we need to add a #define for f_data and xf_data we can, but I don't
  think it will be necessary.

  There are no operational changes in this commit.
  (dillon, 2003-01-12, 1 file changed, +6/-6)

* In sodealloc(), if there is an accept filter present on the socket,
  then call do_setopt_accept_filter(so, NULL), which will free the
  filter, instead of duplicating the code in do_setopt_accept_filter().

  Pointed out by: Hiten Pandya <hiten@angelica.unixdaemons.com>
  (alfred, 2003-01-05, 1 file changed, +3/-9)

* s/sokqfilter/soo_kqfilter/ for consistency with the naming of all
  other socket/file operations.
  (phk, 2002-12-23, 1 file changed, +1/-1)

* Small SO_RCVTIMEO and SO_SNDTIMEO values are mistakenly taken to be
  zero.

  PR: kern/32827
  Submitted by: Hartmut Brandt <brandt@fokus.gmd.de>
  Approved by: re (jhb)
  MFC after: 2 weeks
  (maxim, 2002-11-27, 1 file changed, +2/-0)

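  The underlying issue is the timeval-to-ticks conversion in sosetopt(): a
  nonzero timeout shorter than one clock tick truncates to 0, which means
  "no timeout". A hedged sketch of the guard that addresses it (the
  helper is hypothetical; hz and tick are the usual kernel globals):

      static int
      so_timeo_ticks(const struct timeval *tv)
      {
          int val;

          val = tv->tv_sec * hz + tv->tv_usec / tick;
          if (val == 0 && tv->tv_usec != 0)
              val = 1;            /* round sub-tick timeouts up to 1 tick */
          return (val);
      }
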
* Fix instances of macros with improperly parenthesized arguments.

  Verified by: md5
  (alfred, 2002-11-09, 1 file changed, +1/-1)

* Fix filt_soread() to properly flag a kevent when a 0-byte datagram is
  received.

  Verified by: dougb, Manfred Antar <null@pozo.com>
  Sponsored by: NTT Multimedia Communications Labs
  (kbyanc, 2002-11-05, 1 file changed, +1/-1)

* Revert the change in revision 1.77 of kern/uipc_socket2.c. It is
  causing a panic because the socket's state isn't as expected by
  sofree().

  Discussed with: dillon, fenner
  (alc, 2002-11-02, 1 file changed, +1/-1)

* Track the number of non-data characters stored in socket buffers so
  that the data value returned by kevent()'s EVFILT_READ filter on
  non-TCP sockets accurately reflects the amount of data that can be read
  from the sockets by applications.

  PR: 30634
  Reviewed by: -net, -arch
  Sponsored by: NTT Multimedia Communications Labs
  MFC after: 2 weeks
  (kbyanc, 2002-11-01, 1 file changed, +1/-1)