path: root/sys/kern/uipc_socket.c
Commit message | Author | Age | Files | Lines
...
* Properly set size of the file_zone to match kern.maxfiles parameter. | sobomax | 2008-03-16 | 1 | -0/+1
  Otherwise the parameter is a no-op, since the zone by default limits the number of descriptors to some 12K entries. Attempts to allocate more end up sleeping on zonelimit.
  MFC after: 2 weeks
* Further clean up sorflush: | rwatson | 2008-02-04 | 1 | -12/+12
  - Expose sbrelease_internal(), a variant of sbrelease() with no expectations about the validity of locks in the socket buffer.
  - Use sbrelease_internal() in sorflush(), and as a result avoid initializing and destroying a socket buffer lock for the temporary stack copy of the actual buffer, asb.
  - Add a comment indicating why we do what we do, and remove an XXX since things have gotten less ugly in sorflush() lately.
  This makes socket close cleaner, and possibly also marginally faster.
  MFC after: 3 weeks
* Correct two problems relating to sorflush(), which is called to flush | rwatson | 2008-01-31 | 1 | -5/+11
  read socket buffers in shutdown() and close():
  - Call socantrcvmore() before sblock() to dislodge any threads that might be sleeping (potentially indefinitely) while holding sblock(), such as a thread blocked in recv().
  - Flag the sblock() call as non-interruptible so that a signal delivered to the thread calling sorflush() doesn't cause sblock() to fail. The sblock() is required to ensure that all other socket consumer threads have, in fact, left, and do not enter, the socket buffer until we're done flushing it.
  To implement the latter, change the 'flags' argument to sblock() to accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK flag. When SBL_NOINTR is set, it forces a non-interruptible sx acquisition, regardless of the setting of the disposition of SB_NOINTR on the socket buffer; without this change it would be possible for another thread to clear SB_NOINTR between when the socket buffer mutex is released and sblock() is invoked.
  Reviewed by: bz, kmacy
  Reported by: Jos Backus <jos at catnook dot com>
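  The flag handling described in this commit can be sketched as a small userland model. This is only an illustration under stated assumptions: the numeric flag values, the enum sb_acquire type, and the sblock_mode() helper are hypothetical and not taken from the kernel source; only the names SBL_WAIT, SBL_NOINTR, and SB_NOINTR come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are assumptions for this sketch, not copied from the kernel. */
#define SBL_WAIT   0x1  /* caller is willing to sleep for the lock */
#define SBL_NOINTR 0x2  /* force a non-interruptible acquisition */
#define SBL_VALID  (SBL_WAIT | SBL_NOINTR)

/* Hypothetical names for the three ways the lock could be acquired. */
enum sb_acquire {
    SB_ACQ_TRY,             /* non-blocking attempt */
    SB_ACQ_INTERRUPTIBLE,   /* sleep, but a signal may abort the wait */
    SB_ACQ_UNINTERRUPTIBLE  /* sleep until the lock is acquired */
};

/*
 * Decide how the lock would be acquired. sb_nointr models the SB_NOINTR
 * setting on the socket buffer; SBL_NOINTR in flags overrides it, so the
 * result no longer depends on another thread clearing SB_NOINTR.
 */
enum sb_acquire
sblock_mode(int flags, bool sb_nointr)
{
    assert((flags & SBL_VALID) == flags);   /* reject unknown flags */

    if ((flags & SBL_WAIT) == 0)
        return SB_ACQ_TRY;
    if ((flags & SBL_NOINTR) != 0 || sb_nointr)
        return SB_ACQ_UNINTERRUPTIBLE;
    return SB_ACQ_INTERRUPTIBLE;
}
```

  In this model a caller such as sorflush() would pass SBL_WAIT | SBL_NOINTR, which selects the uninterruptible path regardless of the buffer's own setting.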
* Merge first in a series of TrustedBSD MAC Framework KPI changes | rwatson | 2007-10-24 | 1 | -4/+4
  from Mac OS X Leopard--rationalize naming for entry points to the following general forms:
    mac_<object>_<method/action>
    mac_<object>_check_<method/action>
  The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.
  All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI.
  Sponsored by: SPARTA (original patches against Mac OS X)
  Obtained from: TrustedBSD Project, Apple Computer
* Despite several examples in the kernel, the third argument of | dwmalone | 2007-06-04 | 1 | -2/+2
  sysctl_handle_int is not sizeof the int type you want to export. The type must always be an int or an unsigned int.
  Remove the instances where a sizeof(variable) is passed to stop people accidentally cutting and pasting these examples.
  In a few places sysctl_handle_int was being used on 64 bit types, which would truncate the value to be exported. In these cases use sysctl_handle_quad to export them and change the format to Q so that sysctl(1) can still print them.
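  The truncation mentioned in this commit can be illustrated outside the kernel. The two helpers below are hypothetical stand-ins that only model the width of the exported value; they are not the sysctl_handle_int or sysctl_handle_quad implementations, and a plain cast is used as a simplification of what an int-sized copy would do.

```c
#include <stdint.h>

/*
 * Model of exporting a 64-bit value through an int-sized handler:
 * only the low 32 bits survive (simplified to a cast here).
 */
unsigned int
export_as_uint(uint64_t value)
{
    return (unsigned int)value;
}

/* Model of exporting through a 64-bit ("quad") handler: nothing is lost. */
uint64_t
export_as_quad(uint64_t value)
{
    return value;
}
```

  For a counter holding 0x100000002, the int-sized path in this model yields 2, while the quad path returns the full value.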
* - Move rusage from being per-process in struct pstats to per-thread in | jeff | 2007-06-01 | 1 | -3/+3
  td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks.
  - Aggregate exiting threads' rusage in thread_exit() such that the exiting thread's rusage is not lost.
  - Provide a new routine, rufetch(), to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru.
  - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits.
  Initial patch by: attilio
  Reviewed by: attilio, bde (some objections), arch (mostly silent)
* Generally migrate to ANSI function headers, and remove 'register' use. | rwatson | 2007-05-16 | 1 | -116/+61
* Add missing socket buffer unlock before returning to userland. | yongari | 2007-05-08 | 1 | -1/+1
  Reviewed by: rwatson
* sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags | rwatson | 2007-05-03 | 1 | -61/+68
  on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing.
  This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization.
  While here, fix two historic bugs:
  (1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovered by Isilon).
  (2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam).
  SCTP portion of this patch submitted by rrs.
* Following movement of functions from uipc_socket2.c to uipc_socket.c and | rwatson | 2007-03-26 | 1 | -34/+32
  uipc_sockbuf.c, clean up and update comments.
* Complete removal of uipc_socket2.c by moving the last few functions to | rwatson | 2007-03-26 | 1 | -0/+298
  other C files:
  - Move sbcreatecontrol() and sbtoxsockbuf() to uipc_sockbuf.c. While sbcreatecontrol() is really an mbuf allocation routine, it does its work with awareness of the layout of socket buffer memory.
  - Move pru_*() protocol switch stubs to uipc_socket.c where the non-stub versions of several of these functions live. Likewise, move socket state transition calls (soisconnecting(), etc) to uipc_socket.c. Move sodupsockaddr() and sotoxsocket().
* Move the dom_dispose and pru_detach calls in sofree() earlier. Only after | glebius | 2007-03-22 | 1 | -4/+5
  calling pru_detach can we be absolutely sure that we don't have any references to the socket in the stack. This closes a race between lockless sbdestroy() and data arriving on the socket.
  Reviewed by: rwatson
* - Use m_gethdr(), m_get(), and m_clget() instead of the macros in | jhb | 2007-03-12 | 1 | -20/+5
  sosend_copyin().
  - Use M_WAITOK instead of M_TRYWAIT in sosend_copyin().
  - Don't check for NULL from M_WAITOK and return ENOBUFS. M_WAITOK/M_TRYWAIT allocations don't fail with NULL.
  Reviewed by: andre
  Requested by: andre (2)
* Don't block on the socket zone limit during the socket() | ru | 2007-02-26 | 1 | -5/+5
  call which can easily lock up a system otherwise; instead, return ENOBUFS as documented in a manpage, thus reverting us to the FreeBSD 4.x behavior.
  Reviewed by: rwatson
  MFC after: 2 weeks
* Rename somaxconn_sysctl() to sysctl_somaxconn() so that I will be able to | rwatson | 2007-02-15 | 1 | -3/+3
  claim that sofoo() functions all accept a socket as their first argument.
* Diff reduction with RELENG_6, style(9): | bms | 2007-02-03 | 1 | -3/+2
  Remove unnecessary brace; && should be on end of line.
  No functional changes.
* Generic socket buffer auto sizing support, header defines, flag inheritance. | andre | 2007-02-01 | 1 | -0/+8
  MFC after: 1 month
* Unbreak writes of 0 bytes. Zero byte writes happen when only ancillary | andre | 2007-01-22 | 1 | -0/+10
  control data but no payload data is passed.
  Change m_uiotombuf() to return at least one empty mbuf if the requested length was zero. Add comment to sosend_dgram() and sosend_generic().
  Diagnosed by: jhb
  Regression test by: rwatson
  Pointy hat to: andre
* Canonicalize copyrights in some files I hold copyrights on: | rwatson | 2007-01-08 | 1 | -1/+2
  - Sort by date in license blocks, oldest copyright first.
  - All rights reserved after all copyrights, not just the first.
  - Use (c) to be consistent with other entries.
  MFC after: 3 days
* Drop all received data mbufs from a socket's queue if the MT_SONAME | bms | 2006-12-23 | 1 | -11/+9
  mbuf is dropped, to preserve the invariant in the PR_ADDR case.
  Add a regression test to detect this condition, but do not hook it up to the build for now.
  PR: kern/38495
  Submitted by: James Juran
  Reviewed by: sam, rwatson
  Obtained from: NetBSD
  MFC after: 2 weeks
* Fix a race in soclose() where connections could be queued to the | mohans | 2006-11-22 | 1 | -22/+26
  listening socket after the pass that cleans those queues. This results in these connections being orphaned (and leaked). The fix is to clean up the so queues after detaching the socket from the protocol.
  Thanks to ups and jhb for discussions and a thorough code review.
* Use the improved m_uiotombuf() function instead of home grown sosend_copyin() | andre | 2006-11-02 | 1 | -1/+29
  to do the userland to kernel copying in sosend_generic() and sosend_dgram().
  sosend_copyin() is retained for ZERO_COPY_SOCKETS which are not yet supported by m_uiotombuf().
  Benchmarking shows significant improvements (95% confidence):
    66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO)
    65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO)
  (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.)
  Sponsored by: TCP/IP Optimization Fundraise 2005
  MFC after: 3 months
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h | rwatson | 2006-10-22 | 1 | -0/+2
  begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
  This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
  Obtained from: TrustedBSD Project
  Sponsored by: SPARTA
* Fix a case where socket I/O atomicity is violated due to not dropping | bms | 2006-09-22 | 1 | -1/+16
  the entire record when a non-data mbuf is removed in the soreceive() path. This only triggers a panic directly when compiled with INVARIANTS.
  PR: 38495
  Submitted by: James Juran
  MFC after: 1 week
* Fix a lock leak in an error case. | pjd | 2006-09-13 | 1 | -1/+1
  Reported by: netchild
  Reviewed by: rwatson
* New sockets created by incoming connections into listen sockets should | andre | 2006-09-10 | 1 | -1/+4
  inherit all settings and options except listen specific options.
  Add the missing send/receive timeouts and low watermarks.
  Remove inheritance of the field so_timeo which is unused.
  Noticed by: phk
  Reviewed by: rwatson
  Sponsored by: TCP/IP Optimization Fundraise 2005
  MFC after: 3 days
* Fix a kernel panic based on receiving an ICMPv6 Packet too Big message. | gnn | 2006-08-18 | 1 | -2/+2
  PR: 99779
  Submitted by: Jinmei Tatuya
  Reviewed by: clement, rwatson
  MFC after: 1 week
* Before performing a sodealloc() when pru_attach() fails, assert that | rwatson | 2006-08-11 | 1 | -0/+3
  the socket refcount remains 1, and then drop to 0 before freeing the socket.
  PR: 101763
  Reported by: Gleb Kozyrev <gkozyrev at ukr dot net>
* Move destroying kqueue state from above pru_detach to below it in | rwatson | 2006-08-02 | 1 | -2/+2
  sofree(), as a number of protocols expect to be able to call soisdisconnected() during detach. That may not be a good assumption, but until I'm sure if it's a good assumption or not, allow it.
* Move update of 'numopensockets' from bottom of sodealloc() to the top, | rwatson | 2006-08-02 | 1 | -3/+1
  eliminating a second set of identical mutex operations at the bottom. This allows brief exceeding of the max sockets limit, but only by sockets in the last stages of being torn down.
* Reimplement socket buffer tear-down in sofree(): as the socket is no | rwatson | 2006-08-01 | 1 | -14/+22
  longer referenced by other threads (hence our freeing it), we don't need to set the can't send and can't receive flags, wake up the consumers, perform two levels of locking, etc. Implement a fast-path teardown, sbdestroy(), which flushes and releases each socket buffer. A manual dom_dispose of the receive buffer is still required explicitly to GC any in-flight file descriptors, etc, before flushing the buffer.
  This results in a 9% UP performance improvement and 16% SMP performance improvement on a tight loop of socket();close(); in micro-benchmarking, but will likely also affect CPU-bound macro-benchmark performance.
* soreceive_generic(), and sopoll_generic(). Add new functions sosend(), | rwatson | 2006-07-24 | 1 | -2/+52
  soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used universally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit).
  This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls.
  Architectural head nod: sam, gnn, wollman
* Update various uipc_socket.c comments, and reformat others. | rwatson | 2006-07-23 | 1 | -136/+150
* Change semantics of socket close and detach. Add a new protocol switch | rwatson | 2006-07-21 | 1 | -6/+5
  function, pru_close, to notify protocols that the file descriptor or other consumer of a socket is closing the socket. pru_abort is now a notification of close also, and no longer detaches. pru_detach is no longer used to notify of close, and will be called during socket tear-down by sofree() when all references to a socket evaporate after an earlier call to abort or close the socket. This means detach is now an unconditional teardown of a socket, whereas previously sockets could persist after detach if the protocol retained a reference.
  This facilitates sharing mutexes between layers of the network stack as the mutex is required during the checking and removal of references at the head of sofree(). With this change, pru_detach can now assume that the mutex will no longer be required by the socket layer after completion, whereas before this was not necessarily true.
  Reviewed by: gnn
* Change comment on soabort() to more accurately describe how/when | rwatson | 2006-07-16 | 1 | -12/+12
  soabort() is used. Remove trailing white space.
* Several protocol switch functions (pru_abort, pru_detach, pru_sosetlabel) | rwatson | 2006-07-11 | 1 | -2/+4
  return void, so don't implement no-op versions of these functions. Instead, consistently check if those switch pointers are NULL before invoking them.
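  The NULL-check pattern this commit describes can be sketched with a small function-pointer table. The struct names, the call_abort() helper, and example_abort() are hypothetical and exist only for illustration; only the method names pru_abort and pru_detach come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; only tracks whether abort ran. */
struct socket_model {
    int aborted;
};

/* Hypothetical stand-in for a protocol switch with optional void methods. */
struct pr_usrreqs_model {
    void (*pru_abort)(struct socket_model *);
    void (*pru_detach)(struct socket_model *);
};

/*
 * Invoke pru_abort only if the protocol provides one, instead of
 * requiring every protocol to install a no-op stub.
 * Returns 1 if the method was called, 0 if it was left NULL.
 */
int
call_abort(const struct pr_usrreqs_model *pr, struct socket_model *so)
{
    assert(pr != NULL && so != NULL);

    if (pr->pru_abort == NULL)
        return 0;
    pr->pru_abort(so);
    return 1;
}

/* Example protocol method used to exercise the dispatch helper. */
void
example_abort(struct socket_model *so)
{
    so->aborted = 1;
}
```

  A protocol that has nothing to do on abort simply leaves the pointer NULL, and the caller skips the call.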
* When pru_attach() fails, call sodealloc() on the socket rather than | rwatson | 2006-07-11 | 1 | -4/+1
  using sorele() and the full tear-down path. Since protocol state allocation failed, this is not required (and is arguably undesirable). This matches the behavior of sonewconn() under the same circumstances.
* When retrieving SO_ERROR via getsockopt(), hold the socket lock around | rwatson | 2006-06-18 | 1 | -0/+2
  the retrieval and replacement with 0.
  MFC after: 1 week
* Move some functions and definitions from uipc_socket2.c to uipc_socket.c: | rwatson | 2006-06-10 | 1 | -36/+171
  - Move sonewconn(), which creates new sockets for incoming connections on listen sockets, so that all socket allocation code is together in uipc_socket.c.
  - Move 'maxsockets' and associated sysctls to uipc_socket.c with the socket allocation code.
  - Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it to sysctl.h and remove lots of scattered implementations in various IPC modules.
  - Sort sodealloc() after soalloc() in uipc_socket.c for dependency order reasons. Staticize soalloc() and sodealloc() as they are now required only in uipc_socket.c, and are internal to the socket implementation.
  After this change, socket allocation and deallocation is entirely centralized in one file, and uipc_socket2.c consists entirely of socket buffer manipulation and default protocol switch functions.
  MFC after: 1 month
* Rearrange code in soalloc() so that it's less indented by returning | rwatson | 2006-06-08 | 1 | -13/+13
  early if uma_zalloc() from the socket zone fails. No functional change.
  MFC after: 1 week
* Assert that sockets passed into soabort() not be SQ_COMP or SQ_INCOMP, | rwatson | 2006-04-23 | 1 | -1/+3
  since that removal should have been done a layer up.
  MFC after: 3 months
* Add missing 'not' to SQ_COMP comment. | rwatson | 2006-04-23 | 1 | -1/+1
  MFC after: 3 months
* Move handling of SQ_COMP exception case in sofree() to the top of the | rwatson | 2006-04-23 | 1 | -17/+5
  function along with the remainder of the reference checking code. Move comment from body to header with remainder of comments. Inclusion of a socket in a completed connection queue counts as a true reference, and should not be handled as an under-documented edge case.
  MFC after: 3 months
* Change protocol switch method pru_detach() so that it returns void | rwatson | 2006-04-01 | 1 | -19/+19
  rather than an error. Detaches do not "fail", they either occur or the protocol flags SS_PROTOREF to take ownership of the socket.
  soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals.
  Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it.
  In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach.
  netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic.
  MFC after: 3 months
* Change protocol switch pru_abort() API so that it returns void rather | rwatson | 2006-04-01 | 1 | -11/+29
  than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this.
  This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they are not fully updated by this commit. This will be corrected shortly in followup commits to these components.
  MFC after: 3 months
* Assert so->so_pcb is NULL in sodealloc() -- the protocol state should not | rwatson | 2006-04-01 | 1 | -0/+2
  be present at this point. We will eventually remove this assert because the socket layer should never look at so_pcb, but for now it's a useful debugging tool.
  MFC after: 3 months
* Add a somewhat sizable comment documenting the semantics of various kernel | rwatson | 2006-04-01 | 1 | -0/+57
  socket calls relating to the creation and destruction of sockets. This will eventually form the foundation of socket(9), but is currently in too much flux to do so.
  MFC after: 3 months
* Change soabort() from returning int to returning void, since all | rwatson | 2006-03-16 | 1 | -5/+3
  consumers ignore the return value, soabort() is required to succeed, and protocols produce errors here to report multiple freeing of the pcb, which we hope to eliminate.
* As with socket consumer references (so_count), make sofree() return | rwatson | 2006-03-15 | 1 | -3/+3
  without GC'ing the socket if a strong protocol reference to the socket is present (SS_PROTOREF).
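  The reference gating described in this commit can be modeled in a few lines. This is a simplified sketch, not the kernel's sofree(): the struct, the SS_PROTOREF value, and sofree_model() are hypothetical, and any other conditions the real function checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SS_PROTOREF 0x1     /* assumed value: protocol holds a strong reference */

/* Hypothetical stand-in for the socket fields relevant to this check. */
struct socket_model {
    int  so_count;          /* consumer references */
    int  so_state;          /* state flags, may include SS_PROTOREF */
    bool freed;             /* set once the model "frees" the socket */
};

/*
 * Free the socket only when neither a consumer reference nor a strong
 * protocol reference remains. Returns true if the socket was freed.
 */
bool
sofree_model(struct socket_model *so)
{
    assert(so != NULL);

    if (so->so_count != 0 || (so->so_state & SS_PROTOREF) != 0)
        return false;       /* still referenced: leave the socket alone */
    so->freed = true;
    return true;
}
```

  In this model a protocol that sets SS_PROTOREF keeps the socket alive until it clears the flag and the free is attempted again.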
* Improve consistency of return() style. | rwatson | 2006-02-12 | 1 | -8/+8
  MFC after: 3 days