summaryrefslogtreecommitdiffstats
path: root/sys/netinet/udp_usrreq.c
Commit message (Collapse)AuthorAgeFilesLines
* Add soreceive_dgram(9), an optimized socket receive function for use byrwatson2008-07-021-0/+1
| | | | | | | | | | | | | | datagram-only protocols, such as UDP. This version removes use of sblock(), which is not required due to an inability to interlace data improperly with datagrams, as well as avoiding some of the larger loops and state management that don't apply on datagram sockets. This is experimental code, so hook it up only for UDPv4 for testing; if there are problems we may need to revise it or turn it off by default, but it offers *significant* performance improvements for threaded UDP applications such as BIND9, nsd, and memcached using UDP. Tested by: kris, ps
* In udp_append() and udp_input(), make use of read locking on incpbsrwatson2008-06-301-8/+8
| | | | | | | | | | | | | | | rather than write locking: while we need to maintain a valid reference to the inpcb and fix its state, no protocol layer state is modified during an IPv4 UDP receive -- there are only changes at the socket layer, which is separately protected by socket locking. While parallel concurrent receive on a single UDP socket is currently relatively unusual, introducing read locking in the transmit path, allowing concurrent receive and transmit, will significantly improve performance for loads such as BIND, memcached, etc. MFC after: 2 months Tested by: gnn, kris, ps
* Employ read locks on UDP inpcbs, rather than write locks, whenrwatson2008-05-291-13/+18
| | | | | | | | monitoring UDP connections using sysctls. In some cases, add previously missing locking of inpcbs, as inp_socket is followed, which also allows us to drop global locks more quickly. MFC after: 1 week
* Factor out the v4-only vs. the v6-only inp_flags processing inbz2008-05-241-8/+3
| | | | | | | | | ip6_savecontrol in preparation for udp_append() to no longer need an WLOCK as we will no longer be modifying socket options. Requested by: rwatson Reviewed by: gnn MFC after: 10 days
* Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros torwatson2008-04-171-34/+36
| | | | | | | | | | | | | | | explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock main exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committered patch)
* Merge first in a series of TrustedBSD MAC Framework KPI changesrwatson2007-10-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
* Add FBSDID to all files in netinet so that people can moresilby2007-10-071-1/+3
| | | | | | easily include file version information in bug reports. Approved by: re (kensmith)
* Further UDPv4 cleanup:rwatson2007-09-101-17/+17
| | | | | | | | | | | | - Resort includes a bit. - Correct typos and wording problems in comments. - Rename udpcksum to udp_cksum to be consistent with other UDP-related configuration variables. - Remove indirection of udp_notify through local notify variable in udp_ctlinput(), which is presumably due to copying and pasting from TCP, where multiple notify routines exist. Approved by: re (kensmith)
* Further cleanup of UDPv4:rwatson2007-07-101-98/+95
| | | | | | | | | | | | | | - Move udp_sendspace and udp_recvspace global variables and associated sysctls to the top of the file where most other such things are present. - Rename static variable 'blackhole' to 'udp_blackhole' and unstaticize so that we can add blackhole support for UDPv6 using the same MIB variable. - Move udp_append() above udp_input() to match the function order in udp6_usrreq.c. Approved by: re (kensmith)
* Minor UDPv4 cleanup: capitalize comment, move statistics update after mbufrwatson2007-07-071-3/+3
| | | | | | | free to be consistent with other error handling, and release socket buffer lock before freeing mbufs and statistics updates rather than after. Approved by: re (kensmith)
* Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSECgnn2007-07-031-3/+3
| | | | | | | | option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing
* Commit IPv6 support for FAST_IPSEC to the tree.gnn2007-07-011-9/+3
| | | | | | | | | This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing
* Make gcc4.2 happy and zero save_ip for the unlikely (blackhole != 0)mjacob2007-06-171-0/+2
| | | | codepath.
* Import rewrite of IPv4 socket multicast layer to support source-specificbms2007-06-121-50/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and protocol-independent host mode multicast. The code is written to accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work. This change only pertains to FreeBSD's use as a multicast end-station and does not concern multicast routing; for an IGMPv3/MLDv2 router implementation, consider the XORP project. The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6, which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html Summary * IPv4 multicast socket processing is now moved out of ip_output.c into a new module, in_mcast.c. * The in_mcast.c module implements the IPv4 legacy any-source API in terms of the protocol-independent source-specific API. * Source filters are lazy allocated as the common case does not use them. They are part of per inpcb state and are covered by the inpcb lock. * struct ip_mreqn is now supported to allow applications to specify multicast joins by interface index in the legacy IPv4 any-source API. * In UDP, an incoming multicast datagram only requires that the source port matches the 4-tuple if the socket was already bound by source port. An unbound socket SHOULD be able to receive multicasts sent from an ephemeral source port. * The UDP socket multicast filter mode defaults to exclusive, that is, sources present in the per-socket list will be blocked from delivery. * The RFC 3678 userland functions have been added to libc: setsourcefilter, getsourcefilter, setipv4sourcefilter, getipv4sourcefilter. * Definitions for IGMPv3 are merged but not yet used. * struct sockaddr_storage is now referenced from <netinet/in.h>. It is therefore defined there if not already declared in the same way as for the C99 types. * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF which are then interpreted as interface indexes) is now deprecated. * A patch for the Rhyolite.com routed in the FreeBSD base system is available in the -net archives. This only affects individuals running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces. * Make IPv6 detach path similar to IPv4's in code flow; functionally same. * Bump __FreeBSD_version to 700048; see UPDATING. This work was financially supported by another FreeBSD committer. Obtained from: p4://bms_netdev Submitted by: Wilbert de Graaf (original work) Reviewed by: rwatson (locking), silence from fenner, net@ (but with encouragement)
* Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); inrwatson2007-06-121-2/+1
| | | | | | | | | | | | | | | some cases, move to priv_check() if it was an operation on a thread and no other flags were present. Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c. We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h. Reviewed by: csjp Obtained from: TrustedBSD Project
* When verifying the IPv4 UDP checksum, don't overwrite the checksumdwmalone2007-05-161-5/+7
| | | | | | | | | | value in the mbuf with the result of the calculation. Previously, if we chose to return an ICMP message, the quoted UDP checksum bytes would be different to what was sent. PR: 112471 Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz> MFC after: 3 weeks
* Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddrrwatson2007-05-111-2/+2
| | | | | | | | protocol entry points using functions named proto_getsockaddr and proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr. While it's true that sockaddrs are allocated and set, the net effect is to retrieve (get) the socket address or peer address from a socket, not set it, so align names to that intent.
* Since udp_peeraddr() and udp_sockaddr() directly wrap in_setpeeraddr()rwatson2007-05-071-25/+2
| | | | | | and in_setsockaddr(), containing only stale comments on why they exist, remove them and initialize the protosw for UDP to directly reference in_setpeeraddr() and in_setsockaddr().
* Minor style tweaks.rwatson2007-05-071-17/+22
|
* Remove unused pcbinfo arguments to in_setsockaddr() andrwatson2007-05-011-2/+2
| | | | in_setpeeraddr().
* Rename some fields of struct inpcbinfo to have the ipi_ prefix,rwatson2007-04-301-5/+6
| | | | | | consistent with the naming of other structure field members, and reducing improper grep matches. Clean up and comment structure fields in structure definition.
* Fix IP_SENDSRCADDR semantics.bms2007-03-081-4/+11
| | | | | | | | | | | | | | | | * To use this option with a UDP socket, it must be bound to a local port, and INADDR_ANY, to disallow possible collisions with existing udp inpcbs bound to the same port on other interfaces at send time. * If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be rejected as it is ambiguous. * If the socket is bound to an address other than INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup(). Reviewed by: silence on -net Tested with: src/tools/regression/netinet/ipbroadcast MFC after: 4 days
* Rename two identically named log_in_vain variables: tcp_input.c's staticrwatson2007-02-201-3/+3
| | | | | | | log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to udp_log_in_vain. MFC after: 1 week
* Gratuitous UDP restyling toward style(9) in 7.x.rwatson2007-02-201-142/+135
|
* o One more typo in the comment.maxim2007-01-061-1/+1
| | | | | PR: kern/107609 Submitted by: Dr. Markus Waldeck
* Fix typo in comment.imp2007-01-011-1/+1
| | | | Submitted by: remko
* Add comment about udp checksums being off in BSD 4.2 compatibility mode.imp2006-12-311-1/+8
| | | | | Submitted by: Dr. Markus Waldeck PR: kern/106657
* Some whitespace nits and remove a few casts.jhb2006-12-291-1/+2
|
* Sweep kernel replacing suser(9) calls with priv(9) calls, assigningrwatson2006-11-061-1/+3
| | | | | | | | | | | | | specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.hrwatson2006-10-221-1/+2
| | | | | | | | | | | | | begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
* Check inp_flags instead of inp_vflag for INP_ONESBCAST flag.andre2006-09-061-2/+2
| | | | | | | PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
* Fix typo in comment.thomas2006-09-041-1/+1
|
* Change semantics of socket close and detach. Add a new protocol switchrwatson2006-07-211-4/+28
| | | | | | | | | | | | | | | | | | | function, pru_close, to notify protocols that the file descriptor or other consumer of a socket is closing the socket. pru_abort is now a notification of close also, and no longer detaches. pru_detach is no longer used to notify of close, and will be called during socket tear-down by sofree() when all references to a socket evaporate after an earlier call to abort or close the socket. This means detach is now an unconditional teardown of a socket, whereas previously sockets could persist after detach of the protocol retained a reference. This faciliates sharing mutexes between layers of the network stack as the mutex is required during the checking and removal of references at the head of sofree(). With this change, pru_detach can now assume that the mutex will no longer be required by the socket layer after completion, whereas before this was not necessarily true. Reviewed by: gnn
* Fix race conditions on enumerating pcb lists by moving the initializationups2006-07-181-4/+14
| | | | | | | | | | | | | | | ( and where appropriate the destruction) of the pcb mutex to the init/finit functions of the pcb zones. This allows locking of the pcb entries and race condition free comparison of the generation count. Rearrange locking a bit to avoid extra locking operation to update the generation count in in_pcballoc(). (in_pcballoc now returns the pcb locked) I am planning to convert pcb list handling from a type safe to a reference count model soon. ( As this allows really freeing the PCBs) Reviewed by: rwatson@, mohans@ MFC after: 1 week
* Acquire udbinfo lock after call to soreserve() rather than before, as itrwatson2006-06-031-4/+2
| | | | | | | is not required. This simplifies error-handling, and reduces the time that this lock is held. MFC after: 1 month
* o In udp|rip_disconnect() acquire a socket lock before the socketmaxim2006-05-211-1/+3
| | | | | | | state modification. To prevent races do that while holding inpcb lock. Reviewed by: rwatson
* Modify UDP to use sosend_dgram() instead of sosend(). This allowsrwatson2006-05-061-0/+1
| | | | | | | | | | | for signicantly optimized UDP socket I/O when using a single UDP socket from many threads or processes that share it, by avoiding significant locking and other overhead in the general sosend() path that isn't necessary for simple datagram sockets. Specifically, this change results in a significant performance improvement for threaded name service in BIND9 under load. Suggested by: Jinmei_Tatsuya at isc dot org
* Rename 'last' to 'inp' in udp_append(): the name 'last' is due torwatson2006-04-251-15/+15
| | | | | | | | the fact that the loop through inpcb's in udp_input() tracks the last inpcb while looping. We keep that name in the calling loop but not in the delivery routine itself. MFC after: 3 months
* Allow for nmbclusters and maxsockets to be increased via sysctl.ps2006-04-211-0/+10
| | | | | An eventhandler is used to update all the various zones that depend on these values.
* Update in_pcb-derived basic socket types following changes torwatson2006-04-011-35/+16
| | | | | | | | | | | | | | | | | | | | | pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, in protocol shutdown methods, and in raw IP send. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this. - Invoke in_pcbfree() after in_pcbdetach() in order to free the detached in_pcb structure for a socket. MFC after: 3 months
* Chance protocol switch method pru_detach() so that it returns voidrwatson2006-04-011-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | rather than an error. Detaches do not "fail", they other occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months
* Change protocol switch pru_abort() API so that it returns void ratherrwatson2006-04-011-3/+2
| | | | | | | | | | | | | | than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this. This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they have not are not fully updated by this commit. This will be corrected shortly in followup commits to these components. MFC after: 3 months
* Implement 'ipfw fwd laddr,port' feature for UDP. According to ipfw(8)glebius2006-01-241-0/+20
| | | | | | it should work, however it never did. People expect it to work. PR: kern/90834
* Remove dead code: 'opts' is not used in udp_append(), only in udp_input(),rwatson2006-01-141-3/+0
| | | | | | | so no need to assign it to NULL or conditionally free it. Found with: Coverity Prevent(tm) MFC after: 3 days
* Fix a bunch of SYSCTL_INT() that should have been SYSCTL_ULONG() tomux2005-12-141-2/+2
| | | | | | | match the type of the variable they are exporting. Spotted by: Thomas Hurst <tom@hur.st> MFC after: 3 days
* Consolidate all IP Options handling functions into ip_options.[ch] andandre2005-11-181-0/+1
| | | | | | | | | | | | | | | | | | | | include ip_options.h into all files making use of IP Options functions. From ip_input.c rev 1.306: ip_dooptions(struct mbuf *m, int pass) save_rte(m, option, dst) ip_srcroute(m0) ip_stripoptions(m, mopt) From ip_output.c rev 1.249: ip_insertoptions(m, opt, phlen) ip_optcopy(ip, jp) ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m) No functional changes in this commit. Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005
* o INP_ONESBCAST is inpcb.inp_vflag flag not inp_flags. The confusionmaxim2005-10-121-2/+2
| | | | | | | | | with IP_PORTRANGE_HIGH leads to the incorrect checksum calculation. PR: kern/87306 Submitted by: Rickard Lind Reviewed by: bms MFC after: 2 weeks
* Implement IP_DONTFRAG IP socket option enabling the Don't Fragmentandre2005-09-261-0/+9
| | | | | | | | | | | | flag on IP packets. Currently this option is only repected on udp and raw ip sockets. On tcp sockets the DF flag is controlled by the path MTU discovery option. Sending a packet larger than the MTU size of the egress interface returns an EMSGSIZE error. Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005
* Add socketoption IP_MINTTL. May be used to set the minimum acceptableandre2005-08-221-0/+3
| | | | | | | | | | | | | | | | | TTL a packet must have when received on a socket. All packets with a lower TTL are silently dropped. Works on already connected/connecting and listening sockets for RAW/UDP/TCP. This option is only really useful when set to 255 preventing packets from outside the directly connected networks reaching local listeners on sockets. Allows userland implementation of 'The Generalized TTL Security Mechanism (GTSM)' according to RFC3682. Examples of such use include the Cisco IOS BGP implementation command "neighbor ttl-security". MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005
* De-spl UDP.rwatson2005-06-011-31/+5
| | | | MFC after: 3 days
OpenPOWER on IntegriCloud