summaryrefslogtreecommitdiffstats
path: root/sys/netinet/udp_usrreq.c
Commit message (Collapse)AuthorAgeFilesLines
* In in_pcbconnect_setup() jailed sockets are treated specially: if localglebius2005-02-221-0/+5
| | | | | | | | | | | | | | | address is not supplied, then jail IP is choosed and in_pcbbind() is called. Since udp_output() does not save local addr after call to in_pcbconnect_setup(), in_pcbbind() is called for each packet, and this is incorrect. So, we shall treat jailed sockets specially in udp_output(), we will save their local address. This fixes a long standing bug with broken sendto() system call in jails. PR: kern/26506 Reviewed by: rwatson MFC after: 2 weeks
* /* -> /*- for license, minor formatting changesimp2005-01-071-1/+1
|
* Initialize struct pr_userreqs in new/sparse style and fill in commonphk2004-11-081-5/+12
| | | | | | default elements in net_init_domain(). This makes it possible to grep these structures and see any bogosities.
* Hide udp_in6 behind #ifdef INET6phk2004-11-041-0/+2
|
* Until this change, the UDP input code used global variables udp_in,rwatson2004-11-041-57/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | udp_in6, and udp_ip6 to pass socket address state between udp_input(), udp_append(), and soappendaddr_locked(). While file in the default configuration, when running with multiple netisrs or direct ithread dispatch, this can result in races wherein user processes using recvmsg() get back the wrong source IP/port. To correct this and related races: - Eliminate udp_ip6, which is believed to be generated but then never used. Eliminate ip_2_ip6_hdr() as it is now unneeded. - Eliminate setting, testing, and existence of 'init' status fields for the IPv6 structures. While with multiple UDP delivery this could lead to amortization of IPv4 -> IPv6 conversion when delivering an IPv4 UDP packet to an IPv6 socket, it added substantial complexity and side effects. - Move global structures into the stack, declaring udp_in in udp_input(), and udp_in6 in udp_append() to be used if a conversion is required. Pass &udp_in into udp_append(). - Re-annotate comments to reflect updates. With this change, UDP appears to operate correctly in the presence of substantial inbound processing parallelism. This solution avoids introducing additional synchronization, but does increase the potential stack depth. Discovered by: kris (Bug Magnet) MFC after: 3 weeks
* Don't release the udbinfo lock until after the last use of UDP inpcbrwatson2004-10-121-3/+3
| | | | | | | | in udp_input(), since the udbinfo lock is used to prevent removal of the inpcb while in use (i.e., as a form of reference count) in the in-bound path. RELENG_5 candidate.
* fix up socket/ip layer violation... don't assume/know thatjmg2004-09-051-1/+5
| | | | SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...
* When sliding the m_data pointer forward, update m_pktrhdr.len as wellrwatson2004-08-221-1/+3
| | | | | | | | | as m_len, or the pkthdr length will be inconsistent with the actual length of data in the mbuf chain. The symptom of this occuring was "out of data" warnings from in_cksum_skip() on large UDP packets sent via the loopback interface. Foot shot: green
* When prepending space onto outgoing UDP datagram payloads to hold therwatson2004-08-211-4/+7
| | | | | | | | | UDP/IP header, make sure that space is also allocated for the link layer header. If an mbuf must be allocated to hold the UDP/IP header (very likely), then this will avoid an additional mbuf allocation at the link layer. This trick is also used by TCP and other protocols to avoid extra calls to the mbuf allocator in the ethernet (and related) output routines.
* Push down pcbinfo and inpcb locking from udp_send() into udp_output().rwatson2004-08-191-25/+35
| | | | | | | This provides greater context for the locking and allows us to avoid locking the pcbinfo structure if not binding operations will take place (i.e., already bound, connected, and no expliti sendto() address).
* White space cleanup for netinet before branch:rwatson2004-08-161-12/+12
| | | | | | | | | | | - Trailing tab/space cleanup - Remove spurious spaces between or before tabs This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET. Approved by: re (scottl) Submitted by: Xin LI <delphij@frontfree.net>
* When udp_send() fails, make sure to free the control mbufs as well asrwatson2004-08-121-0/+2
| | | | | the data mbuf. This was done in most error cases, but not the case where the inpcb pointer is surprisingly NULL.
* Backout removal of UMA_ZONE_NOFREE flag for all zones which are establishedandre2004-08-111-1/+1
| | | | | | | | | for structures with timers in them. It might be that a timer might fire even when the associated structure has already been free'd. Having type- stable storage in this case is beneficial for graceful failure handling and debugging. Discussed with: bosko, tegge, rwatson
* Remove the UMA_ZONE_NOFREE flag to all uma_zcreate() calls in the IP andandre2004-08-111-1/+1
| | | | | TCP code. This flag would have prevented giving back excessive free slabs to the global pool after a transient peak usage.
* When iterating the UDP inpcb list processing an inbound broadcastrwatson2004-08-061-10/+9
| | | | | | | | | | | or multicast packet, we don't need to acquire the inpcb mutex unless we are actually using inpcb fields other than the bound port and address. Since we hold the pcbinfo lock already, these can't change. Defer acquiring the inpcb mutex until we have a high chance of a match. This avoids about 120 mutex operations per UDP broadcast packet received on one of my work systems. Reviewed by: sam
* Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This iscperciva2004-07-261-1/+1
| | | | | | | | | | | somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags. The old name is still defined, but will be removed in a few days (unless I hear any complaints...) Discussed with: rwatson, scottl Requested by: jhb
* Reduce the number of unnecessary unlock-relocks on socket buffer mutexesrwatson2004-06-261-2/+7
| | | | | | | | | | | | | | | | | | | | associated with performing a wakeup on the socket buffer: - When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append(). - When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/ receive flags and dropping data, and in uipc_rcvd() which adjusting back-pressure on the sockets. For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduce mutex costs.
* Reverse a patch which has no effect on -CURRENT and should probably bebms2004-06-161-7/+1
| | | | | | | applied directly to -STABLE. Noticed by: iedowse Pointy hat to: bms
* Disconnect a temporarily-connected UDP socket in out-of-mbufs case. Thisbms2004-06-161-1/+7
| | | | | | | | | | | | | | | | | | | | | | fixes the problem of UDP sockets getting wedged in a connected state (and bound to their destination) under heavy load. Temporary bind/connect should probably be deleted in future as an optimization, as described in "A Faster UDP" [Partridge/Pink 1993]. Notes: - INP_LOCK() is already held in udp_output(). The connection is in effect happening at a layer lower than the socket layer, therefore in theory socket locking should not be needed. - Inlining the in_pcbdisconnect() operation buys us nothing (in the case of the current state of the code), as laddr is not part of the inpcb hash or the udbinfo hash. Therefore there should be no need to rehash after restoring laddr in the error case (this was a concern of the original author of the patch). PR: kern/41765 Requested by: gnn Submitted by: Jinmei Tatuya (with cleanups) Tested by: spray(8)
* Switch to using the inpcb MAC label instead of socket MAC label whenrwatson2004-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | labeling new mbufs created from sockets/inpcbs in IPv4. This helps avoid the need for socket layer locking in the lower level network paths where inpcb locks are already frequently held where needed. In particular: - Use the inpcb for label instead of socket in raw_append(). - Use the inpcb for label instead of socket in tcp_output(). - Use the inpcb for label instead of socket in tcp_respond(). - Use the inpcb for label instead of socket in tcp_twrespond(). - Use the inpcb for label instead of socket in syncache_respond(). While here, modify tcp_respond() to avoid assigning NULL to a stack variable and centralize assertions about the inpcb when inp is assigned. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
* Assert inpcb lock in udp_append().rwatson2004-05-041-0/+2
| | | | | Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
* Remove advertising clause from University of California Regent'simp2004-04-071-4/+0
| | | | | | | license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
* Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:pjd2004-03-271-4/+4
| | | | | | | | | | | | | | - in_pcbbind(), - in_pcbbind_setup(), - in_pcbconnect(), - in_pcbconnect_setup(), - in6_pcbbind(), - in6_pcbconnect(), - in6_pcbsetport(). "It should simplify/clarify things a great deal." --rwatson Requested by: rwatson Reviewed by: rwatson, ume
* Remove unused argument.pjd2004-03-271-1/+1
| | | | Reviewed by: ume
* Split the mlock() kernel code into two parts, mlock(), which unpackstruckman2004-02-261-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way. Enable the RLIMIT_MEMLOCK checking code in kern_mlock(). Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits. Nuke the vslock() and vsunlock() implementations, which are no longer used. Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request. Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request. Modify the callers of sysctl_wire_old_buffer() to look for the error return. Modify sysctl_old_user to obey the wired buffer length and clean up its implementation. Reviewed by: bms
* IPSEC and FAST_IPSEC have the same internal API now;ume2004-02-171-8/+3
| | | | | | so merge these (IPSEC has an extra ipsecstat) Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
* pass pcb rather than so. it is expected that per socket policyume2004-02-031-1/+1
| | | | works again.
* Introduce the SO_BINTIME option which takes a high-resolution timestampphk2004-01-311-1/+1
| | | | | | | | | | | | at packet arrival. For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL since it has higher resolution and lower overhead. Simultaneous use of the two options is possible and they will return consistent timestamps. This introduces an extra test and a function call for SO_TIMEVAL, but I have not been able to measure that.
* Correct the descriptions of the net.inet.{udp,raw}.recvspace sysctls.ru2004-01-271-1/+1
|
* Split the "inp" mutex class into separate classes for each of divert,sam2003-11-261-1/+1
| | | | | | | | raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness complaints. Reviewed by: rwatson Approved by: re (rwatson)
* Introduce tcp_hostcache and remove the tcp specific metrics fromandre2003-11-201-5/+12
| | | | | | | | | | | | | | | | | | | | | | | the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
* Introduce a MAC label reference in 'struct inpcb', which cachesrwatson2003-11-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Add a new sysctl knob, net.inet.udp.strict_mcast_mship, to the udp_input path.bms2003-11-121-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | This switch toggles between strict multicast delivery, and traditional multicast delivery. The traditional (default) behaviour is to deliver multicast datagrams to all sockets which are members of that group, regardless of the network interface where the datagrams were received. The strict behaviour is to deliver multicast datagrams received on a particular interface only to sockets whose membership is bound to that interface. Note that as a matter of course, multicast consumers specifying INADDR_ANY for their interface get joined on the interface where the default route happens to be bound. This switch has no effect if the interface which the consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING. The original patch has been cleaned up somewhat from that submitted. It has been tested on a multihomed machine with multiple QuickTime RTP streams running over the local switch, which doesn't do IGMP snooping. PR: kern/58359 Submitted by: William A. Carrel Reviewed by: rwatson MFC after: 1 week
* assert inpcb is locked in udp_outputsam2003-11-081-0/+1
| | | | Supported by: FreeBSD Foundation
* ip6_savecontrol() argument is redundantume2003-10-291-1/+1
|
* PR: kern/56343bms2003-09-031-1/+3
| | | | | Reviewed by: tjr Approved by: jake (mentor)
* Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent onbms2003-08-201-2/+6
| | | | | | | | | | specific interfaces. This is required by aodvd, and may in future help us in getting rid of the requirement for BPF from our import of isc-dhcp. Suggested by: fenestro Obtained from: BSD/OS Reviewed by: mini, sam Approved by: jake (mentor)
* add missing unlock when in_pcballoc returns an errorsam2003-08-191-1/+3
|
* Back out M_* changes, per decision of the TRB.imp2003-02-191-2/+2
| | | | Approved by: trb
* Take advantage of pre-existing lock-free synchronization and type stable memoryhsu2003-02-151-3/+4
| | | | to avoid acquiring SMP locks during expensive copyout process.
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.alfred2003-01-211-2/+2
| | | | Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* Back out some style changes. They are not urgent,luigi2002-11-201-26/+37
| | | | | | | I will put them back in after 5.0 is out. Requested by: sam Approved by: re
* Minor documentation changes and indentation fix.luigi2002-11-171-37/+26
| | | | | | | Replace m_copy() with m_copypacket() where applicable. While at it, fix some function headers and remove 'register' from variable declarations.
* Implement a new IP_SENDSRCADDR ancillary message type that permitsiedowse2002-10-211-4/+62
| | | | | | | | | | | | a server process bound to a wildcard UDP socket to select the IP address from which outgoing packets are sent on a per-datagram basis. When combined with IP_RECVDSTADDR, such a server process can guarantee to reply to an incoming request using the same source IP address as the destination IP address of the request, without having to open one socket per server IP address. Discussed on: -net Approved by: re
* Remove the "temporary connection" hack in udp_output(). In orderiedowse2002-10-211-23/+26
| | | | | | | | | | | | | | | | | to send datagrams from an unconnected socket, we used to first block input, then connect the socket to the sendmsg/sendto destination, send the datagram, and finally disconnect the socket and unblock input. We now use in_pcbconnect_setup() to check if a connect() would have succeeded, but we never record the connection in the PCB (local anonymous port allocation is still recorded, though). The result from in_pcbconnect_setup() authorises the sending of the datagram and selects the local address and port to use, so we just construct the header and call ip_output(). Discussed on: -net Approved by: re
* correct PCB locking in broadcast/multicast case that was exposed by changesam2002-10-161-1/+1
| | | | | | to use udp_append Reviewed by: hsu
* Tie new "Fast IPsec" code into the build. This involves the usualsam2002-10-161-86/+39
| | | | | | | | | | | | configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implmentation). As noted previously, don't use FAST_IPSEC with INET6 at the moment. Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks
* Replace aux mbufs with packet tags:sam2002-10-161-7/+1
| | | | | | | | | | | | | | | | | | | o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month
* Code formatting sync to trustedbsd_mac: don't perform an assignmentrwatson2002-08-151-2/+2
| | | | | | | | | | | in an if clause. PR: Submitted by: Reviewed by: Approved by: Obtained from: MFC after:
* Rename mac_check_socket_receive() to mac_check_socket_deliver() so thatrwatson2002-08-151-2/+2
| | | | | | | | we can use the names _receive() and _send() for the receive() and send() checks. Rename related constants, policy implementations, etc. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
OpenPOWER on IntegriCloud