path: root/sys/netinet
Commit message | Author | Age | Files | Lines
...
* Define INP_UNLOCK_ASSERT() to assert that an inpcb is unlocked.
  MFC after: 2 weeks
  Author: rwatson | Age: 2004-12-05 | Files: 1 | Lines: -0/+1
* Push the inpcb argument into ip_setmoptions() when setting IP multicast socket options, so that it is available for locking.
  Author: rwatson | Age: 2004-12-05 | Files: 1 | Lines: -10/+8
* Start working through inpcb locking for ip_ctloutput() by cleaning up modifications to the inpcb IP options mbuf:
  - Lock the inpcb before passing it into ip_pcbopts() in order to prevent simultaneous reads and read-modify-writes that could result in races.
  - Pass the inpcb reference into ip_pcbopts() instead of the option chain pointer in the inpcb.
  - Assert the inpcb lock in ip_pcbopts().
  - Convert one or two uses of a pointer as a boolean or an integer comparison to a comparison with NULL for readability.
  Author: rwatson | Age: 2004-12-05 | Files: 1 | Lines: -10/+13
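The lock-then-assert discipline described in this commit can be sketched in user space roughly as follows. This is only an illustration, not the kernel code: `toy_inpcb`, `toy_pcbopts()` and the boolean `locked` flag are hypothetical stand-ins for the inpcb, ip_pcbopts() and the inpcb mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inpcb: a lock flag plus an options pointer. */
struct toy_inpcb {
    bool locked;          /* stands in for the per-inpcb mutex */
    const char *options;  /* stands in for the IP options mbuf */
};

static void toy_inp_lock(struct toy_inpcb *inp)
{
    assert(!inp->locked);
    inp->locked = true;
}

static void toy_inp_unlock(struct toy_inpcb *inp)
{
    assert(inp->locked);
    inp->locked = false;
}

/*
 * The callee receives the whole pcb rather than a pointer to its option
 * chain, so it can assert the lock that protects that chain.
 */
static void toy_pcbopts(struct toy_inpcb *inp, const char *newopts)
{
    assert(inp->locked);     /* analogous to asserting the inpcb lock */
    inp->options = newopts;  /* modification happens under the lock */
}

/* The caller takes the lock before calling in and drops it afterwards. */
static void toy_set_options(struct toy_inpcb *inp, const char *newopts)
{
    toy_inp_lock(inp);
    toy_pcbopts(inp, newopts);
    toy_inp_unlock(inp);
}
```

Passing the pcb instead of the bare option pointer is what makes the assertion possible: the callee can only check a lock it can reach.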
* Fixes a bug in SACK causing us to send data beyond the receive window.
  Found by: Pawel Worach and Daniel Hartmeier
  Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
  Author: ps | Age: 2004-11-29 | Files: 1 | Lines: -2/+4
* Assert the inpcb lock in tcp_xmit_timer() as it performs read-modify-write of various time/rtt-related fields in the tcpcb.
  Author: rwatson | Age: 2004-11-28 | Files: 2 | Lines: -0/+4
* Expand coverage of the receive socket buffer lock when handling urgent pointer updates: test available space while holding the socket buffer mutex, and continue to hold until the pointer update has been performed.
  MFC after: 2 weeks
  Author: rwatson | Age: 2004-11-28 | Files: 2 | Lines: -4/+6
* Do export the advertised receive window via the tcpi_rcv_space field of struct tcp_info.
  Author: rwatson | Age: 2004-11-27 | Files: 2 | Lines: -1/+2
* Implement parts of the TCP_INFO socket option as found in Linux 2.6. This socket option allows processes to query a TCP socket for some low level transmission details, such as the current send, bandwidth, and congestion windows. Linux provides a 'struct tcp_info' structure containing various variables, rather than separate socket options; this makes the API somewhat fragile as it makes it difficult to add new entries of interest as requirements and implementation evolve. As such, I've included a large pad at the end of the structure. Right now, relatively few of the Linux API fields are filled in, and some contain no logical equivalent on FreeBSD. I've included __'d entries in the structure to make it easier to figure out what is and isn't omitted. This API/ABI should be considered unstable for the time being.
  Author: rwatson | Age: 2004-11-26 | Files: 2 | Lines: -2/+120
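From user space, the option described above is read with getsockopt(2) at the IPPROTO_TCP level. A minimal sketch, assuming a platform whose <netinet/tcp.h> declares TCP_INFO and struct tcp_info; the helper name `query_tcp_info()` is made up for illustration:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* lets glibc expose struct tcp_info in strict modes */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <assert.h>
#include <string.h>

/* Fill *ti with TCP_INFO for fd; returns 0 on success, -1 with errno set. */
static int query_tcp_info(int fd, struct tcp_info *ti)
{
    socklen_t len = sizeof(*ti);

    assert(ti != NULL);
    memset(ti, 0, sizeof(*ti));  /* fields the kernel does not fill stay zero */
    return getsockopt(fd, IPPROTO_TCP, TCP_INFO, ti, &len);
}
```

After a successful call, fields such as `tcpi_rcv_space` or `tcpi_snd_cwnd` can be read from the structure. Since the commit marks the ABI as unstable, callers should not depend on which fields are populated.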
* Fix a problem where our TCP stack would ignore RST packets if the receive window was 0 bytes in size. This may have been the cause of unsolved "connection not closing" reports over the years.
  Thanks to Michiel Boland for providing the fix and providing a concise test program for the problem.
  Submitted by: Michiel Boland
  MFC after: 2 weeks
  Author: silby | Age: 2004-11-25 | Files: 2 | Lines: -4/+6
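Why a zero window hides RSTs can be seen from the usual in-window test: with a window of 0 the accepted range is empty, so a separate exact-match case is needed. The sketch below follows the general RFC 793 acceptability rule and is not the committed diff; `rst_acceptable()`, `rcv_nxt` and `rcv_wnd` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe comparisons on 32-bit TCP sequence numbers. */
static bool seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static bool seq_geq(uint32_t a, uint32_t b) { return (int32_t)(a - b) >= 0; }

/*
 * Decide whether an RST carrying sequence number seq is acceptable, given
 * the next expected sequence number and the advertised receive window.
 */
static bool rst_acceptable(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    /* Usual case: the sequence number falls inside the receive window. */
    if (seq_geq(seq, rcv_nxt) && seq_lt(seq, rcv_nxt + rcv_wnd))
        return true;
    /* Zero window: the range above is empty, so accept an exact match. */
    return rcv_wnd == 0 && seq == rcv_nxt;
}
```

Without the second test, `rst_acceptable(1000, 1000, 0)` would be false and the reset would be dropped, which matches the symptom described in the commit.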
* In tcp_reass(), assert the inpcb lock on the passed tcpcb, since the contents of the tcpcb are read and modified in volume.
  In tcp_input(), replace th comparison with 0 with a comparison with NULL.
  At the 'findpcb', 'dropafterack', and 'dropwithreset' labels in tcp_input(), assert 'headlocked'. Try to improve consistency between various assertions regarding headlocked to be more informative.
  MFC after: 2 weeks
  Author: rwatson | Age: 2004-11-23 | Files: 2 | Lines: -24/+38
* tcp_timewait() performs multiple non-atomic reads on the tcptw structure, so assert the inpcb lock associated with the tcptw. Also assert the tcbinfo lock, as tcp_timewait() may call tcp_twclose() or tcp_timer_2msl_reset(), which require it. Since tcp_timewait() is already called with that lock from tcp_input(), this doesn't change current locking, merely documents reasons for it.
  In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_reset() is called, which requires that lock.
  In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop() is called, which requires that lock.
  Document the locking strategy for the time wait queues in tcp_timer.c, which consists of protecting the time wait queues in the same manner as the tcbinfo structure (using the tcbinfo lock).
  In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait queues are modified.
  In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait queues may be modified.
  In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait queues may be modified.
  MFC after: 2 weeks
  Author: rwatson | Age: 2004-11-23 | Files: 5 | Lines: -0/+21
* De-spl tcp_slowtimo; tcp_maxidle assignment is subject to possible but unlikely races that could be corrected by having tcp_keepcnt and tcp_keepintvl modifications go through handler functions via sysctl, but probably is not worth doing. Updates to multiple sysctls within evaluation of a single addition are unlikely.
  Annotate that tcp_canceltimers() is currently unused.
  De-spl tcp_timer_delack().
  De-spl tcp_timer_2msl().
  MFC after: 2 weeks
  Author: rwatson | Age: 2004-11-23 | Files: 1 | Lines: -15/+11
* Assert the inpcb lock in tcp_twstart(), which both does a read-modify-write on the tcpcb and calls into tcp_close() and tcp_twrespond().
  Annotate that tcp_twrecycleable() requires the inpcb lock because it does a series of non-atomic reads of the tcpcb, but is currently called without the inpcb lock by the caller. This is a bug.
  Assert the inpcb lock in tcp_twclose() as it performs a read-modify-write of the timewait structure/inpcb, and calls in_pcbdetach() which requires the lock.
  Assert the inpcb lock in tcp_twrespond(), as it performs multiple non-atomic reads of the tcptw and inpcb structures, as well as calling mac_create_mbuf_from_inpcb() and tcpip_fillheaders(), which require the inpcb lock.
  MFC after: 2 weeks
  Author: rwatson | Age: 2004-11-23 | Files: 2 | Lines: -0/+20
* Assert inpcb lock in tcp_quench(), tcp_drop_syn_sent(), tcp_mtudisc(), and tcp_drop(), due to read-modify-write of TCP state variables.
  MFC after: 2 weeks
  Author: rwatson | Age: 2004-11-23 | Files: 2 | Lines: -0/+8
* Assert the tcbinfo write lock in tcp_new_isn(), as the tcbinfo lock protects access to the ISN state variables.
  Acquire the tcbinfo write lock in tcp_isn_tick() to synchronize timer-driven isn bumping.
  Staticize internal ISN variables since they're not used outside of tcp_subr.c.
  MFC after: 2 weeks
  Author: rwatson | Age: 2004-11-23 | Files: 2 | Lines: -8/+22
* Remove "Unlocked read" annotations associated with previously unlocked use of socket buffer fields in the TCP input code. These references are now protected by use of the receive socket buffer lock.
  MFC after: 1 week
  Author: rwatson | Age: 2004-11-22 | Files: 2 | Lines: -6/+0
* s/send/sent/ in comment describing TCPS_SYN_RECEIVED.
  Author: rwatson | Age: 2004-11-21 | Files: 1 | Lines: -1/+1
* - Since divert protocol is not connection oriented, remove SS_ISCONNECTED flag from divert sockets.
  - Remove div_disconnect() method, since it shouldn't be called now.
  - Remove div_abort() method. It was never called directly, since protocol doesn't have listen queue. It was called only from div_disconnect(), which is removed now.
  Reviewed by: rwatson, maxim
  Approved by: julian (mentor)
  MT5 after: 1 week
  MT4 after: 1 month
  Author: glebius | Age: 2004-11-18 | Files: 1 | Lines: -33/+0
* Fix host route addition for more than one address to a loopback interface after allowing more than one address with the same prefix.
  Reported by: Vladimir Grebenschikov <vova NO fbsd SPAM ru>
  Submitted by: ru (also NetBSD rev. 1.83)
  Pointyhat to: mlaier
  Author: mlaier | Age: 2004-11-17 | Files: 1 | Lines: -1/+1
* Merge copyright notices.
  Requested by: njl
  Author: mlaier | Age: 2004-11-13 | Files: 1 | Lines: -28/+1
* Fix ng_ksocket(4) operation as a divert socket, which is pretty useful and has been broken twice:
  - in the beginning of div_output() replace KASSERT with assignment, as it was in rev. 1.83. [1] [to be MFCed]
  - refactor changes introduced in rev. 1.100: do not prepend a new tag unconditionally. Before doing this check whether we have one. [2]
  A small note for all hacking in this area: when divert socket is not a real userland, but ng_ksocket(4), we receive _the same_ mbufs, that we transmitted to socket. These mbufs have rcvif, the tags we've put on them. And we should treat them correctly.
  Discussed with: mlaier [1]
  Silence from: green [2]
  Reviewed by: maxim
  Approved by: julian (mentor)
  MFC after: 1 week
  Author: glebius | Age: 2004-11-12 | Files: 1 | Lines: -11/+12
* Change the way we automatically add prefix routes when adding a new address. This makes it possible to have more than one address with the same prefix. The first address added is used for the route. On deletion of an address with IFA_ROUTE set, we try to find a "fallback" address and hand over the route if possible.
  I plan to MFC this in 4 weeks, hence I keep the - now obsolete - argument to in_ifscrub as it must be considered KAPI as it is not static in in.c. I will clean this after the MFC.
  Discussed on: arch, net
  Tested by: many testers of the CARP patches
  Nits from: ru, Andrea Campi <andrea+freebsd_arch webcom it>
  Obtained from: WIDE via OpenBSD
  MFC after: 1 month
  Author: mlaier | Age: 2004-11-12 | Files: 1 | Lines: -27/+147
* Add missing '='
  Spotted by: obrien
  Author: phk | Age: 2004-11-11 | Files: 1 | Lines: -1/+1
* Fix a double-free in the 'hlen > m->m_len' sanity check.
  Bug report by: <james@towardex.com>
  MFC after: 2 weeks
  Author: andre | Age: 2004-11-09 | Files: 1 | Lines: -1/+1
* Support TCP-MD5 (IPv4) in KAME-IPSEC, too.
  MFC after: 3 weeks
  Author: suz | Age: 2004-11-08 | Files: 2 | Lines: -0/+2
* Initialize struct pr_userreqs in new/sparse style and fill in common default elements in net_init_domain().
  This makes it possible to grep these structures and see any bogosities.
  Author: phk | Age: 2004-11-08 | Files: 4 | Lines: -26/+67
* Do some re-sorting of TCP pcbinfo locking and assertions: make sure to retain the pcbinfo lock until we're done using a pcb in the in-bound path, as the pcbinfo lock acts as a pseudo-reference to prevent the pcb from potentially being recycled. Clean up assertions and make sure to assert that the pcbinfo is locked at the head of code subsections where it is needed. Free the mbuf at the end of tcp_input after releasing any held locks to reduce the time the locks are held.
  MFC after: 3 weeks
  Author: rwatson | Age: 2004-11-07 | Files: 2 | Lines: -12/+10
* Fix a double-free in the 'm->m_len < sizeof (struct ip)' sanity check.
  Bug report by: <james@towardex.com>
  MFC after: 2 weeks
  Author: andre | Age: 2004-11-06 | Files: 1 | Lines: -2/+2
* Hide udp_in6 behind #ifdef INET6
  Author: phk | Age: 2004-11-04 | Files: 1 | Lines: -0/+2
* When performing IP fast forwarding, immediately drop traffic which is destined for a blackhole route.
  This also means that blackhole routes do not need to be bound to lo(4) or disc(4) interfaces for the net.inet.ip.fastforwarding=1 case.
  Submitted by: james at towardex dot com
  Sponsored by: eXtensible Open Router Project <URL:http://www.xorp.org/>
  MFC after: 3 weeks
  Author: bms | Age: 2004-11-04 | Files: 1 | Lines: -0/+6
* Until this change, the UDP input code used global variables udp_in, udp_in6, and udp_ip6 to pass socket address state between udp_input(), udp_append(), and soappendaddr_locked(). While fine in the default configuration, when running with multiple netisrs or direct ithread dispatch, this can result in races wherein user processes using recvmsg() get back the wrong source IP/port. To correct this and related races:
  - Eliminate udp_ip6, which is believed to be generated but then never used. Eliminate ip_2_ip6_hdr() as it is now unneeded.
  - Eliminate setting, testing, and existence of 'init' status fields for the IPv6 structures. While with multiple UDP delivery this could lead to amortization of IPv4 -> IPv6 conversion when delivering an IPv4 UDP packet to an IPv6 socket, it added substantial complexity and side effects.
  - Move global structures into the stack, declaring udp_in in udp_input(), and udp_in6 in udp_append() to be used if a conversion is required. Pass &udp_in into udp_append().
  - Re-annotate comments to reflect updates.
  With this change, UDP appears to operate correctly in the presence of substantial inbound processing parallelism. This solution avoids introducing additional synchronization, but does increase the potential stack depth.
  Discovered by: kris (Bug Magnet)
  MFC after: 3 weeks
  Author: rwatson | Age: 2004-11-04 | Files: 1 | Lines: -57/+24
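The "move globals onto the stack" step can be pictured with a small user-space sketch. It is not the UDP input path itself: `toy_input()`, `toy_append()` and `toy_socket` are hypothetical names, and the point is only that each call owns its own sockaddr instead of sharing one global.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receiver: records the source address it was handed. */
struct toy_socket {
    struct sockaddr_in last_from;
};

static void toy_append(struct toy_socket *so, const struct sockaddr_in *from)
{
    assert(so != NULL && from != NULL);
    so->last_from = *from;  /* copy out of the caller's stack frame */
}

/*
 * Each invocation builds the source address in its own stack frame, so two
 * concurrent calls cannot overwrite each other's address the way they could
 * with a single shared global.
 */
static void toy_input(struct toy_socket *so, uint32_t src_addr, uint16_t src_port)
{
    struct sockaddr_in from;  /* per-call state, not a global */

    memset(&from, 0, sizeof(from));
    from.sin_family = AF_INET;
    from.sin_addr.s_addr = src_addr;  /* assumed already in network byte order */
    from.sin_port = src_port;
    toy_append(so, &from);
}
```

The cost noted in the commit is visible here as well: the address structure now lives in every caller's frame, which increases stack use but needs no extra locking.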
* Remove RFC1644 T/TCP support from the TCP side of the network stack.
  A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706
  Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality.
  Discussed on: -arch
  Author: andre | Age: 2004-11-02 | Files: 13 | Lines: -841/+34
* Correct a bug in TCP SACK that could result in wedging of the TCP stack under high load: only set function state to loop and continue sending if there is no data left to send.
  RELENG_5_3 candidate.
  Feet provided: Peter Losher <Peter underscore Losher at isc dot org>
  Diagnosed by: Daniel Hartmeier <daniel at benzedrine dot cx>
  Submitted by: mohan <mohans at yahoo-inc dot com>
  Author: rwatson | Age: 2004-10-30 | Files: 1 | Lines: -2/+2
* Add a matching tunable for net.inet.tcp.sack.enable sysctl.
  Author: rwatson | Age: 2004-10-26 | Files: 1 | Lines: -0/+1
* Check that rt_mask(rt) is non-NULL before dereferencing it, in the RTM_ADD case, thus avoiding a panic.
  Submitted by: Iasen Kostov
  Author: bms | Age: 2004-10-26 | Files: 1 | Lines: -0/+1
* IPDIVERT is a module now; tell the other parts of the kernel about it.
  IPDIVERT depends on IPFIREWALL being loaded or compiled into the kernel.
  Author: andre | Age: 2004-10-25 | Files: 1 | Lines: -0/+4
* For variables that are only checked with defined(), don't provide any fake value.
  Author: ru | Age: 2004-10-24 | Files: 1 | Lines: -1/+1
* Shave 40 unused bytes from struct tcpcb.
  Author: andre | Age: 2004-10-22 | Files: 1 | Lines: -1/+0
* When printing the initialization string and IPDIVERT is not compiled into the kernel, refer to it as "loadable" instead of "disabled".
  Author: andre | Age: 2004-10-22 | Files: 1 | Lines: -1/+1
* Refuse to unload the ipdivert module unless the 'force' flag is given to kldunload.
  Reflect the fact that IPDIVERT is a loadable module in the divert(4) and ipfw(8) man pages.
  Author: andre | Age: 2004-10-22 | Files: 1 | Lines: -1/+11
* Destroy the UMA zone on unload.
  Author: andre | Age: 2004-10-19 | Files: 1 | Lines: -0/+1
* Slightly extend the locking during unload to fully cover the protocol deregistration. This does not entirely close the race but narrows the even previously extremely small chance of a race some more.
  Author: andre | Age: 2004-10-19 | Files: 1 | Lines: -5/+6
* Annotate a newly introduced race present due to the unloading of protocols: it is possible for sockets to be created and attached to the divert protocol between the test for sockets present and successful unload of the registration handler. We will need to explore more mature APIs for unregistering the protocol and then draining consumers, or an atomic test-and-unregister mechanism.
  Author: rwatson | Age: 2004-10-19 | Files: 1 | Lines: -0/+4
* Convert IPDIVERT into a loadable module. This makes use of the dynamic loadability of protocols. The call to divert_packet() is done through a function pointer. All semantics of IPDIVERT remain intact.
  If IPDIVERT is not loaded ipfw will refuse to install divert rules and natd will complain about 'protocol not supported'. Once it is loaded both will work and accept rules and open the divert socket.
  The module can only be unloaded if no divert sockets are open. It does not close any divert sockets when an unload is requested but will return EBUSY instead.
  Author: andre | Age: 2004-10-19 | Files: 5 | Lines: -37/+92
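The two mechanisms named in this commit, a call made through a function pointer and an unload that returns EBUSY while sockets are open, can be sketched in user space as below. All `toy_*` names are hypothetical, and the error returned when the hook is unset is an assumption chosen to echo the 'protocol not supported' wording.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hook the core calls through; NULL while the module is not loaded. */
static int (*toy_divert_hook)(int pkt) = NULL;

/* Hypothetical count of divert sockets currently open. */
static int toy_open_sockets = 0;

/* Module side: the real work, reduced to a placeholder computation. */
static int toy_divert_packet(int pkt)
{
    return pkt + 1;
}

static int toy_module_load(void)
{
    toy_divert_hook = toy_divert_packet;
    return 0;
}

static int toy_module_unload(void)
{
    if (toy_open_sockets > 0)
        return EBUSY;  /* refuse rather than close sockets behind users */
    toy_divert_hook = NULL;
    return 0;
}

/* Core side: fails cleanly when the module is absent. */
static int toy_core_divert(int pkt, int *out)
{
    assert(out != NULL);
    if (toy_divert_hook == NULL)
        return EPROTONOSUPPORT;
    *out = toy_divert_hook(pkt);
    return 0;
}
```

A single-threaded sketch like this does not show the window between the socket-count test and clearing the hook; that window is the race annotated in the commit listed just above this one.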
* Properly declare the "net.inet" sysctl subtree.
  Author: andre | Age: 2004-10-19 | Files: 1 | Lines: -0/+1
* Pre-emptively define IPPROTO_SPACER to 32767, the same value as PROTO_SPACER to document that this value is globally assigned for a special purpose and may not be reused within the IPPROTO number space.
  Author: andre | Age: 2004-10-19 | Files: 1 | Lines: -0/+6
* Make use of the PROTO_SPACER functionality for dynamically loadable protocols in inetsw[] and define initially eight spacer slots.
  Remove conflicting declaration 'struct pr_usrreqs nousrreqs'. It is now declared and initialized in kern/uipc_domain.c.
  Author: andre | Age: 2004-10-19 | Files: 1 | Lines: -2/+19
* Support for dynamically loadable and unloadable IP protocols in the ipmux.
  With pr_proto_register() it has become possible to dynamically load protocols within the PF_INET domain. However the PF_INET domain has a second important structure called ip_protox[] that is derived from the 'struct protosw inetsw[]' and takes care of the de-multiplexing of the various protocols that ride on top of IP packets.
  The functions ipproto_[un]register() allow dynamically adjusting the ip_protox[] array mux in a consistent and easy way. To register a protocol within ip_protox[] the existence of a corresponding and matching protocol definition in inetsw[] is required. The function does not allow overwriting an already registered protocol. The unregister function simply replaces the mux slot with the default index pointer to IPPROTO_RAW as it was previously.
  Author: andre | Age: 2004-10-19 | Files: 2 | Lines: -1/+64
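The register/unregister rules stated above (a matching switch entry must exist, a live slot is never overwritten, unregister restores the raw default) can be sketched with a small table. This is an illustration under assumptions, not the committed code: the `toy_*` names, the table contents and the specific errno values are all made up for the example.

```c
#include <assert.h>
#include <errno.h>

#define TOY_NPROTO  256  /* one mux slot per 8-bit IP protocol number */
#define TOY_RAW_IDX 0    /* hypothetical index of the raw-IP fallback entry */

/* Stand-in for inetsw[]: the protocol number each switch entry handles. */
static const int toy_sw[] = { 255 /* raw */, 6 /* tcp */, 17 /* udp */, 254 /* loadable */ };
#define TOY_NSW ((int)(sizeof(toy_sw) / sizeof(toy_sw[0])))

/* Stand-in for ip_protox[]: protocol number -> index into toy_sw[]. */
static unsigned char toy_protox[TOY_NPROTO];  /* zero-initialised: all raw */

static int toy_ipproto_register(int proto)
{
    int i;

    if (proto <= 0 || proto >= TOY_NPROTO)
        return EPROTONOSUPPORT;
    if (toy_protox[proto] != TOY_RAW_IDX)
        return EEXIST;  /* never overwrite a registered protocol */
    for (i = 0; i < TOY_NSW; i++) {
        if (i != TOY_RAW_IDX && toy_sw[i] == proto) {
            toy_protox[proto] = (unsigned char)i;
            return 0;
        }
    }
    return EPROTONOSUPPORT;  /* no matching switch entry */
}

static int toy_ipproto_unregister(int proto)
{
    if (proto <= 0 || proto >= TOY_NPROTO)
        return EPROTONOSUPPORT;
    if (toy_protox[proto] == TOY_RAW_IDX)
        return ENOENT;  /* nothing was registered in this slot */
    toy_protox[proto] = TOY_RAW_IDX;  /* restore the raw default */
    return 0;
}
```

Keeping the mux as an array of indexes rather than pointers means an unregistered protocol still resolves to a valid entry, so input for an unknown protocol falls through to the raw handler.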
* Add a macro for the destruction of INP_INFO_LOCK's used by loadable modules.
  Author: andre | Age: 2004-10-19 | Files: 1 | Lines: -0/+1
* Make comments more clear. Change the order of one if() statement to check the more likely variable first.
  Author: andre | Age: 2004-10-19 | Files: 1 | Lines: -3/+8