path: root/sys/netinet
Commit message (author, date, files changed, lines removed/added)
* Fixes a bug in SACK causing us to send data beyond the receive window. (ps, 2004-11-29, 1 file, -2/+4)
  Found by: Pawel Worach and Daniel Hartmeier
  Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
* Assert the inpcb lock in tcp_xmit_timer() as it performs read-modify-write of various time/rtt-related fields in the tcpcb. (rwatson, 2004-11-28, 2 files, -0/+4)
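The lock-assertion pattern that this and many of the following commits apply can be sketched in userland as below. This is only an illustration: `pcb_sketch`, `PCB_LOCK_ASSERT`, and `xmit_timer_sketch` are hypothetical names, the "lock" is a plain flag rather than a real mutex, and the 1/8 smoothing gain is the textbook value rather than anything taken from the commit.

```c
#include <assert.h>

/* Hypothetical stand-in for a control block protected by a lock. */
struct pcb_sketch {
    int locked;   /* 1 while the (mock) lock is held */
    int srtt;     /* smoothed round-trip time, arbitrary units */
};

/* Mock assertion: abort if the caller does not hold the lock. */
#define PCB_LOCK_ASSERT(p) assert((p)->locked)

/*
 * Read-modify-write of an RTT field. The assertion documents, and checks
 * at run time, that callers must already hold the lock.
 */
static void
xmit_timer_sketch(struct pcb_sketch *p, int rtt)
{
    PCB_LOCK_ASSERT(p);
    p->srtt += (rtt - p->srtt) / 8;   /* assumed gain of 1/8 */
}
```

Calling `xmit_timer_sketch()` with `locked == 0` trips the assertion, which is the point: the locking requirement becomes an executable comment.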
* Expand coverage of the receive socket buffer lock when handling urgent pointer updates: test available space while holding the socket buffer mutex, and continue to hold until the pointer update has been performed. (rwatson, 2004-11-28, 2 files, -4/+6)
  MFC after: 2 weeks
* Do export the advertised receive window via the tcpi_rcv_space field of struct tcp_info. (rwatson, 2004-11-27, 2 files, -1/+2)
* Implement parts of the TCP_INFO socket option as found in Linux 2.6. (rwatson, 2004-11-26, 2 files, -2/+120)
  This socket option allows processes to query a TCP socket for some low-level transmission details, such as the current send, bandwidth, and congestion windows. Linux provides a 'struct tcpinfo' structure containing various variables, rather than separate socket options; this makes the API somewhat fragile as it makes it difficult to add new entries of interest as requirements and implementation evolve. As such, I've included a large pad at the end of the structure. Right now, relatively few of the Linux API fields are filled in, and some contain no logical equivalent on FreeBSD. I've included __'d entries in the structure to make it easier to figure out what is and isn't omitted. This API/ABI should be considered unstable for the time being.
* Fix a problem where our TCP stack would ignore RST packets if the receive window was 0 bytes in size. This may have been the cause of unsolved "connection not closing" reports over the years. (silby, 2004-11-25, 2 files, -4/+6)
  Thanks to Michiel Boland for providing the fix and providing a concise test program for the problem.
  Submitted by: Michiel Boland
  MFC after: 2 weeks
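The zero-window case can be illustrated with the RFC 793 acceptability rule for zero-length segments. The sketch below is not the committed change; `rst_seq_acceptable` is a hypothetical helper that shows why a naive "sequence number inside the window" test rejects every RST once the window is 0, and how the special case restores it.

```c
#include <stdint.h>

/* Wraparound-safe sequence number comparisons. */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_GEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) >= 0)

/*
 * Hypothetical helper: return 1 if a RST carrying sequence number seq is
 * acceptable given the next expected sequence number and receive window.
 */
static int
rst_seq_acceptable(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    /* With a zero window the range [rcv_nxt, rcv_nxt + 0) is empty, */
    /* so an exact match on rcv_nxt must be accepted explicitly.     */
    if (rcv_wnd == 0)
        return (seq == rcv_nxt);
    return (SEQ_GEQ(seq, rcv_nxt) && SEQ_LT(seq, rcv_nxt + rcv_wnd));
}
```

Without the `rcv_wnd == 0` branch, the range check alone can never succeed for a closed window, which matches the "connection not closing" symptom described above.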
* In tcp_reass(), assert the inpcb lock on the passed tcpcb, since the contents of the tcpcb are read and modified in volume. (rwatson, 2004-11-23, 2 files, -24/+38)
  In tcp_input(), replace th comparison with 0 with a comparison with NULL.
  At the 'findpcb', 'dropafterack', and 'dropwithreset' labels in tcp_input(), assert 'headlocked'. Try to improve consistency between various assertions regarding headlocked to be more informative.
  MFC after: 2 weeks
* tcp_timewait() performs multiple non-atomic reads on the tcptw structure, so assert the inpcb lock associated with the tcptw. Also assert the tcbinfo lock, as tcp_timewait() may call tcp_twclose() or tcp_2msl_rest(), which require it. Since tcp_timewait() is already called with that lock from tcp_input(), this doesn't change current locking, merely documents reasons for it. (rwatson, 2004-11-23, 5 files, -0/+21)
  In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_reset() is called, which requires that lock.
  In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop() is called, which requires that lock.
  Document the locking strategy for the time wait queues in tcp_timer.c, which consists of protecting the time wait queues in the same manner as the tcbinfo structure (using the tcbinfo lock).
  In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait queues are modified.
  In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait queues may be modified.
  In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait queues may be modified.
  MFC after: 2 weeks
* De-spl tcp_slowtimo; tcp_maxidle assignment is subject to possible but unlikely races that could be corrected by having tcp_keepcnt and tcp_keepintvl modifications go through handler functions via sysctl, but probably is not worth doing. Updates to multiple sysctls within evaluation of a single addition are unlikely. (rwatson, 2004-11-23, 1 file, -15/+11)
  Annotate that tcp_canceltimers() is currently unused.
  De-spl tcp_timer_delack().
  De-spl tcp_timer_2msl().
  MFC after: 2 weeks
* Assert the inpcb lock in tcp_twstart(), which both does read-modify-write on the tcpcb and calls into tcp_close() and tcp_twrespond(). (rwatson, 2004-11-23, 2 files, -0/+20)
  Annotate that tcp_twrecycleable() requires the inpcb lock because it does a series of non-atomic reads of the tcpcb, but is currently called without the inpcb lock by the caller. This is a bug.
  Assert the inpcb lock in tcp_twclose() as it performs a read-modify-write of the timewait structure/inpcb, and calls in_pcbdetach() which requires the lock.
  Assert the inpcb lock in tcp_twrespond(), as it performs multiple non-atomic reads of the tcptw and inpcb structures, as well as calling mac_create_mbuf_from_inpcb() and tcpip_fillheaders(), which require the inpcb lock.
  MFC after: 2 weeks
* Assert inpcb lock in tcp_quench(), tcp_drop_syn_sent(), tcp_mtudisc(), and tcp_drop(), due to read-modify-write of TCP state variables. (rwatson, 2004-11-23, 2 files, -0/+8)
  MFC after: 2 weeks
* Assert the tcbinfo write lock in tcp_new_isn(), as the tcbinfo lock protects access to the ISN state variables. (rwatson, 2004-11-23, 2 files, -8/+22)
  Acquire the tcbinfo write lock in tcp_isn_tick() to synchronize timer-driven isn bumping.
  Staticize internal ISN variables since they're not used outside of tcp_subr.c.
  MFC after: 2 weeks
* Remove "Unlocked read" annotations associated with previously unlocked use of socket buffer fields in the TCP input code. These references are now protected by use of the receive socket buffer lock. (rwatson, 2004-11-22, 2 files, -6/+0)
  MFC after: 1 week
* s/send/sent/ in comment describing TCPS_SYN_RECEIVED. (rwatson, 2004-11-21, 1 file, -1/+1)
* - Since divert protocol is not connection oriented, remove SS_ISCONNECTED flag from divert sockets. (glebius, 2004-11-18, 1 file, -33/+0)
  - Remove div_disconnect() method, since it shouldn't be called now.
  - Remove div_abort() method. It was never called directly, since protocol doesn't have listen queue. It was called only from div_disconnect(), which is removed now.
  Reviewed by: rwatson, maxim
  Approved by: julian (mentor)
  MT5 after: 1 week
  MT4 after: 1 month
* Fix host route addition for more than one address to a loopback interface after allowing more than one address with the same prefix. (mlaier, 2004-11-17, 1 file, -1/+1)
  Reported by: Vladimir Grebenschikov <vova NO fbsd SPAM ru>
  Submitted by: ru (also NetBSD rev. 1.83)
  Pointyhat to: mlaier
* Merge copyright notices. (mlaier, 2004-11-13, 1 file, -28/+1)
  Requested by: njl
* Fix ng_ksocket(4) operation as a divert socket, which is pretty useful and has been broken twice: (glebius, 2004-11-12, 1 file, -11/+12)
  - in the beginning of div_output() replace KASSERT with assignment, as it was in rev. 1.83. [1] [to be MFCed]
  - refactor changes introduced in rev. 1.100: do not prepend a new tag unconditionally. Before doing this check whether we have one. [2]
  A small note for all hacking in this area: when divert socket is not a real userland, but ng_ksocket(4), we receive _the same_ mbufs, that we transmitted to socket. These mbufs have rcvif, the tags we've put on them. And we should treat them correctly.
  Discussed with: mlaier [1]
  Silence from: green [2]
  Reviewed by: maxim
  Approved by: julian (mentor)
  MFC after: 1 week
* Change the way we automatically add prefix routes when adding a new address. (mlaier, 2004-11-12, 1 file, -27/+147)
  This makes it possible to have more than one address with the same prefix. The first address added is used for the route. On deletion of an address with IFA_ROUTE set, we try to find a "fallback" address and hand over the route if possible.
  I plan to MFC this in 4 weeks, hence I keep the - now obsolete - argument to in_ifscrub as it must be considered KAPI as it is not static in in.c. I will clean this after the MFC.
  Discussed on: arch, net
  Tested by: many testers of the CARP patches
  Nits from: ru, Andrea Campi <andrea+freebsd_arch webcom it>
  Obtained from: WIDE via OpenBSD
  MFC after: 1 month
* Add missing '=' (phk, 2004-11-11, 1 file, -1/+1)
  Spotted by: obrien
* Fix a double-free in the 'hlen > m->m_len' sanity check. (andre, 2004-11-09, 1 file, -1/+1)
  Bug report by: <james@towardex.com>
  MFC after: 2 weeks
* Support TCP-MD5 (IPv4) in KAME-IPSEC, too. (suz, 2004-11-08, 2 files, -0/+2)
  MFC after: 3 weeks
* Initialize struct pr_userreqs in new/sparse style and fill in common default elements in net_init_domain(). (phk, 2004-11-08, 4 files, -26/+67)
  This makes it possible to grep these structures and see any bogosities.
* Do some re-sorting of TCP pcbinfo locking and assertions: make sure to retain the pcbinfo lock until we're done using a pcb in the in-bound path, as the pcbinfo lock acts as a pseudo-reference to prevent the pcb from potentially being recycled. Clean up assertions and make sure to assert that the pcbinfo is locked at the head of code subsections where it is needed. Free the mbuf at the end of tcp_input after releasing any held locks to reduce the time the locks are held. (rwatson, 2004-11-07, 2 files, -12/+10)
  MFC after: 3 weeks
* Fix a double-free in the 'm->m_len < sizeof (struct ip)' sanity check. (andre, 2004-11-06, 1 file, -2/+2)
  Bug report by: <james@towardex.com>
  MFC after: 2 weeks
* Hide udp_in6 behind #ifdef INET6 (phk, 2004-11-04, 1 file, -0/+2)
* When performing IP fast forwarding, immediately drop traffic which is destined for a blackhole route. (bms, 2004-11-04, 1 file, -0/+6)
  This also means that blackhole routes do not need to be bound to lo(4) or disc(4) interfaces for the net.inet.ip.fastforwarding=1 case.
  Submitted by: james at towardex dot com
  Sponsored by: eXtensible Open Router Project <URL:http://www.xorp.org/>
  MFC after: 3 weeks
* Until this change, the UDP input code used global variables udp_in, udp_in6, and udp_ip6 to pass socket address state between udp_input(), udp_append(), and soappendaddr_locked(). While fine in the default configuration, when running with multiple netisrs or direct ithread dispatch, this can result in races wherein user processes using recvmsg() get back the wrong source IP/port. To correct this and related races: (rwatson, 2004-11-04, 1 file, -57/+24)
  - Eliminate udp_ip6, which is believed to be generated but then never used. Eliminate ip_2_ip6_hdr() as it is now unneeded.
  - Eliminate setting, testing, and existence of 'init' status fields for the IPv6 structures. While with multiple UDP delivery this could lead to amortization of IPv4 -> IPv6 conversion when delivering an IPv4 UDP packet to an IPv6 socket, it added substantial complexity and side effects.
  - Move global structures into the stack, declaring udp_in in udp_input(), and udp_in6 in udp_append() to be used if a conversion is required. Pass &udp_in into udp_append().
  - Re-annotate comments to reflect updates.
  With this change, UDP appears to operate correctly in the presence of substantial inbound processing parallelism. This solution avoids introducing additional synchronization, but does increase the potential stack depth.
  Discovered by: kris (Bug Magnet)
  MFC after: 3 weeks
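The shape of the fix (per-call state on the stack, passed by pointer, instead of a shared global) can be sketched as follows. The function names `udp_input_sketch` and `udp_append_sketch` are hypothetical stand-ins and the bodies are reduced to the address handling only. Nothing here is the kernel code.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Stand-in for the delivery step: it only sees the caller-owned address. */
static uint16_t
udp_append_sketch(const struct sockaddr_in *src)
{
    return (src->sin_port);
}

/*
 * Stand-in for the input step. udp_in lives on this call's stack, so two
 * threads running this function at once cannot overwrite each other's
 * source address the way they could with a single global.
 */
static uint16_t
udp_input_sketch(uint32_t src_addr, uint16_t src_port)
{
    struct sockaddr_in udp_in;

    memset(&udp_in, 0, sizeof(udp_in));
    udp_in.sin_family = AF_INET;
    udp_in.sin_addr.s_addr = src_addr;
    udp_in.sin_port = src_port;
    return (udp_append_sketch(&udp_in));
}
```

The trade-off is the one the commit names: no extra synchronization is needed, at the cost of a somewhat larger stack frame per call.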
* Remove RFC1644 T/TCP support from the TCP side of the network stack. (andre, 2004-11-02, 13 files, -841/+34)
  A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706
  Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality.
  Discussed on: -arch
* Correct a bug in TCP SACK that could result in wedging of the TCP stack under high load: only set function state to loop and continue sending if there is no data left to send. (rwatson, 2004-10-30, 1 file, -2/+2)
  RELENG_5_3 candidate.
  Feet provided: Peter Losher <Peter underscore Losher at isc dot org>
  Diagnosed by: Daniel Hartmeier <daniel at benzedrine dot cx>
  Submitted by: mohan <mohans at yahoo-inc dot com>
* Add a matching tunable for net.inet.tcp.sack.enable sysctl. (rwatson, 2004-10-26, 1 file, -0/+1)
* Check that rt_mask(rt) is non-NULL before dereferencing it, in the RTM_ADD case, thus avoiding a panic. (bms, 2004-10-26, 1 file, -0/+1)
  Submitted by: Iasen Kostov
* IPDIVERT is a module now; tell the other parts of the kernel about it. (andre, 2004-10-25, 1 file, -0/+4)
  IPDIVERT depends on IPFIREWALL being loaded or compiled into the kernel.
* For variables that are only checked with defined(), don't provide any fake value. (ru, 2004-10-24, 1 file, -1/+1)
* Shave 40 unused bytes from struct tcpcb. (andre, 2004-10-22, 1 file, -1/+0)
* When printing the initialization string and IPDIVERT is not compiled into the kernel, refer to it as "loadable" instead of "disabled". (andre, 2004-10-22, 1 file, -1/+1)
* Refuse to unload the ipdivert module unless the 'force' flag is given to kldunload. (andre, 2004-10-22, 1 file, -1/+11)
  Reflect the fact that IPDIVERT is a loadable module in the divert(4) and ipfw(8) man pages.
* Destroy the UMA zone on unload. (andre, 2004-10-19, 1 file, -0/+1)
* Slightly extend the locking during unload to fully cover the protocol deregistration. (andre, 2004-10-19, 1 file, -5/+6)
  This does not entirely close the race, but it further narrows what was already an extremely small chance of one.
* Annotate a newly introduced race present due to the unloading of protocols: it is possible for sockets to be created and attached to the divert protocol between the test for sockets present and successful unload of the registration handler. We will need to explore more mature APIs for unregistering the protocol and then draining consumers, or an atomic test-and-unregister mechanism. (rwatson, 2004-10-19, 1 file, -0/+4)
* Convert IPDIVERT into a loadable module. This makes use of the dynamic loadability of protocols. The call to divert_packet() is done through a function pointer. All semantics of IPDIVERT remain intact. (andre, 2004-10-19, 5 files, -37/+92)
  If IPDIVERT is not loaded ipfw will refuse to install divert rules and natd will complain about 'protocol not supported'. Once it is loaded both will work and accept rules and open the divert socket.
  The module can only be unloaded if no divert sockets are open. It does not close any divert sockets when an unload is requested but will return EBUSY instead.
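The function-pointer arrangement and the EBUSY rule described above might look roughly like this in miniature. Every name below (`divert_hook`, `divert_open_sockets`, and the load/unload helpers) is hypothetical, and the choice of EPROTONOSUPPORT for the unloaded case is an assumption based on the 'protocol not supported' wording.

```c
#include <errno.h>
#include <stddef.h>

/* Hook is NULL while the module is not loaded. */
static int (*divert_hook)(int pkt) = NULL;
static int divert_open_sockets = 0;   /* sockets currently attached */
static int divert_delivered = 0;      /* packets handed to the module */

/* The module's real work, reachable only through the hook. */
static int
divert_impl(int pkt)
{
    (void)pkt;
    divert_delivered++;
    return (0);
}

static void
divert_load(void)
{
    divert_hook = divert_impl;
}

/* Refuse to unload while sockets are open; never close them implicitly. */
static int
divert_unload(void)
{
    if (divert_open_sockets > 0)
        return (EBUSY);
    divert_hook = NULL;
    return (0);
}

/* Caller side: the rest of the stack only ever tests and calls the hook. */
static int
divert_dispatch(int pkt)
{
    if (divert_hook == NULL)
        return (EPROTONOSUPPORT);
    return (divert_hook(pkt));
}
```

This single-threaded sketch ignores the race noted in the adjacent commits, where a socket can be attached between the open-socket test and the actual deregistration.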
* Properly declare the "net.inet" sysctl subtree. (andre, 2004-10-19, 1 file, -0/+1)
* Pre-emptively define IPPROTO_SPACER to 32767, the same value as PROTO_SPACER, to document that this value is globally assigned for a special purpose and may not be reused within the IPPROTO number space. (andre, 2004-10-19, 1 file, -0/+6)
* Make use of the PROTO_SPACER functionality for dynamically loadable protocols in inetsw[] and define initially eight spacer slots. (andre, 2004-10-19, 1 file, -2/+19)
  Remove conflicting declaration 'struct pr_usrreqs nousrreqs'. It is now declared and initialized in kern/uipc_domain.c.
* Support for dynamically loadable and unloadable IP protocols in the ipmux. (andre, 2004-10-19, 2 files, -1/+64)
  With pr_proto_register() it has become possible to dynamically load protocols within the PF_INET domain. However the PF_INET domain has a second important structure called ip_protox[] that is derived from the 'struct protosw inetsw[]' and takes care of the de-multiplexing of the various protocols that ride on top of IP packets.
  The functions ipproto_[un]register() allow the ip_protox[] array mux to be adjusted dynamically in a consistent and easy way. To register a protocol within ip_protox[] the existence of a corresponding and matching protocol definition in inetsw[] is required. The function does not allow overwriting an already registered protocol. The unregister function simply replaces the mux slot with the default index pointer to IPPROTO_RAW as it was previously.
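A toy version of the mux described above, under stated assumptions: the table maps an IP protocol number to a handler slot, slot 0 stands for the raw/default handler, and the error codes are illustrative guesses. `protox`, `proto_register`, and `proto_unregister` are hypothetical names, and the lookup of a matching inetsw[] entry is reduced to the caller passing the slot index.

```c
#include <errno.h>

#define PROTO_MAX 256   /* size of the protocol number space */
#define RAW_SLOT  0     /* assumed index of the raw/default handler */

/* Zero-initialised, so every protocol starts out on the default slot. */
static unsigned char protox[PROTO_MAX];

/* Point protocol number proto at handler slot, refusing to overwrite. */
static int
proto_register(int proto, unsigned char slot)
{
    if (proto <= 0 || proto >= PROTO_MAX)
        return (EPROTONOSUPPORT);
    if (slot == RAW_SLOT)
        return (EINVAL);
    if (protox[proto] != RAW_SLOT)
        return (EEXIST);
    protox[proto] = slot;
    return (0);
}

/* Put the mux entry back on the default slot. */
static int
proto_unregister(int proto)
{
    if (proto <= 0 || proto >= PROTO_MAX)
        return (EPROTONOSUPPORT);
    if (protox[proto] == RAW_SLOT)
        return (ENOENT);
    protox[proto] = RAW_SLOT;
    return (0);
}
```

Resetting to the default slot on unregister means packets for an unloaded protocol fall through to the raw handler instead of indexing a stale entry.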
* Add a macro for the destruction of INP_INFO_LOCK's used by loadable modules. (andre, 2004-10-19, 1 file, -0/+1)
* Make comments more clear. Change the order of one if() statement to check the more likely variable first. (andre, 2004-10-19, 1 file, -3/+8)
* Push acquisition of the accept mutex out of sofree() into the caller (sorele()/sotryfree()): (rwatson, 2004-10-18, 3 files, -0/+3)
  - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd.
  - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket.
  This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket.
  The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements.
  RELENG_5_3 candidate.
  MFC after: 3 days
  Reviewed by: dwhite
  Discussed with: gnn, dwhite, green
  Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
  Reported by: Vlad <marchenko at gmail dot com>
* Don't release the udbinfo lock until after the last use of UDP inpcb in udp_input(), since the udbinfo lock is used to prevent removal of the inpcb while in use (i.e., as a form of reference count) in the in-bound path. (rwatson, 2004-10-12, 1 file, -3/+3)
  RELENG_5 candidate.
* Modify the thrilling "%D is using my IP address %s!" message so that it isn't printed if the IP address in question is '0.0.0.0', which is used by nodes performing DHCP lookup, and so constitutes a false positive as a report of misconfiguration. (rwatson, 2004-10-12, 1 file, -1/+7)
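The check described above amounts to one extra condition before logging. A minimal sketch follows, with the hypothetical helper name `should_report_conflict` and only the address test shown.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: decide whether a sender claiming our address is
 * worth a "is using my IP address" warning. 0.0.0.0 is what a host
 * sends while it is still obtaining an address, so it is not reported.
 */
static int
should_report_conflict(struct in_addr sender)
{
    return (sender.s_addr != htonl(INADDR_ANY));
}
```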