summaryrefslogtreecommitdiffstats
path: root/sys/netinet/tcp_output.c
Commit message (Collapse)AuthorAgeFilesLines
* Remove advertising clause from University of California Regent'simp2004-04-071-4/+0
| | | | | | | license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
* Brucification.bms2004-02-131-3/+3
| | | | Submitted by: bde
* style(9) pass; whitespace and comments.bms2004-02-121-6/+4
| | | | Submitted by: njl
* Fix a typo; left out preprocessor conditional for sigoff variable, whichbms2004-02-111-0/+2
| | | | | | is only used by TCP_SIGNATURE code. Noticed by: Roop Nanuwa
* Initial import of RFC 2385 (TCP-MD5) digest support.bms2004-02-111-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first of two commits; bringing in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC. For the uninitiated, this is a TCP option which provides for a means of authenticating TCP sessions which came into being before IPSEC. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence. Several parts of the TCP and IP headers, including the segment payload, are digested with MD5, including a shared secret. The PF_KEY interface is used to manage the secrets using security associations in the SADB. There is a limitation here in that as there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity. Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem. This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment. Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request. Sponsored by: sentex.net
* pass pcb rather than so. it is expected that per socket policyume2004-02-031-7/+0
| | | | works again.
* Split the overloaded variable 'win' into two for their specific purposes:andre2004-01-221-21/+22
| | | | | | | | recwin and sendwin. This removes a big source of confusion and makes following the code much easier. Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)
* Introduce tcp_hostcache and remove the tcp specific metrics fromandre2003-11-201-26/+16
| | | | | | | | | | | | | | | | | | | | | | | the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
* replace mtx_assert by INP_LOCK_ASSERTsam2003-11-081-3/+1
| | | | Supported by: FreeBSD Foundation
* unbreak compilation of FAST_IPSECsam2003-11-081-1/+1
| | | | Supported by: FreeBSD Foundation
* - cleanup SP refcnt issue.ume2003-11-041-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - share policy-on-socket for listening socket. - don't copy policy-on-socket at all. secpolicy no longer contain spidx, which saves a lot of memory. - deep-copy pcb policy if it is an ipsec policy. assign ID field to all SPD entries. make it possible for racoon to grab SPD entry on pcb. - fixed the order of searching SA table for packets. - fixed to get a security association header. a mode is always needed to compare them. - fixed that the incorrect time was set to sadb_comb_{hard|soft}_usetime. - disallow port spec for tunnel mode policy (as we don't reassemble). - an user can define a policy-id. - clear enc/auth key before freeing. - fixed that the kernel crashed when key_spdacquire() was called because key_spdacquire() had been implemented imcopletely. - preparation for 64bit sequence number. - maintain ordered list of SA, based on SA id. - cleanup secasvar management; refcnt is key.c responsibility; alloc/free is keydb.c responsibility. - cleanup, avoid double-loop. - use hash for spi-based lookup. - mark persistent SP "persistent". XXX in theory refcnt should do the right thing, however, we have "spdflush" which would touch all SPs. another solution would be to de-register persistent SPs from sptree. - u_short -> u_int16_t - reduce kernel stack usage by auto variable secasindex. - clarify function name confusion. ipsec_*_policy -> ipsec_*_pcbpolicy. - avoid variable name confusion. (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct secpolicy *) - count number of ipsec encapsulations on ipsec4_output, so that we can tell ip_output() how to handle the packet further. - When the value of the ul_proto is ICMP or ICMPV6, the port field in "src" of the spidx specifies ICMP type, and the port field in "dst" of the spidx specifies ICMP code. - avoid from applying IPsec transport mode to the packets when the kernel forwards the packets. Tested by: nork Obtained from: KAME
* The tcp_trace call needs the length of the header. Unfortunately theharti2003-08-131-1/+5
| | | | | | | | | code has rotten a bit so that the header length is not correct at the point when tcp_trace is called. Temporarily compute the correct value before the call and restore the old value after. This makes ports/benchmarks/dbs to almost work. This is a NOP unless you compile with TCPDEBUG.
* Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so thejlemon2003-02-191-9/+4
| | | | | | | | routine does not require a tcpcb to operate. Since we no longer keep template mbufs around, move pseudo checksum out of this routine, and merge it with the length update. Sponsored by: DARPA, NAI Labs
* Clean up delayed acks and T/TCP interactions:jlemon2003-02-191-3/+4
| | | | | | | | - delay acks for T/TCP regardless of delack setting - fix bug where a single pass through tcp_input might not delay acks - use callout_active() instead of callout_pending() Sponsored by: DARPA, NAI Labs
* Back out M_* changes, per decision of the TRB.imp2003-02-191-3/+3
| | | | Approved by: trb
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.alfred2003-01-211-3/+3
| | | | Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* Optimize away call to bzero() in the common case by directly checkinghsu2003-01-181-6/+3
| | | | if a connection has any cached TAO information.
* Fix oops in my last commit, I was calculating a new length but then notdillon2002-10-161-1/+1
| | | | | | using it. (The code is already correct in -stable). Found by: silby
* Tie new "Fast IPsec" code into the build. This involves the usualsam2002-10-161-0/+5
| | | | | | | | | | | | configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implmentation). As noted previously, don't use FAST_IPSEC with INET6 at the moment. Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks
* Replace aux mbufs with packet tags:sam2002-10-161-12/+3
| | | | | | | | | | | | | | | | | | | o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month
* Update various comments mainly related to retransmit/FIN that Idillon2002-10-101-6/+36
| | | | | | | | documented while working on a previous bug. Fix a PERSIST bug. Properly account for a FIN sent during a PERSIST. MFC after: 7 days
* Tempary fix for inet6. The final fix is to change in6_pcbnotify to take ↵jennifer2002-09-171-0/+2
| | | | | | pcbinfo instead of pcbhead. It is on the way.
* Implement TCP bandwidth delay product window limiting, similar to (butdillon2002-08-171-0/+1
| | | | | | | | | | | | not meant to duplicate) TCP/Vegas. Add four sysctls and default the implementation to 'off'. net.inet.tcp.inflight_enable enable algorithm (defaults to 0=off) net.inet.tcp.inflight_debug debugging (defaults to 1=on) net.inet.tcp.inflight_min minimum window limit net.inet.tcp.inflight_max maximum window limit MFC after: 1 week
* Assert that the inpcb lock is held when calling tcp_output().jennifer2002-08-121-0/+2
| | | | Approved by: hsu
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-0/+5
| | | | | | | | | | | | | | | | | | kernel access control. Instrument the TCP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check socket and mbuf labels before permitting delivery to a socket. Assign labels to newly accepted connections when the syncache/cookie code has done its business. Also set peer labels as convenient. Currently, MAC policies cannot influence the PCB matching algorithm, so cannot implement polyinstantiation. Note that there is at least one case where a PCB is not available due to the TCP packet not being associated with any socket, so we don't label in that case, but need to handle it in a special manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Slightly restructure the #ifdef INET6 sections to make the codeluigi2002-06-231-31/+19
| | | | | | | more readable. Remove the six "register" attributes from variables tcp_output(), the compiler surely knows well how to allocate them.
* Re-commit w/fix:silby2002-06-141-1/+1
| | | | | | | | | | | Ensure that the syn cache's syn-ack packets contain the same ip_tos, ip_ttl, and DF bits as all other tcp packets. PR: 39141 MFC after: 2 weeks This time, make sure that ipv4 specific code (aka all of the above) is only run in the ipv4 case.
* Back out ip_tos/ip_ttl/DF "fix", it just panic'd my box. :)silby2002-06-141-1/+1
| | | | Pointy-hat to: silby
* Ensure that the syn cache's syn-ack packets contain the samesilby2002-06-141-1/+1
| | | | | | | ip_tos, ip_ttl, and DF bits as all other tcp packets. PR: 39141 MFC after: 2 weeks
* Back out my lats commit of locking down a socket, it conflicts with hsu's work.tanimura2002-05-311-14/+3
| | | | Requested by: hsu
* Lock down a socket, milestone 1.tanimura2002-05-201-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each members in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred
* Reduce the local network slowstart flightsize from infinity to 4 packets.silby2001-12-141-1/+1
| | | | | | | | | | Now that we've increased the size of our send / receive buffers, bursting an entire window onto the network may cause congestion. As a result, we will slow start beginning with a flightsize of 4 packets. Problem reported by: Thomas Zenker <thz@Lennartz-electronic.de> MFC after: 3 days
* Fix up tabs from cut&n&paste.jlemon2001-12-131-8/+8
|
* Update to C99, s/__FUNCTION__/__func__/,obrien2001-12-101-1/+1
| | | | also don't use ANSI string concatenation.
* Fix a bug with transmitter restart after receiving a 0 window. Thedillon2001-12-021-14/+40
| | | | | | | | | | | | | receiver was not sending an immediate ack with delayed acks turned on when the input buffer is drained, preventing the transmitter from restarting immediately. Propogate the TCP_NODELAY option to accept()ed sockets. (Helps tbench and is a good idea anyway). Some cleanup. Identify additonal issues in comments. MFC after: 1 day
* The transmit burst limit for newreno completely breaks TCP's performancedillon2001-11-301-0/+10
| | | | | | if the receive side is using delayed acks. Temporarily remove it. MFC after: 0 days
* Introduce a syncache, which enables FreeBSD to withstand a SYN floodjlemon2001-11-221-1/+1
| | | | | | | DoS in an improved fashion over the existing code. Reviewed by: silby (in a previous iteration) Sponsored by: DARPA, NAI Labs
* Add a flag TF_LASTIDLE, that forces a previously idle connectionjayanth2001-10-051-1/+8
| | | | | | | | | | | to send all its data, especially when the data is less than one MSS. This fixes an issue where the stack was delaying the sending of data, eventhough there was enough window to send all the data and the sending of data was emptying the socket buffer. Problem found by Yoshihiro Tsuchiya (tsuchiya@flab.fujitsu.co.jp) Submitted by: Jayanth Vijayaraghavan
* Eliminate the allocation of a tcp template structure for eachsilby2001-06-231-10/+2
| | | | | | | | | | | | connection. The information contained in a tcptemp can be reconstructed from a tcpcb when needed. Previously, tcp templates required the allocation of one mbuf per connection. On large systems, this change should free up a large number of mbufs. Reviewed by: bmilekic, jlemon, ru MFC after: 2 weeks
* Sync with recent KAME.ume2001-06-111-2/+6
| | | | | | | | | | | | | | | | | | This work was based on kame-20010528-freebsd43-snap.tgz and some critical problem after the snap was out were fixed. There are many many changes since last KAME merge. TODO: - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT. - ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT. Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks
* Undo part of the tangle of having sys/lock.h and sys/mutex.h included inmarkm2001-05-011-2/+4
| | | | | | | | | | | other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
* Convert all users of fldoff() to offsetof(). fldoff() is badphk2000-10-271-2/+0
| | | | | | | | | | | | | | | | | | | | | | | because it only takes a struct tag which makes it impossible to use unions, typedefs etc. Define __offsetof() in <machine/ansi.h> Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h> Remove myriad of local offsetof() definitions. Remove includes of <stddef.h> in kernel code. NB: Kernelcode should *never* include from /usr/include ! Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API. Deprecate <struct.h> with a warning. The warning turns into an error on 01-12-2000 and the file gets removed entirely on 01-01-2001. Paritials reviews by: various. Significant brucifications by: bde
* Don't do snd_nxt rollback optimization (rev. 1.46) for SYN packets.archie2000-09-111-3/+2
| | | | | | | | It causes a panic when/if snd_una is incremented elsewhere (this is a conservative change, because originally no rollback occurred for any packets at all). Submitted by: Vivek Sadananda Pai <vivek@imimic.com>
* Improve performance in the case where ip_output() returns an error.archie2000-08-031-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | When this happens, we know for sure that the packet data was not received by the peer. Therefore, back out any advancing of the transmit sequence number so that we send the same data the next time we transmit a packet, avoiding a guaranteed missed packet and its resulting TCP transmit slowdown. In most systems ip_output() probably never returns an error, and so this problem is never seen. However, it is more likely to occur with device drivers having short output queues (causing ENOBUFS to be returned when they are full), not to mention low memory situations. Moreover, because of this problem writers of slow devices were required to make an unfortunate choice between (a) having a relatively short output queue (with low latency but low TCP bandwidth because of this problem) or (b) a long output queue (with high latency and high TCP bandwidth). In my particular application (ISDN) it took an output queue equal to ~5 seconds of transmission to avoid ENOBUFS. A more reasonable output queue of 0.5 seconds resulted in only about 50% TCP throughput. With this patch full throughput was restored in the latter case. Reviewed by: freebsd-net
* re-enable the tcp newreno code.jayanth2000-07-121-1/+1
|
* sync with kame tree as of july00. tons of bug fixes/improvements.itojun2000-07-041-17/+24
| | | | | | | API changes: - additional IPv6 ioctls - IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8). (also syntax change)
* When attempting to transmit a packet, if the system fails to allocatejlemon2000-06-021-0/+4
| | | | | | | | | | | | | a mbuf, it may return without setting any timers. If no more data is scheduled to be transmitted (this was a FIN) the system will sit in LAST_ACK state forever. Thus, when mbuf allocation fails, set the retransmit timer if neither the retransmit or persist timer is already pending. Problem discovered by: Mike Silbersack (silby@silby.com) Pushed for a fix by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: jayanth
* Temporarily turn off the newreno flag until we can track down the knownjayanth2000-05-111-1/+1
| | | | data corruption problem.
* Implement TCP NewReno, as documented in RFC 2582. This allowsjlemon2000-05-061-3/+7
| | | | | | | | better recovery for multiple packet losses in a single window. The algorithm can be toggled via the sysctl net.inet.tcp.newreno, which defaults to "on". Submitted by: Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>
* Add support for offloading IP/TCP/UDP checksums to NIC hardware whichjlemon2000-03-271-8/+11
| | | | supports them.
OpenPOWER on IntegriCloud