summaryrefslogtreecommitdiffstats
path: root/sys/netinet/tcp_subr.c
Commit message (Collapse)AuthorAgeFilesLines
* Make sure all uses of stack allocated struct route's are properlyandre2003-11-261-2/+2
| | | | | | | | zeroed. Doing a bzero on the entire struct route is not more expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst. Reviewed by: sam (mentor) Approved by: re (scottl)
* Introduce tcp_hostcache and remove the tcp specific metrics fromandre2003-11-201-223/+125
| | | | | | | | | | | | | | | | | | | | | | | the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
* o correct locking problem: the inpcb must be held across tcp_respondsam2003-11-081-15/+20
| | | | | | | o add assertions in tcp_respond to validate inpcb locking assumptions o use local variable instead of chasing pointers in tcp_respond Supported by: FreeBSD Foundation
* Add an additional check to the tcp_twrecycleable function; I hadsilby2003-11-021-3/+16
| | | | | | | | | previously only considered the send sequence space. Unfortunately, some OSes (windows) still use a random positive increments scheme for their syn-ack ISNs, so I must consider receive sequence space as well. The value of 250000 bytes / second for Microsoft's ISN rate of increase was determined by testing with an XP machine.
* - Add a new function tcp_twrecycleable, which tells us if the ISN whichsilby2003-11-011-0/+19
| | | | | | | | | | | | | we will generate for a given ip/port tuple has advanced far enough for the time_wait socket in question to be safely recycled. - Have in_pcblookup_local use tcp_twrecycleable to determine if time_Wait sockets which are hogging local ports can be safely freed. This change preserves proper TIME_WAIT behavior under normal circumstances while allowing for safe and fast recycling whenever ephemeral port space is scarce.
* Reduce the number of tcp time_wait structs to maxsockets / 5; this ensuressilby2003-10-241-1/+1
| | | | | | | | | | | | that at most 20% of sockets can be in time_wait at one time, ensuring that time_wait sockets do not starve real connections from inpcb structures. No implementation change is needed, jlemon already implemented a nice LRU-ish algorithm for tcp_tw structure recycling. This should reduce the need for sysadmins to lower the default msl on busy servers.
* Change all SYSCTLS which are readonly and have a related TUNABLEsilby2003-10-211-1/+1
| | | | | from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful error messages.
* Fix a bunch of off-by-one errors in the range checking code.ru2003-09-111-2/+2
|
* Introduce two new MAC Framework and MAC policy entry points:rwatson2003-08-211-3/+3
| | | | | | | | | | | | | | mac_reflect_mbuf_icmp() mac_reflect_mbuf_tcp() These entry points permit MAC policies to do "update in place" changes to the labels on ICMP and TCP mbuf headers when an ICMP or TCP response is generated to a packet outside of the context of an existing socket. For example, in respond to a ping or a RST packet to a SYN on a closed port. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Correct a bug introduced with reduced TCP state handling; makerwatson2003-05-071-3/+18
| | | | | | | | | | | | | | | | | | | sure that the MAC label on TCP responses during TIMEWAIT is properly set from either the socket (if available), or the mbuf that it's responding to. Unfortunately, this is made somewhat difficult by the TCP code, as tcp_twstart() calls tcp_twrespond() after discarding the socket but without a reference to the mbuf that causes the "response". Passing both the socket and the mbuf works arounds this--eventually it might be good to make sure the mbuf always gets passed in in "response" scenarios but working through this provided to complicate things too much. Approved by: re (scottl) Reviewed by: hsu Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Remove a potential panic condition introduced by reduced TCP waitrwatson2003-04-101-5/+15
| | | | | | | | | | | | | | | | | | state. Those changed attempted to work around the changed invariant that inp->in_socket was sometimes now NULL, but the logic wasn't quite right, meaning that inp->in_socket would be dereferenced by cr_canseesocket() if security.bsd.see_other_uids, jail, or MAC were in use. Attempt to clarify and correct the logic. Note: the work-around originally introduced with the reduced TCP wait state handling to use cr_cansee() instead of cr_canseesocket() in this case isn't really right, although it "Does the right thing" for most of the cases in the base system. We'll need to address this at some point in the future. Pointed out by: dcs Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Remove a panic(); if the zone allocator can't provide more timewaitjlemon2003-03-081-20/+19
| | | | | | | structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer. Sponsored by: DARPA, NAI Labs
* More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).des2003-03-021-1/+1
|
* When generating a TCP response to a connection, not only test if therwatson2003-02-251-1/+1
| | | | | | | | | | tcpcb is NULL, but also its connected inpcb, since we now allow elements of a TCP connection to hang around after other state, such as the socket, has been recycled. Tested by: dcs Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* - m = m_gethdr(M_NOWAIT, MT_HEADER);phk2003-02-211-1/+1
| | | | | | + m = m_gethdr(M_DONTWAIT, MT_HEADER); 'nuff said.
* Unbreak non-IPV6 compilation.jlemon2003-02-191-4/+10
| | | | | Caught by: phk Sponsored by: DARPA, NAI Labs
* Add a TCP TIMEWAIT state which uses less space than a fullblown TCPjlemon2003-02-191-48/+268
| | | | | | | | control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket. Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs
* Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so thejlemon2003-02-191-35/+32
| | | | | | | | routine does not require a tcpcb to operate. Since we no longer keep template mbufs around, move pseudo checksum out of this routine, and merge it with the length update. Sponsored by: DARPA, NAI Labs
* Back out M_* changes, per decision of the TRB.imp2003-02-191-4/+4
| | | | Approved by: trb
* Take advantage of pre-existing lock-free synchronization and type stable memoryhsu2003-02-151-4/+3
| | | | to avoid acquiring SMP locks during expensive copyout process.
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.alfred2003-01-211-4/+4
| | | | Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* Validate inp to prevent an use after free.hsu2002-12-241-1/+2
|
* Change tcp.inflight_min from 1024 to a production default of 6144. Createdillon2002-12-141-4/+14
| | | | | | | a sysctl for the stabilization value for the bandwidth delay product (inflight) algorithm and document it. MFC after: 3 days
* Fix two instances of variant struct definitions in sys/netinet:phk2002-10-201-4/+4
| | | | | | | | | | | | | | Remove the never completed _IP_VHL version, it has not caught on anywhere and it would make us incompatible with other BSD netstacks to retain this version. Add a CTASSERT protecting sizeof(struct ip) == 20. Don't let the size of struct ipq depend on the IPDIVERT option. This is a functional no-op commit. Approved by: re
* Tie new "Fast IPsec" code into the build. This involves the usualsam2002-10-161-0/+8
| | | | | | | | | | | | configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implmentation). As noted previously, don't use FAST_IPSEC with INET6 at the moment. Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks
* Replace aux mbufs with packet tags:sam2002-10-161-8/+3
| | | | | | | | | | | | | | | | | | | o instead of a list of mbufs use a list of m_tag structures a la openbsd o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines o rewrite KAME use of aux mbufs in terms of packet tags o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets o bump __FreeBSD_version so code can be conditionalized o fixup ipfilter's call to ip_output based on __FreeBSD_version Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month
* turn off debugging by default if bandwidth delay product limiting isdillon2002-10-101-1/+1
| | | | turned on (it is already off in -stable).
* Correct bug in t_bw_rtttime rollover, #undef USERTTdillon2002-08-241-1/+5
|
* Implement TCP bandwidth delay product window limiting, similar to (butdillon2002-08-171-0/+158
| | | | | | | | | | | | not meant to duplicate) TCP/Vegas. Add four sysctls and default the implementation to 'off'. net.inet.tcp.inflight_enable enable algorithm (defaults to 0=off) net.inet.tcp.inflight_debug debugging (defaults to 1=on) net.inet.tcp.inflight_min minimum window limit net.inet.tcp.inflight_max maximum window limit MFC after: 1 week
* Document the undocumented assumption that at least one of the PCBrwatson2002-08-011-0/+2
| | | | | | | | | pointer and incoming mbuf pointer will be non-NULL in tcp_respond(). This is relied on by the MAC code for correctness, as well as existing code. Obtained from: TrustedBSD PRoject Sponsored by: DARPA, NAI Labs
* Introduce support for Mandatory Access Control and extensiblerwatson2002-07-311-0/+17
| | | | | | | | | | | | | | | | | | kernel access control. Instrument the TCP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check socket and mbuf labels before permitting delivery to a socket. Assign labels to newly accepted connections when the syncache/cookie code has done its business. Also set peer labels as convenient. Currently, MAC policies cannot influence the PCB matching algorithm, so cannot implement polyinstantiation. Note that there is at least one case where a PCB is not available due to the TCP packet not being associated with any socket, so we don't label in that case, but need to handle it in a special manner. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Wire the sysctl output buffer before grabbing any locks to preventtruckman2002-07-281-0/+3
| | | | | | | SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.
* Introduce two new sysctl's:dillon2002-07-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net.inet.tcp.rexmit_min (default 3 ticks equiv) This sysctl is the retransmit timer RTO minimum, specified in milliseconds. This value is designed for algorithmic stability only. net.inet.tcp.rexmit_slop (default 200ms) This sysctl is the retransmit timer RTO slop which is added to every retransmit timeout and is designed to handle protocol stack overheads and delayed ack issues. Note that the *original* code applied a 1-second RTO minimum but never applied real slop to the RTO calculation, so any RTO calculation over one second would have no slop and thus not account for protocol stack overheads (TCP timestamps are not a measure of protocol turnaround!). Essentially, the original code made the RTO calculation almost completely irrelevant. Please note that the 200ms slop is debateable. This commit is not meant to be a line in the sand, and if the community winds up deciding that increasing it is the correct solution then it's easy to do. Note that larger values will destroy performance on lossy networks while smaller values may result in a greater number of unnecessary retransmits.
* Defer calling SYSCTL_OUT() until after the locks have been released.truckman2002-07-111-2/+4
|
* Reduce the nesting level of a code block that doesn't need to be intruckman2002-07-111-13/+10
| | | | an else clause.
* Extend the effect of the sysctl net.inet.tcp.icmp_may_rstjesper2002-06-301-1/+1
| | | | | | | | so that, if we recieve a ICMP "time to live exceeded in transit", (type 11, code 0) for a TCP connection on SYN-SENT state, close the connection. MFC after: 2 weeks
* TCP notify functions can change the pcb list.hsu2002-06-211-2/+2
|
* Notify functions can destroy the pcb, so they have to return anhsu2002-06-141-15/+24
| | | | | | | | indication of whether this happenned so the calling function knows whether or not to unlock the pcb. Submitted by: Jennifer Yang (yangjihui@yahoo.com) Bug reported by: Sid Carter (sidcarter@symonds.net)
* Fix logic which resulted in missing a call to INP_UNLOCK().hsu2002-06-121-5/+2
|
* Lock up inpcb.hsu2002-06-101-11/+53
| | | | Submitted by: Jennifer Yang <yangjihui@yahoo.com>
* Back out my lats commit of locking down a socket, it conflicts with hsu's work.tanimura2002-05-311-13/+1
| | | | Requested by: hsu
* Lock down a socket, milestone 1.tanimura2002-05-201-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each members in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred
* Remove some ISN generation code which has been unused since thesilby2002-04-101-27/+3
| | | | | | syncache went in. MFC after: 3 days
* Change the suser() API to take advantage of td_ucred as well as do ajhb2002-04-011-2/+2
| | | | | | | | | | | | general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
* Merge from TrustedBSD MAC branch:rwatson2002-03-221-4/+4
| | | | | | | | | | | | | | Move the network code from using cr_cansee() to check whether a socket is visible to a requesting credential to using a new function, cr_canseesocket(), which accepts a subject credential and object socket. Implement cr_canseesocket() so that it does a prison check, a uid check, and add a comment where shortly a MAC hook will go. This will allow MAC policies to seperately instrument the visibility of sockets from the visibility of processes. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* Switch vm_zone.h with uma.h. Change over to uma interfaces.jeff2002-03-201-3/+4
|
* Remove __P.alfred2002-03-191-4/+4
|
* Simple p_ucred -> td_ucred changes to start using the per-thread ucredjhb2002-02-271-3/+3
| | | | reference.
* More IPV6 const fixes.alfred2002-02-271-1/+1
|
* Introduce a version field to `struct xucred' in place of one of thedd2002-02-271-10/+2
| | | | | | | | | | | | spares (the size of the field was changed from u_short to u_int to reflect what it really ends up being). Accordingly, change users of xucred to set and check this field as appropriate. In the kernel, this is being done inside the new cru2x() routine which takes a `struct ucred' and fills out a `struct xucred' according to the former. This also has the pleasant sideaffect of removing some duplicate code. Reviewed by: rwatson
OpenPOWER on IntegriCloud