summaryrefslogtreecommitdiffstats
path: root/sys/netinet/tcp_usrreq.c
Commit message (Collapse)AuthorAgeFilesLines
* When aborting tcp_attach() due to a problem allocating or attaching therwatson2005-06-011-0/+2
| | | | | | | tcpcb, lock the inpcb before calling in_pcbdetach() or in6_pcbdetach(), as they expect the inpcb to be passed locked. MFC after: 7 days
* Assert tcbinfo lock, inpcb lock in tcp_disconnect().rwatson2005-06-011-1/+8
| | | | | | Assert tcbinfo lock, inpcb lock in in tcp_usrclosed(). MFC after: 7 days
* Assert tcbinfo lock in tcp_attach(), as it is required; the callerrwatson2005-06-011-0/+2
| | | | | | (tcp_usr_attach()) currently grabs it. MFC after: 7 days
* Replace t_force with a t_flag (TF_FORCEDATA).ps2005-05-211-2/+2
| | | | | Submitted by: Raja Mukerji. Reviewed by: Mohan, Silby, Andre Opperman.
* Remove now unused inirw variable from previous use of COMMON_END().rwatson2005-05-011-1/+0
| | | | Reported by: csjp
* Fix typo in last commit.grehan2005-05-011-1/+1
| | | | Approved by: rwatson
* Slide unlocking of the tcbinfo lock earlier in tcp_usr_send(), as it'srwatson2005-05-011-2/+13
| | | | | | | | | | | needed only for implicit connect cases. Under load, especially on SMP, this can greatly reduce contention on the tcbinfo lock. NB: Ambiguities about the state of so_pcb need to be resolved so that all use of the tcbinfo lock in non-implicit connection cases can be eliminated. Submited by: Kazuaki Oda <kaakun at highway dot ne dot jp>
* eliminate extraneous null ptr checkssam2005-03-291-1/+1
| | | | Noticed by: Coverity Prevent analysis tool
* In tcp_usr_send(), broaden coverage of the socket buffer lock in therwatson2005-03-141-1/+4
| | | | | | | non-OOB case so that the sbspace() check is performed under the same lock instance as the append to the send socket buffer. MFC after: 1 week
* In the current world order, solisten() implements the state transition ofrwatson2005-02-211-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a socket from a regular socket to a listening socket able to accept new connections. As part of this state transition, solisten() calls into the protocol to update protocol-layer state. There were several bugs in this implementation that could result in a race wherein a TCP SYN received in the interval between the protocol state transition and the shortly following socket layer transition would result in a panic in the TCP code, as the socket would be in the TCPS_LISTEN state, but the socket would not have the SO_ACCEPTCONN flag set. This change does the following: - Pushes the socket state transition from the socket layer solisten() to to socket "library" routines called from the protocol. This permits the socket routines to be called while holding the protocol mutexes, preventing a race exposing the incomplete socket state transition to TCP after the TCP state transition has completed. The check for a socket layer state transition is performed by solisten_proto_check(), and the actual transition is performed by solisten_proto(). - Holds the socket lock for the duration of the socket state test and set, and over the protocol layer state transition, which is now possible as the socket lock is acquired by the protocol layer, rather than vice versa. This prevents additional state related races in the socket layer. This permits the dual transition of socket layer and protocol layer state to occur while holding locks for both layers, making the two changes atomic with respect to one another. Similar changes are likely require elsewhere in the socket/protocol code. Reported by: Peter Holm <peter@holm.cc> Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net> Philosophical head nod: gnn
* o Add handling of an IPv4-mapped IPv6 address.maxim2005-02-141-87/+0
| | | | | | | | | | | | | o Use SYSCTL_IN() macro instead of direct call of copyin(9). Submitted by: ume o Move sysctl_drop() implementation to sys/netinet/tcp_subr.c where most of tcp sysctls live. o There are net.inet[6].tcp[6].getcred sysctls already, no needs in a separate struct tcp_ident_mapping. Suggested by: ume
* o Implement net.inet.tcp.drop sysctl and userland part, tcpdrop(8)maxim2005-02-061-0/+86
| | | | | | | | | | | | utility: The tcpdrop command drops the TCP connection specified by the local address laddr, port lport and the foreign address faddr, port fport. Obtained from: OpenBSD Reviewed by: rwatson (locking), ru (man page), -current MFC after: 1 month
* /* -> /*- for license, minor formatting changesimp2005-01-071-1/+1
|
* Do export the advertised receive window via the tcpi_rcv_space field ofrwatson2004-11-271-0/+1
| | | | struct tcp_info.
* Implement parts of the TCP_INFO socket option as found in Linux 2.6.rwatson2004-11-261-2/+54
| | | | | | | | | | | | | | | This socket option allows processes query a TCP socket for some low level transmission details, such as the current send, bandwidth, and congestion windows. Linux provides a 'struct tcpinfo' structure containing various variables, rather than separate socket options; this makes the API somewhat fragile as it makes it dificult to add new entries of interest as requirements and implementation evolve. As such, I've included a large pad at the end of the structure. Right now, relatively few of the Linux API fields are filled in, and some contain no logical equivilent on FreeBSD. I've include __'d entries in the structure to make it easier to figure ou what is and isn't omitted. This API/ABI should be considered unstable for the time being.
* Initialize struct pr_userreqs in new/sparse style and fill in commonphk2004-11-081-11/+32
| | | | | | default elements in net_init_domain(). This makes it possible to grep these structures and see any bogosities.
* Remove RFC1644 T/TCP support from the TCP side of the network stack.andre2004-11-021-69/+4
| | | | | | | | | | | | | | | | A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706 Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplemention of the previous T/TCP functionality. Discussed on: -arch
* White space cleanup for netinet before branch:rwatson2004-08-161-9/+9
| | | | | | | | | | | - Trailing tab/space cleanup - Remove spurious spaces between or before tabs This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET. Approved by: re (scottl) Submitted by: Xin LI <delphij@frontfree.net>
* Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSDdwmalone2004-08-141-6/+1
| | | | | | | | | | | | | | | | | | | | | have already done this, so I have styled the patch on their work: 1) introduce a ip_newid() static inline function that checks the sysctl and then decides if it should return a sequential or random IP ID. 2) named the sysctl net.inet.ip.random_id 3) IPv6 flow IDs and fragment IDs are now always random. Flow IDs and frag IDs are significantly less common in the IPv6 world (ie. rarely generated per-packet), so there should be smaller performance concerns. The sysctl defaults to 0 (sequential IP IDs). Reviewed by: andre, silby, mlaier, ume Based on: NetBSD MFC after: 2 months
* compare pointer against NULL, not 0jmg2004-07-261-2/+2
| | | | | | | | when inpcb is NULL, this is no longer invalid since jlemon added the tcp_twstart function... this prevents close "failing" w/ EINVAL when it really was successful... Reviewed by: jeremy (NetBSD)
* when IN6P_AUTOFLOWLABEL is set, the flowlabel is not set onume2004-07-161-2/+10
| | | | | | | | outgoing tcp connections. Reported by: Orla McGann <orly@cnri.dit.ie> Reviewed by: Orla McGann <orly@cnri.dit.ie> Obtained from: KAME
* Remove spl's from TCP protocol entry points. While not all lockingrwatson2004-06-261-32/+1
| | | | | is merged here yet, this will ease the merge process by bringing the locked and unlocked versions into sync.
* In tcp_ctloutput(), don't hold the inpcb lock over a call torwatson2004-06-181-1/+1
| | | | | | | | ip_ctloutput(), as it may need to perform blocking memory allocations. This also improves consistency with locking relative to other points that call into ip_ctloutput(). Bumped into by: Grover Lines <grover@ceribus.net>
* The socket field so_state is used to hold a variety of socket relatedrwatson2004-06-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state: SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state) Rename respectively to: SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state) This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coallesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.
* Remove advertising clause from University of California Regent'simp2004-04-071-4/+0
| | | | | | | license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
* Fix a panic possibility caused by returning without releasing locks.pjd2004-04-041-37/+26
| | | | | | | | | | | | It was fixed by moving problemetic checks, as well as checks that doesn't need locking before locks are acquired. Submitted by: Ryan Sommers <ryans@gamersimpact.com> In co-operation with: cperciva, maxim, mlaier, sam Tested by: submitter (previous patch), me (current patch) Reviewed by: cperciva, mlaier (previous patch), sam (current patch) Approved by: sam Dedicated to: enough!
* Remove unused argument.pjd2004-03-281-4/+3
|
* Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:pjd2004-03-271-8/+9
| | | | | | | | | | | | | | - in_pcbbind(), - in_pcbbind_setup(), - in_pcbconnect(), - in_pcbconnect_setup(), - in6_pcbbind(), - in6_pcbconnect(), - in6_pcbsetport(). "It should simplify/clarify things a great deal." --rwatson Requested by: rwatson Reviewed by: rwatson, ume
* Remove unused argument.pjd2004-03-271-1/+1
| | | | Reviewed by: ume
* Shorten the name of the socket option used to enable TCP-MD5 packetbms2004-02-161-2/+2
| | | | | | treatment. Submitted by: Vincent Jardin
* Brucification.bms2004-02-131-1/+1
| | | | Submitted by: bde
* Initial import of RFC 2385 (TCP-MD5) digest support.bms2004-02-111-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first of two commits; bringing in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC. For the uninitiated, this is a TCP option which provides for a means of authenticating TCP sessions which came into being before IPSEC. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence. Several parts of the TCP and IP headers, including the segment payload, are digested with MD5, including a shared secret. The PF_KEY interface is used to manage the secrets using security associations in the SADB. There is a limitation here in that as there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity. Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem. This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment. Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request. Sponsored by: sentex.net
* Check that sa_len is the appropriate value in tcp_usr_bind(),truckman2004-01-101-0/+8
| | | | | | | | | | tcp6_usr_bind(), tcp_usr_connect(), and tcp6_usr_connect() before checking to see whether the address is multicast so that the proper errno value will be returned if sa_len is incorrect. The checks are identical to the ones in in_pcbbind_setup(), in6_pcbbind(), and in6_pcbladdr(), which are called after the multicast address check passes. MFC after: 30 days
* Limiters and sanity checks for TCP MSS (maximum segement size)andre2004-01-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | resource exhaustion attacks. For network link optimization TCP can adjust its MSS and thus packet size according to the observed path MTU. This is done dynamically based on feedback from the remote host and network components along the packet path. This information can be abused to pretend an extremely low path MTU. The resource exhaustion works in two ways: o during tcp connection setup the advertized local MSS is exchanged between the endpoints. The remote endpoint can set this arbitrarily low (except for a minimum MTU of 64 octets enforced in the BSD code). When the local host is sending data it is forced to send many small IP packets instead of a large one. For example instead of the normal TCP payload size of 1448 it forces TCP payload size of 12 (MTU 64) and thus we have a 120 times increase in workload and packets. On fast links this quickly saturates the local CPU and may also hit pps processing limites of network components along the path. This type of attack is particularly effective for servers where the attacker can download large files (WWW and FTP). We mitigate it by enforcing a minimum MTU settable by sysctl net.inet.tcp.minmss defaulting to 256 octets. o the local host is reveiving data on a TCP connection from the remote host. The local host has no control over the packet size the remote host is sending. The remote host may chose to do what is described in the first attack and send the data in packets with an TCP payload of at least one byte. For each packet the tcp_input() function will be entered, the packet is processed and a sowakeup() is signalled to the connected process. For example an attack with 2 Mbit/s gives 4716 packets per second and the same amount of sowakeup()s to the process (and context switches). This type of attack is particularly effective for servers where the attacker can upload large amounts of data. Normally this is the case with WWW server where large POSTs can be made. We mitigate this by calculating the average MSS payload per second. If it goes below 'net.inet.tcp.minmss' and the pps rate is above 'net.inet.tcp.minmssoverload' defaulting to 1000 this particular TCP connection is resetted and dropped. MITRE CVE: CAN-2004-0002 Reviewed by: sam (mentor) MFC after: 1 day
* Split the "inp" mutex class into separate classes for each of divert,sam2003-11-261-1/+1
| | | | | | | | raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness complaints. Reviewed by: rwatson Approved by: re (rwatson)
* Introduce tcp_hostcache and remove the tcp specific metrics fromandre2003-11-201-20/+25
| | | | | | | | | | | | | | | | | | | | | | | the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache. It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve. tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address. It removes significant locking requirements from the tcp stack with regard to the routing table. Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
* Introduce a MAC label reference in 'struct inpcb', which cachesrwatson2003-11-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer. This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check. For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update. Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy. Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* speedup stream socket recv handling by tracking the tail ofsam2003-10-281-2/+2
| | | | | | | the mbuf chain instead of walking the list for each append Submitted by: ps/jayanth Obtained from: netbsd (jason thorpe)
* Remove check for t_state == TCPS_TIME_WAIT and introduce the tw structure.jlemon2003-03-081-13/+15
| | | | Sponsored by: DARPA, NAI Labs
* Hold the TCP protocol lock while modifying the connection hash table.hsu2003-02-251-4/+4
|
* Unbreak the automatic remapping of an INADDR_ANY destination addressiedowse2002-10-241-5/+4
| | | | | | | | | | | | | | | to the primary local IP address when doing a TCP connect(). The tcp_connect() code was relying on in_pcbconnect (actually in_pcbladdr) modifying the passed-in sockaddr, and I failed to notice this in the recent change that added in_pcbconnect_setup(). As a result, tcp_connect() was ending up using the unmodified sockaddr address instead of the munged version. There are two cases to handle: if in_pcbconnect_setup() succeeds, then the PCB has already been updated with the correct destination address as we pass it pointers to inp_faddr and inp_fport directly. If in_pcbconnect_setup() fails due to an existing but dead connection, then copy the destination address from the old connection.
* Replace in_pcbladdr() with a more generic inner subroutine foriedowse2002-10-211-14/+12
| | | | | | | | | | | | | | | in_pcbconnect() called in_pcbconnect_setup(). This version performs all of the functions of in_pcbconnect() except for the final committing of changes to the PCB. In the case of an EADDRINUSE error it can also provide to the caller the PCB of the duplicate connection, avoiding an extra in_pcblookup_hash() lookup in tcp_connect(). This change will allow the "temporary connect" hack in udp_output() to be removed and is part of the preparation for adding the IP_SENDSRCADDR control message. Discussed on: -net Approved by: re
* Replace (ab)uses of "NULL" where "0" is really meant.archie2002-08-221-2/+2
|
* Create new functions in_sockaddr(), in6_sockaddr(), andtruckman2002-08-211-20/+43
| | | | | | | | | | | | | | | | | in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in* structure and initialize it with the address and port information passed as arguments. Use calls to these new functions to replace code that is replicated multiple times in in_setsockaddr(), in_setpeeraddr(), in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and in6_mapped_peeraddr(). Inline COMMON_END in tcp_usr_accept() so that we can call in_sockaddr() with temporary copies of the address and port after the PCB is unlocked. Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC() inside in6_mapped_peeraddr() while the PCB is locked) by changing the implementation of tcp6_usr_accept() to match tcp_usr_accept(). Reviewed by: suz
* Implement TCP bandwidth delay product window limiting, similar to (butdillon2002-08-171-0/+2
| | | | | | | | | | | | not meant to duplicate) TCP/Vegas. Add four sysctls and default the implementation to 'off'. net.inet.tcp.inflight_enable enable algorithm (defaults to 0=off) net.inet.tcp.inflight_debug debugging (defaults to 1=on) net.inet.tcp.inflight_min minimum window limit net.inet.tcp.inflight_max maximum window limit MFC after: 1 week
* Use a common way to release locks before exit.maxim2002-07-291-2/+4
| | | | Reviewed by: hsu
* make setsockopt(IPV6_V6ONLY, 0) actuall work for tcp6.ume2002-07-251-3/+3
| | | | MFC after: 1 week
* cleanup usage of ip6_mapped_addr_on and ip6_v6only. now,ume2002-07-251-5/+3
| | | | | | ip6_mapped_addr_on is unified into ip6_v6only. MFC after: 1 week
* Because we're holding an exclusive write lock on the head, references tohsu2002-06-131-3/+0
| | | | the new inp cannot leak out even though it has been placed on the head list.
* Lock up inpcb.hsu2002-06-101-37/+161
| | | | Submitted by: Jennifer Yang <yangjihui@yahoo.com>
OpenPOWER on IntegriCloud