path: root/sys/netinet/tcp_usrreq.c
Commit message | Author | Age | Files | Lines
* KSE Milestone 2 | julian | 2001-09-12 | 1 | -35/+35
  Note: ALL MODULES MUST BE RECOMPILED.
  Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process.
  Sorry john! (your next MFC will be a doozy!)
  Reviewed by: peter@freebsd.org, dillon@freebsd.org
  X-MFC after: ha ha ha ha
* Much delayed but now present: RFC 1948 style sequence numbers | silby | 2001-08-22 | 1 | -2/+2
  In order to ensure security and functionality, RFC 1948 style initial sequence number generation has been implemented. Barring any major cryptographic breakthroughs, this algorithm should be unbreakable. In addition, the problems with TIME_WAIT recycling which affect our currently used algorithm are not present.
  Reviewed by: jesper
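  As a rough illustration of the scheme RFC 1948 describes, the initial sequence number is a 4-microsecond clock plus a keyed hash of the connection 4-tuple. The sketch below is not the committed kernel code: `struct conn_id`, `mix`, `isn_offset`, and `rfc1948_isn` are hypothetical names, and FNV-1a stands in for the cryptographic hash (RFC 1948 suggests MD5) purely to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple; the field names are illustrative, not the kernel's. */
struct conn_id {
    uint32_t laddr, faddr;  /* local and foreign IPv4 addresses */
    uint16_t lport, fport;  /* local and foreign ports */
};

/* FNV-1a step over a buffer. Stand-in only: RFC 1948 suggests a
 * cryptographic hash such as MD5, and FNV-1a is NOT cryptographic. */
static uint32_t mix(uint32_t h, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* F(localhost, localport, remotehost, remoteport, secret) */
static uint32_t isn_offset(const struct conn_id *c,
                           const unsigned char *secret, size_t secret_len)
{
    uint32_t h = 2166136261u;
    h = mix(h, &c->laddr, sizeof c->laddr);
    h = mix(h, &c->lport, sizeof c->lport);
    h = mix(h, &c->faddr, sizeof c->faddr);
    h = mix(h, &c->fport, sizeof c->fport);
    return mix(h, secret, secret_len);
}

/* ISN = M + F(...), where M is a clock that ticks every 4 microseconds.
 * Unsigned arithmetic wraps modulo 2^32, as sequence numbers do. */
uint32_t rfc1948_isn(const struct conn_id *c, const unsigned char *secret,
                     size_t secret_len, uint64_t usec_now)
{
    assert(c != NULL && secret != NULL);
    uint32_t m = (uint32_t)(usec_now / 4);
    return m + isn_offset(c, secret, secret_len);
}
```

  Because the offset is fixed per 4-tuple, sequence numbers for a reused tuple keep advancing with the clock, which is what lets TIME_WAIT recycling work, while an observer of one connection learns nothing useful about the offset of another.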
* move ipsec security policy allocation into in_pcballoc, before | ume | 2001-07-26 | 1 | -12/+0
  making pcbs available to the outside world. otherwise, we will see inpcb without ipsec security policy attached (-> panic() in ipsec.c).
  Obtained from: KAME
  MFC after: 3 days
* Bump net.inet.tcp.sendspace to 32k and net.inet.tcp.recvspace to 65k. | obrien | 2001-07-13 | 1 | -2/+2
  This should help us in naive benchmark "tests". It seems a wide number of people think 32k buffers would not cause major issues, and they are in fact in use by many other OS's at this time. The receive buffers can be bumped higher as buffers are hardly used and several research papers indicate that receive buffers rarely use much space at all.
  Submitted by: Leo Bicknell <bicknell@ufp.org> <20010713101107.B9559@ussenterprise.ufp.org>
  Agreed to in principle by: dillon (at the 32k level)
* Temporary feature: Runtime tuneable tcp initial sequence number | silby | 2001-07-08 | 1 | -2/+2
  generation scheme. Users may now select between the currently used OpenBSD algorithm and the older random positive increment method.
  While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT handling; this is causing trouble for an increasing number of folks.
  To switch between generation schemes, one sets the sysctl net.inet.tcp.tcp_seq_genscheme. 0 = random positive increments, 1 = the OpenBSD algorithm. 1 is still the default.
  Once a secure _and_ compatible algorithm is implemented, this sysctl will be removed.
  Reviewed by: jlemon
  Tested by: numerous subscribers of -net
* Eliminate the allocation of a tcp template structure for each | silby | 2001-06-23 | 1 | -12/+0
  connection. The information contained in a tcptemp can be reconstructed from a tcpcb when needed.
  Previously, tcp templates required the allocation of one mbuf per connection. On large systems, this change should free up a large number of mbufs.
  Reviewed by: bmilekic, jlemon, ru
  MFC after: 2 weeks
* Sync with recent KAME. | ume | 2001-06-11 | 1 | -5/+8
  This work was based on kame-20010528-freebsd43-snap.tgz, and some critical problems found after the snap was out were fixed. There are many, many changes since the last KAME merge.
  TODO:
  - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from the RFC2407/IANA assignment because of a binary compatibility issue. It should be fixed under 5-CURRENT.
  - The ip6po_m member of struct ip6_pktopts is no longer used, but it is still there because of a binary compatibility issue. It should be removed under 5-CURRENT.
  Reviewed by: itojun
  Obtained from: KAME
  MFC after: 3 weeks
* Say goodbye to TCP_COMPAT_42 | jesper | 2001-04-20 | 1 | -9/+0
  Reviewed by: wollman
  Requested by: wollman
* Randomize the TCP initial sequence numbers more thoroughly. | kris | 2001-04-17 | 1 | -1/+10
  Obtained from: OpenBSD
  Reviewed by: jesper, peter, -developers
* Unbreak LINT. | jlemon | 2001-03-12 | 1 | -5/+17
  Pointed out by: phk
* Push the test for a disconnected socket when accept()ing down to the | jlemon | 2001-03-09 | 1 | -0/+8
  protocol layer. Not all protocols behave identically. This fixes the brokenness observed with unix-domain sockets (and postfix).
* o Move per-process jail pointer (p->pr_prison) to inside of the subject | rwatson | 2001-02-21 | 1 | -1/+4
    credential structure, ucred (cr->cr_prison).
  o Allow jail inheritance to be a function of credential inheritance.
  o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code.
  o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments.
  o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place.
  o Convert PRISON_CHECK() macro to prison_check() function.
  o Move jail() function prototypes to jail.h.
  o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself.
  o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use.
  Notes:
  o Some further cleanup of the linux/jail code is still required.
  o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code.
  o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure.
  Reviewed by: freebsd-arch
  Obtained from: TrustedBSD Project
* When turning off TCP_NOPUSH, call tcp_output to immediately flush | jlemon | 2001-02-02 | 1 | -4/+14
  out any data pending in the buffer.
  Submitted by: Tony Finch <dot@dotat.at>
* Support per socket based IPv4 mapped IPv6 addr enable/disable control. | shin | 2000-04-01 | 1 | -4/+3
  Submitted by: ume
* tcp updates to support IPv6. | shin | 2000-01-09 | 1 | -1/+287
  Also a small patch to sys/nfs/nfs_socket.c, as the max_hdr size changed.
  Reviewed by: freebsd-arch, cvs-committers
  Obtained from: KAME project
* IPSEC support in the kernel. | shin | 1999-12-22 | 1 | -0/+12
  pr_input() routines prototype is also changed to support IPSEC and IPV6 chained protocol headers.
  Reviewed by: freebsd-arch, cvs-committers
  Obtained from: KAME project
* Always set INP_IPV4 flag for IPv4 pcb entries, because netstat needs it | shin | 1999-12-13 | 1 | -3/+0
  to print out protocol specific pcb info.
  A patch submitted by guido@gvr.org, and asmodai@wxs.nl also reported the problem. Thanks and sorry for your troubles.
  Submitted by: guido@gvr.org
  Reviewed by: shin
* udp IPv6 support, IPv6/IPv4 tunneling support in kernel, | shin | 1999-12-07 | 1 | -1/+5
  packet divert at kernel for IPv6/IPv4 translator daemon.
  This includes queue related patch submitted by jburkhol@home.com.
  Submitted by: queue related patch from jburkhol@home.com
  Reviewed by: freebsd-arch, cvs-committers
  Obtained from: KAME project
* Fix a warning and a potential panic if TCPDEBUG is active. (tp is | peter | 1999-11-18 | 1 | -0/+2
  a wild pointer and used by TCPDEBUG2())
* Restructure TCP timeout handling: | jlemon | 1999-08-30 | 1 | -3/+4
  - eliminate the fast/slow timeout lists for TCP and instead use a callout entry for each timer.
  - increase the TCP timer granularity to HZ
  - implement "bad retransmit" recovery, as presented in "On Estimating End-to-End Network Path Properties", by Allman and Paxson.
  Submitted by: jlemon, wollmann
* $Id$ -> $FreeBSD$ | peter | 1999-08-28 | 1 | -1/+1
* Plug a mbuf leak in tcp_usr_send(). pru_send() routines are expected | peter | 1999-06-04 | 1 | -7/+28
  to either enqueue or free their mbuf chains, but tcp_usr_send() was dropping them on the floor if the tcpcb/inpcb has been torn down in the middle of a send/write attempt. This has been responsible for a wide variety of mbuf leak patterns, ranging from slow gradual leakage to rather rapid exhaustion. This has been a problem since before 2.2 was branched and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.
  Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking (extensively) into this on a live production 2.2.x system and that it was the actual cause of the leak and looks like it fixes it. The machine in question was losing (from memory) about 150 mbufs per hour under load and a change similar to this stopped it. (Don't blame Jayanth for this patch though)
  An alternative approach to this would be to recheck SS_CANTSENDMORE etc inside the splnet() right before calling pru_send() after all the potential sleeps, interrupts and delays have happened. However, this would mean exposing knowledge of the tcp stack's reset handling and removal of the pcb to the generic code. There are other things that call pru_send() directly though.
  Problem originally noted by: John Plevyak <jplevyak@inktomi.com>
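  The ownership rule described above (a send routine must either enqueue or free the chain it is handed, on every path) can be sketched in user space. This is a toy model under stated assumptions, not the kernel code: `struct buf`, `struct conn`, `toy_send`, and the live-buffer counter are hypothetical stand-ins for mbufs, the pcb, and tcp_usr_send(), and the ECONNRESET return value is only illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy stand-ins; these are NOT the kernel's struct mbuf or tcpcb. */
struct buf  { struct buf *next; };
struct conn { int torn_down; struct buf *sendq; };

static int bufs_live;  /* buffers allocated but not yet freed */

int buf_live_count(void) { return bufs_live; }

struct buf *buf_alloc(void)
{
    struct buf *b = calloc(1, sizeof *b);
    if (b != NULL)
        bufs_live++;
    return b;
}

void buf_freechain(struct buf *b)
{
    while (b != NULL) {
        struct buf *n = b->next;
        free(b);
        bufs_live--;
        b = n;
    }
}

/* Contract: toy_send() takes ownership of the chain on every path.
 * It either appends the chain to the send queue or frees it. */
int toy_send(struct conn *c, struct buf *chain)
{
    if (c == NULL || c->torn_down) {
        buf_freechain(chain);  /* the early-exit path must not leak */
        return ECONNRESET;
    }
    struct buf **tail = &c->sendq;
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = chain;
    return 0;
}
```

  A caller can then treat the chain as consumed after the call regardless of the return value, which is the property the generic socket code relies on.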
* Add sysctl descriptions to many SYSCTL_XXXs | billf | 1999-05-03 | 1 | -5/+5
  PR: kern/11197
  Submitted by: Adrian Chadd <adrian@FreeBSD.org>
  Reviewed by: billf(spelling/style/minor nits)
  Looked at by: bde(style)
* This Implements the mumbled about "Jail" feature. | phk | 1999-04-28 | 1 | -1/+3
  This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do.
  For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers".
  Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname.
  Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors.
  It generally does what one would expect, but setting up a jail still takes a little knowledge.
  A few notes:
    I have no scripts for setting up a jail, don't ask me for them.
    The IP number should be an alias on one of the interfaces.
    mount a /proc in each jail, it will make ps more useable.
    /proc/<pid>/status tells the hostname of the prison for jailed processes.
    Quotas are only sensible if you have a mountpoint per prison.
    There are no provisions for stopping resource-hogging.
    Some "#ifdef INET" and similar may be missing (send patches!)
  If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome!
  Tools, comments, patches & documentation most welcome.
  Have fun...
  Sponsored by: http://www.rndassociates.com/
  Run for almost a year by: http://www.servetheweb.com/
* so_linger is in seconds, not in 1/HZ | ache | 1999-04-24 | 1 | -2/+2
  PR: 11252
  Submitted by: Martin Kammerhofer <dada@sbox.tu-graz.ac.at>
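  The unit distinction in the subject line can be made concrete with a small conversion helper. This is only a sketch of the arithmetic, assuming a sleep primitive that takes clock ticks and a tick rate of `hz` per second; `linger_to_ticks` is a hypothetical name, not a kernel function. Storing seconds and converting once at the point of use avoids scaling by `hz` twice.

```c
#include <assert.h>
#include <limits.h>

/* Convert a linger time in seconds to clock ticks (hz ticks per second).
 * Saturates at INT_MAX instead of overflowing. */
int linger_to_ticks(int linger_sec, int hz)
{
    assert(linger_sec >= 0 && hz > 0);
    if (linger_sec > INT_MAX / hz)
        return INT_MAX;
    return linger_sec * hz;
}
```

  With hz = 100, a 120 second linger is 12000 ticks; treating an already scaled value as seconds would stretch the wait by another factor of 100.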
* Add a flag, passed to pru_send routines, PRUS_MORETOCOME. This | fenner | 1999-01-20 | 1 | -2/+7
  flag means that there is more data to be put into the socket buffer. Use it in TCP to reduce the interaction between mbuf sizes and the Nagle algorithm.
  Based on: "Justin C. Walker" <justin@apple.com>'s description of Apple's fix for this problem.
* The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static | archie | 1998-12-07 | 1 | -2/+1
  and local variables, goto labels, and functions declared but not defined.
* Yow! Completely change the way socket options are handled, eliminating | wollman | 1998-08-23 | 1 | -57/+60
  another specialized mbuf type in the process. Also clean up some of the cruft surrounding IPFW, multicast routing, RSVP, and other ill-explored corners.
* Improved connection establishment performance by doing local port lookups via | dg | 1998-01-27 | 1 | -3/+3
  a hashed port list. In the new scheme, in_pcblookup() goes away and is replaced by a new routine, in_pcblookup_local() for doing the local port check. Note that this implementation is space inefficient in that the PCB struct is now too large to fit into 128 bytes. I might deal with this in the future by using the new zone allocator, but I wanted these changes to be extensively tested in their current form first.
  Also:
  1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
  2) Got rid of some unneeded rehashing. Adding a new routine, in_pcbinshash() to do the initial hash insertion.
  3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
  4) Added a new routine, in_pcbremlists() to remove the PCB from the various hash lists.
  5) Added/deleted comments where appropriate.
  6) Removed unnecessary splnet() locking. In general, the PCB functions should be called at splnet()...there are unfortunately a few exceptions, however.
  7) Reorganized a few structs for better cache line behavior.
  8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in the future, however.
  These changes have been tested on wcarchive for more than a month. In tests done here, connection establishment overhead is reduced by more than 50 times, thus getting rid of one of the major networking scalability problems.
  Still to do: make tcp_fastimo/tcp_slowtimo scale well for systems with a large number of connections. tcp_fastimo is easy; tcp_slowtimo is difficult.
  WARNING: Anything that knows about inpcb and tcpcb structs will have to be recompiled; at the very least, this includes netstat(1).
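  The idea of a hashed port list can be sketched as a chained hash table keyed by local port, so that a local port check walks one short bucket instead of every PCB. The code below is a minimal illustration and not the in_pcb.c implementation: `struct pcb`, `struct port_table`, `port_insert`, `port_lookup_local`, and the table size are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PORT_HASH_SIZE 128  /* illustrative size; must be a power of two */

/* Toy PCB holding only what the lookup needs. */
struct pcb {
    uint16_t    lport;      /* local port */
    struct pcb *hash_next;  /* next PCB in the same bucket */
};

struct port_table {
    struct pcb *bucket[PORT_HASH_SIZE];
};

static unsigned port_hash(uint16_t lport)
{
    return lport & (PORT_HASH_SIZE - 1);
}

/* Insert at the head of the bucket for the PCB's local port. */
void port_insert(struct port_table *t, struct pcb *p)
{
    assert(t != NULL && p != NULL);
    unsigned h = port_hash(p->lport);
    p->hash_next = t->bucket[h];
    t->bucket[h] = p;
}

/* Return a PCB bound to lport, or NULL if the port is free.
 * Only the PCBs that hash to the same bucket are examined. */
struct pcb *port_lookup_local(const struct port_table *t, uint16_t lport)
{
    for (struct pcb *p = t->bucket[port_hash(lport)]; p != NULL;
         p = p->hash_next) {
        if (p->lport == lport)
            return p;
    }
    return NULL;
}
```

  Ports that collide in one bucket are still told apart by the explicit `lport` comparison, so the hash only narrows the search and never changes the answer.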
* Fixed a missing splx(s) bug in tcp_usr_send(). | dg | 1997-12-18 | 1 | -2/+3
* Make TCPDEBUG a new-style option. | joerg | 1997-09-16 | 1 | -1/+3
* Update network code to use poll support. | peter | 1997-09-14 | 1 | -2/+2
* Fix all areas of the system (or at least all those in LINT) to avoid storing | wollman | 1997-08-16 | 1 | -17/+14
  socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
* Removed unused #includes. | bde | 1997-08-02 | 1 | -6/+1
* The long-awaited mega-massive-network-code-cleanup. Part I. | wollman | 1997-04-27 | 1 | -21/+25
  This commit includes the following changes:
  1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in.
  2) Certain protocol entry points are modified to take a process structure, so that they can easily tell whether or not it is possible to sleep, and also to access credentials.
  3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed.
  4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family.
  5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem.
  As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.
* Fix potential crash where a user attempts to perform an implied | wollman | 1997-02-21 | 1 | -1/+14
  connect in TCP while sending urgent data. It is not clear what purpose is served by doing this, but there's no good reason why it shouldn't work.
  Submitted by: tjevans@raleigh.ibm.com via wpaul
* Convert raw IP from mondo-switch-statement-from-Hell to | wollman | 1997-02-18 | 1 | -55/+6
  pr_usrreqs. Collapse duplicates with udp_usrreq.c and tcp_usrreq.c (calling the generic routines in uipc_socket2.c and in_pcb.c). Calling sockaddr() or peeraddr() on a detached socket now traps, rather than harmlessly returning an error; this should never happen. Allow the raw IP buffer sizes to be controlled via sysctl.
* Fix the mechanism for choosing whether to save the slow-start threshold | wollman | 1997-02-14 | 1 | -317/+0
  in the route. This allows us to remove the unconditional setting of the pipesize in the route, which should mean that SO_SNDBUF and SO_RCVBUF should actually work again. While we're at it:
  - Convert udp_usrreq from `mondo switch statement from Hell' to new-style.
  - Delete old TCP mondo switch statement from Hell, which had previously been diked out.
* Make the long-awaited change from $Id$ to $FreeBSD$ | jkh | 1997-01-14 | 1 | -1/+1
  This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
  Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
* Improved in_pcblookuphash() to support wildcarding, and changed relevant | dg | 1996-10-07 | 1 | -2/+2
  callers of it to take advantage of this. This reduces new connection request overhead in the face of a large number of PCBs in the system. Thanks to David Filo <filo@yahoo.com> for suggesting this and providing a sample implementation (which wasn't used, but showed that it could be done).
  Reviewed by: wollman
* Make the misnamed tcp initial keepalive timer value (which is really the | pst | 1996-09-13 | 1 | -2/+2
  time, in seconds, that state for non-established TCP sessions stays about) a sysctl modifiable variable.
  [part 1 of two commits, I just realized I can't play with the indices as I was typing this commit message.]
* Fixed two bugs in previous commit: be sure to include tcp_debug.h when | dg | 1996-07-12 | 1 | -2/+2
  TCPDEBUG is defined, and fix typo in TCPDEBUG2() macro.
* Modify the kernel to use the new pr_usrreqs interface rather than the old | wollman | 1996-07-11 | 1 | -1/+428
  pr_usrreq mechanism which was poorly designed and error-prone. This commit renames pr_usrreq to pr_ousrreq so that old code which depended on it would break in an obvious manner. This commit also implements the new interface for TCP, although the old function is left as an example (#ifdef'ed out).
  This commit ALSO fixes a longstanding bug in the TCP timer processing (introduced by davidg on 1995/04/12) which caused timer processing on a TCB to always stop after a single timer had expired (because it misinterpreted the return value from tcp_usrreq() to indicate that the TCB had been deleted).
  Finally, some code related to polling has been deleted from if.c because it is not relevant to -current and doesn't look at all like my current code.
* Move or add #include <queue.h> in preparation for upcoming struct socket | dg | 1996-03-11 | 1 | -1/+2
  changes.
* Removed unnecessary #includes of vm stuff. Most of them were once | bde | 1995-12-06 | 1 | -2/+1
  prerequisites for <sys/sysctl.h>.
  subr_prof.c: Also replaced #include of <sys/user.h> by #include of <sys/resourcevar.h>.
* New style sysctl & staticize a lot of stuff. | phk | 1995-11-14 | 1 | -53/+11
* Start adding new style sysctl here too. | phk | 1995-11-09 | 1 | -2/+6
* Fix a logical error in T/TCP: when we actively open a connection, we | olah | 1995-11-03 | 1 | -1/+20
  have to decide whether to send a CC or CCnew option in our SYN segment depending on the contents of our TAO cache. This decision has to be made once when the connection starts. The earlier code delayed this decision until the segment was assembled in tcp_output() and retransmitted SYN segments could have different CC options.
  Reviewed by: Richard Stevens, davidg, wollman
* Start the 2MSL timer when the socket is closed and the TCP connection is | olah | 1995-10-29 | 1 | -2/+6
  in the FIN_WAIT_2 state in order to prevent the connection from hanging there forever.
  Reviewed by: davidg, olah
  Submitted by: Arne Henrik Juul <arnej@imf.unit.no>
  Obtained from: bugs@netbsd.org
* Don't leak mbufs in an unusual error case in tcp_usrreq(). | wollman | 1995-09-13 | 1 | -2/+14
  Reviewed by: Andras Olah <olah@freebsd.org>
  Obtained from: Lite-2