path: root/sys/netinet/tcp_syncache.c
Commit message | Author | Age | Files | Lines
...
* Normalize TCP syncache-related MAC Framework entry points to match most | rwatson | 2007-10-25 | 1 | -6/+6
| | | | | | | other entry points in the form mac_<object>_method(). Discussed with: csjp Obtained from: TrustedBSD Project
* Merge first in a series of TrustedBSD MAC Framework KPI changes | rwatson | 2007-10-24 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
* Pick the smallest possible TCP window scaling factor that will still allow | silby | 2007-10-19 | 1 | -6/+17
| | | | | | | | | | | | | | | | us to scale up to sb_max, aka kern.ipc.maxsockbuf. We do this because there are broken firewalls that will corrupt the window scale option, leading to the other endpoint believing that our advertised window is unscaled. At scale factors larger than 5 the unscaled window will drop below 1500 bytes, leading to serious problems when traversing these broken firewalls. With the default maxsockbuf of 256K, a scale factor of 3 will be chosen by this algorithm. Those who choose a larger maxsockbuf should watch out for the compatibility problems mentioned above. Reviewed by: andre
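The selection rule in this commit message can be sketched as a short loop. This is a minimal sketch, not the committed code: `pick_wscale` is a hypothetical name, and `TCP_MAXWIN` (65535) and `TCP_MAX_WINSHIFT` (14) are assumed to be the usual RFC 1323 limits.

```c
#include <assert.h>
#include <stdint.h>

#define TCP_MAXWIN       65535u  /* largest window a 16-bit field can carry */
#define TCP_MAX_WINSHIFT 14      /* largest shift RFC 1323 permits */

/*
 * Hypothetical helper: return the smallest window scale shift whose
 * scaled maximum window still covers sb_max.
 */
static int
pick_wscale(uint64_t sb_max)
{
	int wscale = 0;

	assert(sb_max > 0);
	while (wscale < TCP_MAX_WINSHIFT &&
	    ((uint64_t)TCP_MAXWIN << wscale) < sb_max)
		wscale++;
	return (wscale);
}
```

With sb_max = 262144 (256K), 65535 << 2 = 262140 falls just short, so the loop settles on 3, which agrees with the figure quoted in the commit message.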
* Add FBSDID to all files in netinet so that people can more | silby | 2007-10-07 | 1 | -2/+3
| | | | | | easily include file version information in bug reports. Approved by: re (kensmith)
* Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which | rwatson | 2007-08-06 | 1 | -1/+0
| | | | | | | | | | | | | | | previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminating quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)
* Fix a typo in a log message: s/Reveived/Received/. | bmah | 2007-07-29 | 1 | -1/+1
| | | | Approved by: re (rwatson)
* Fix a panic introduced in rev 1.126. | silby | 2007-07-28 | 1 | -1/+1
| | | | Approved by: re (rwatson)
* o Move setting/resetting logic of syncache timer from macro | andre | 2007-07-28 | 1 | -19/+49
| | | | | | | | | | | | | | | | SYNCACHE_TIMEOUT to new function syncache_timeout(). o Fix inverted timeout callout engagement logic to actually enable the timer for the bucket row. Before SYN|ACK was not retransmitted. o Simplify SYN|ACK retransmit timeout backoff calculation. o Improve logging of retransmit and timeout events. o Reset timeout when duplicate SYN arrives. o Add comments. o Rearrange SYN cookie statistics counting. Bug found by: silby Submitted by: silby (different version) Approved by: re (rwatson)
* o Move all detailed checks for RST in LISTEN state from tcp_input() to | andre | 2007-07-28 | 1 | -1/+41
| | | | | | | | | syncache_rst(). o Fix tests for flag combinations of RST and SYN, ACK, FIN. Before a RST for a connection in syncache did not properly free the entry. o Add more detailed logging. Approved by: re (rwatson)
* Export the contents of the syncache to netstat. | silby | 2007-07-27 | 1 | -0/+74
| | | | | Approved by: re (kensmith) MFC after: 2 weeks
* Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC | gnn | 2007-07-03 | 1 | -3/+3
| | | | | | | | option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing
* Commit IPv6 support for FAST_IPSEC to the tree. | gnn | 2007-07-01 | 1 | -12/+0
| | | | | | | | | This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing
* Correctly print SEQ and IRS in the corresponding log message in | andre | 2007-06-06 | 1 | -1/+1
| | | | syncache_expand().
* Make log messages more verbose and simpler to understand for non-experts. | andre | 2007-05-28 | 1 | -12/+14
| | | | Update comments to be more conscious, verbose and fully reflect reality.
* Refactor and rewrite in parts the SYN handling code on listen sockets | andre | 2007-05-28 | 1 | -0/+4
| | | | | | | | | | | | | | | | | | | | | in tcp_input(): o tighten the checks on allowed TCP flags to be RFC793 and tcp-secure conformant o log check failures to syslog at LOG_DEBUG level o rearrange the code flow to be easier to follow o add KASSERTs to validate assumptions of the code flow Add sysctl net.inet.tcp.syncache.rst_on_sock_fail defaulting to enabled that controls the behavior on socket creation failure for an otherwise successful 3-way handshake. The socket creation can fail due to global memory shortage, listen queue limits and file descriptor limits. The sysctl allows choosing between two options to deal with this. One is to send a reset to the other endpoint to notify it about the failure (default). The other one is to ignore and treat the failure as a transient error and have the other endpoint retransmit for another try. Reviewed by: rwatson (in general)
* Be more restrictive with segment validity checks in syncache_expand() | andre | 2007-05-18 | 1 | -3/+42
| | | | | | and log check failures to syslog at LOG_DEBUG level. Always prefill the sc->sc_ts field to use it in the checks.
* o Add syslog logging under LOG_DEBUG to various failures caused by | andre | 2007-05-18 | 1 | -5/+38
| | | | | | bogus segments o Add more KASSERT()s o Update comments
* Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of | andre | 2007-05-06 | 1 | -3/+1
| | | | a dedicated sack_enable int for this bool. Change all users accordingly.
* o Remove unused and redundant TCP option definitions | andre | 2007-04-20 | 1 | -1/+1
| | | | | o Replace usage of MAX_TCPOPTLEN with the correctly constructed and derived MAX_TCPOPTLEN
* Remove bogus check for accept queue length and associated failure handling | andre | 2007-04-20 | 1 | -2/+2
| | | | | | | | | | | | | | from the incoming SYN handling section of tcp_input(). Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then. Change return value of syncache_add() to void. No status communication is required.
* Simplify syncache_expand() and clarify its semantics. Zero is returned | andre | 2007-04-20 | 1 | -17/+4
| | | | | | | | | | | | | | | when the ACK is invalid and doesn't belong to any registered connection, either in syncache or through SYN cookies. True but a NULL struct socket is returned when the 3WHS completed but the socket could not be created due to insufficient resources or limits reached. For both cases an RST is sent back in tcp_input(). A logic error leading to a panic is fixed where syncache_expand() would free the mbuf on socket allocation failure but tcp_input() later supplies it to tcp_dropwithreset() to issue a RST to the peer. Reported by: kris (the panic)
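The return-value contract described above can be sketched as a small classifier. This is a minimal sketch under stated assumptions: `classify_expand`, `should_send_rst`, and the `expand_outcome` enum are hypothetical names invented for illustration, and `struct socket` is only an opaque stand-in for the kernel type.

```c
#include <stddef.h>

struct socket;  /* opaque stand-in for the kernel's struct socket */

enum expand_outcome {
	EXPAND_INVALID_ACK,  /* zero returned: no syncache entry or cookie matched */
	EXPAND_NO_SOCKET,    /* true returned, socket NULL: resources or limits */
	EXPAND_CONNECTED     /* true returned, socket valid: handshake complete */
};

/* Hypothetical helper: map (return value, socket pointer) to an outcome. */
static enum expand_outcome
classify_expand(int rv, const struct socket *so)
{
	if (rv == 0)
		return (EXPAND_INVALID_ACK);
	return (so == NULL ? EXPAND_NO_SOCKET : EXPAND_CONNECTED);
}

/* Per this commit message, both failure outcomes are answered with an RST. */
static int
should_send_rst(enum expand_outcome outcome)
{
	return (outcome != EXPAND_CONNECTED);
}
```

Keeping the two failure cases distinct in the return value lets the caller log or count them separately even though, at this revision, both lead to the same reset.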
* Only update TCP timestamp on SYN duplication if it is present on | andre | 2007-04-20 | 1 | -1/+3
| | | | current SYN in syncache_add(). Otherwise disable timestamps.
* o Plug memory leak in syncache_add() on MAC label allocation failure. | andre | 2007-04-20 | 1 | -18/+12
| | | | | | o Simplify code flow with 'done' goto label. o Remove mbuf argument from syncache_respond(). It doesn't make use of it.
* When we run into the syncache entry limits syncache_add() tries | andre | 2007-04-17 | 1 | -2/+2
| | | | | | | | | to free the oldest entry in the current bucket row. The global entry limit may be smaller than the bucket rows and their limit combined however. Thus only try to free a syncache entry if we found one in this bucket row. Reported by: kris
* Change the TCP timer system from using the callout system five times | andre | 2007-04-11 | 1 | -1/+1
| | | | | | | | | | | | | | | | directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout. The single new callout is a mutex callout on inpcb simplifying the locking a bit. tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions. Reviewed by: rwatson (earlier version)
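The "only the next to fire is registered" idea can be illustrated with a selection over five per-connection deadlines. This is a minimal sketch, not the kernel's tcp_timer_activate(): `next_timer`, `NTIMERS`, and the two arrays are hypothetical, and tick wraparound is deliberately ignored.

```c
#include <stdint.h>

#define NTIMERS 5  /* the commit message mentions five separate timers */

/*
 * Hypothetical helper: return the index of the earliest armed timer,
 * or -1 if none is armed. Only that one would be handed to the callout.
 */
static int
next_timer(const uint32_t deadline[NTIMERS], const int armed[NTIMERS])
{
	int best = -1;

	for (int i = 0; i < NTIMERS; i++) {
		if (!armed[i])
			continue;
		if (best < 0 || deadline[i] < deadline[best])
			best = i;
	}
	return (best);
}
```

When the single callout fires, a dispatcher would run the selected timer's handler and then call the selection again to re-arm for the next deadline.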
* Move last tcpcb initialization for the inbound connection case from | andre | 2007-04-04 | 1 | -0/+3
| | | | | | | | tcp_input() to syncache_socket() where it belongs and the majority of it already happens. The "tp->snd_up = tp->snd_una" is removed as it is done with the tcp_sendseqinit() macro a few lines earlier.
* Unbreak IPv6 after consolidation of TCP options insertion. | andre | 2007-03-17 | 1 | -3/+2
| | | | Submitted by: tegge
* Fix the most obvious of the bugs introduced by recent syncache changes | kmacy | 2007-03-17 | 1 | -0/+3
| | | | | | | | - *ip is not initialized in the case of inet6 connection, but ip->ip_len is being changed anyway Now the question is, why does it think an ipv4 connection is an ipv6 connection? xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.
* Consolidate insertion of TCP options into a segment from within tcp_output() | andre | 2007-03-15 | 1 | -75/+43
| | | | | | | | | | | | | | and syncache_respond() into its own generic function tcp_addoptions(). tcp_addoptions() is alignment agnostic and does optimal packing in all cases. In struct tcpopt rename to_requested_s_scale to just to_wscale. Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled." Reviewed by: silby, mohans, julian Sponsored by: TCP/IP Optimization Fundraise 2005
* Change the way the advertized TCP window scaling is computed. Instead of | andre | 2007-02-01 | 1 | -2/+8
| | | | | | | | | | | | | | | upper-bounding it to the size of the initial socket buffer lower-bound it to the smallest MSS we accept. Ideally we'd use the actual MSS information here but it is not available yet. For socket buffer auto sizing to be effective we need room to grow the receive window. The window scale shift is determined at connection setup and can't be changed afterwards. The previous, original, method effectively just did a power of two roundup of the socket buffer size at connection setup severely limiting the headroom for larger socket buffers. Tested by: many (as part of the socket buffer auto sizing patch) MFC after: 1 month
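One way to read "lower-bound it to the smallest MSS we accept" is to grow the shift until one window unit (1 << wscale bytes) is at least the minimum MSS. The sketch below follows that reading only; it is an assumption rather than the committed code, and `wscale_for_minmss` is a hypothetical name.

```c
#include <assert.h>

#define TCP_MAX_WINSHIFT 14  /* largest shift RFC 1323 permits */

/*
 * Hypothetical helper: smallest shift for which a single window unit
 * (1 << wscale bytes) is no smaller than tcp_minmss.
 */
static int
wscale_for_minmss(unsigned int tcp_minmss)
{
	int wscale = 0;

	assert(tcp_minmss > 0);
	while (wscale < TCP_MAX_WINSHIFT && (1u << wscale) < tcp_minmss)
		wscale++;
	return (wscale);
}
```

Because the shift is fixed at connection setup, a larger value chosen this way leaves headroom for the receive buffer to grow later, which is the motivation the commit message gives.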
* Fix LOR between the syncache and inpcb locks when MAC is present in the | csjp | 2006-12-13 | 1 | -43/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel. This LOR snuck in with some of the recent syncache changes. To fix this, the inpcb handling was changed: - Hang a MAC label off the syncache object - When the syncache entry is initially created, we pickup the PCB lock is held because we extract information from it while initializing the syncache entry. While we do this, copy the MAC label associated with the PCB and use it for the syncache entry. - When the packet is transmitted, copy the label from the syncache entry to the mbuf so it can be processed by security policies which analyze mbuf labels. This change required that the MAC framework be extended to support the label copy operations from the PCB to the syncache entry, and then from the syncache entry to the mbuf. These functions really should be referencing the syncache structure instead of the label. However, due to some of the complexities associated with exposing this syncache structure we operate directly on its label pointer. This should be OK since we aren't making any access control decisions within this code directly, we are merely allocating and copying label storage so we can properly initialize mbuf labels for any packets the syncache code might create. This also has a nice side effect of caching. Prior to this change, the PCB would be looked up/locked for each packet transmitted. Now the label is cached at the time the syncache entry is initialized. Submitted by: andre [1] Discussed with: rwatson [1] andre submitted the tcp_syncache.c changes
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h | rwatson | 2006-10-22 | 1 | -1/+2
| | | | | | | | | | | | | begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
* Add missing #ifdef INET6 (can't be compiled) | ache | 2006-09-14 | 1 | -0/+2
|
* Remove unnecessary includes and follow common ordering style. | andre | 2006-09-13 | 1 | -10/+2
|
* Rewrite of TCP syncookies to remove locking requirements and to enhance | andre | 2006-09-13 | 1 | -191/+277
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | functionality: - Remove a rwlock acquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead. - Syncookie secrets are different for and stored per syncache bucket row. Secrets expire after 16 seconds and are reseeded on-demand. - The computational overhead for syncookie generation and verification is one MD5 hash computation as before. - Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1. This implementation extends the original idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled. The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK. A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c. Reviewed by: silby (slightly earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005
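The stateless idea described above (encode the setup state in the SYN-ACK, then re-derive it from the returning ACK) can be sketched for the ISN-only case. This is a toy under loud assumptions: FNV-1a stands in for the MD5 hash the commit mentions, the 3-bit MSS index layout is invented for illustration, and `struct flow`, `cookie_make`, and `cookie_check` are hypothetical names. Per-bucket secrets, their 16-second expiry, and the timestamp-field extension are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct flow {
	uint32_t laddr, faddr;  /* local and foreign IPv4 addresses */
	uint16_t lport, fport;  /* local and foreign ports */
};

#define MSS_BITS 3u                      /* invented layout: low 3 bits */
#define MSS_MASK ((1u << MSS_BITS) - 1)

/* FNV-1a step; a stand-in for MD5, NOT cryptographically secure. */
static uint32_t
fnv1a(uint32_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len-- > 0) {
		h ^= *p++;
		h *= 16777619u;
	}
	return (h);
}

/* Hash the secret, the connection 4-tuple and the peer's initial sequence. */
static uint32_t
cookie_hash(uint32_t secret, const struct flow *f, uint32_t irs)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, &secret, sizeof(secret));
	h = fnv1a(h, &f->laddr, sizeof(f->laddr));
	h = fnv1a(h, &f->faddr, sizeof(f->faddr));
	h = fnv1a(h, &f->lport, sizeof(f->lport));
	h = fnv1a(h, &f->fport, sizeof(f->fport));
	h = fnv1a(h, &irs, sizeof(irs));
	return (h);
}

/* Build the value to send as our ISN: hash bits on top, MSS index below. */
static uint32_t
cookie_make(uint32_t secret, const struct flow *f, uint32_t irs,
    unsigned int mss_idx)
{
	assert(mss_idx <= MSS_MASK);
	return ((cookie_hash(secret, f, irs) & ~MSS_MASK) | mss_idx);
}

/* Return the encoded MSS index, or -1 if the hash bits do not verify. */
static int
cookie_check(uint32_t secret, const struct flow *f, uint32_t irs,
    uint32_t cookie)
{
	if (((cookie ^ cookie_hash(secret, f, irs)) & ~MSS_MASK) != 0)
		return (-1);
	return ((int)(cookie & MSS_MASK));
}
```

On the returning ACK, a receiver would recover the cookie from the acknowledgment number minus one and the peer's initial sequence from the sequence number minus one, then call `cookie_check`; nothing needs to be stored between the SYN and the ACK.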
* In syncache_respond() do not reply with a MSS that is larger than what | andre | 2006-06-26 | 1 | -0/+2
| | | | | | the peer announced to us but make it at least tcp_minmss in size. Sponsored by: TCP/IP Optimization Fundraise 2005
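The clamp described in this commit amounts to a min followed by a max. A minimal sketch, with `reply_mss` and its parameter names being hypothetical rather than taken from the source:

```c
#include <assert.h>

/*
 * Hypothetical helper: never advertise more than the peer announced,
 * but never go below the configured minimum MSS.
 */
static unsigned int
reply_mss(unsigned int our_mss, unsigned int peer_mss, unsigned int tcp_minmss)
{
	unsigned int mss;

	assert(tcp_minmss > 0);
	mss = (our_mss < peer_mss) ? our_mss : peer_mss;
	return ((mss < tcp_minmss) ? tcp_minmss : mss);
}
```

The floor is applied last, so a peer announcing an implausibly small MSS still gets a reply of at least tcp_minmss.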
* Some cleanups and janitorial work to tcp_syncache: | andre | 2006-06-26 | 1 | -45/+33
| | | | | | | | | | | | | | | | | o don't assign remote/local host/port information manually between provided struct in_conninfo and struct syncache, bcopy() it instead o rename sc_tsrecent to sc_tsreflect in struct syncache to better capture the purpose of this field o rename sc_request_r_scale to sc_requested_r_scale for ditto reasons o fix IPSEC error case printf's to report correct function name o in syncache_socket() only transpose enhanced tcp options parameters to struct tcpcb when the inpcb doesn't have TF_NOOPT set o in syncache_respond() reorder stack variables o in syncache_respond() remove bogus KASSERT() No functional changes. Sponsored by: TCP/IP Optimization Fundraise 2005
* Reverse the source/destination parameters to in[6]_pcblookup_hash() in | andre | 2006-06-26 | 1 | -2/+2
| | | | | | syncache_respond() for the #ifdef MAC case. Submitted by: Tai-hwa Liang <avatar-at-mmlab.cse.yzu.edu.tw>
* Decrement the global syncache counter in syncache_expand() when the entry | andre | 2006-06-25 | 1 | -0/+1
| | | | is removed from the bucket. This fixes the syncache statistics.
* Move the syncookie MD5 context from globals to the stack to make it MP safe. | andre | 2006-06-22 | 1 | -2/+2
|
* Allocate a zero'ed syncache hashtable. mtx_init() tests the supplied | andre | 2006-06-20 | 1 | -1/+1
| | | | | | | | memory location for already existing/initialized mutexes. With random data in the memory location this fails (ie. after a soft reboot). Reported by: brueffer, YAMAMOTO Shigeru Submitted by: YAMAMOTO Shigeru <shigeru-at-iij.ad.jp>
* Do not access syncache entry before it was allocated for the TF_NOOPT case | andre | 2006-06-18 | 1 | -3/+4
| | | | | | | in syncache_add(). Found by: Coverity Prevent CID: 1473
* Move all syncache related structures to tcp_syncache.c. They are only used | andre | 2006-06-18 | 1 | -0/+35
| | | | | | | | there. This unbreaks userland programs that include tcp_var.h. Discussed with: rwatson
* Remove double lock acquisition in syncookie_lookup() which came from last | andre | 2006-06-18 | 1 | -1/+0
| | | | | | minute conversions to macros. Pointy hat to: andre
* Fix the !INET6 compile. | andre | 2006-06-17 | 1 | -2/+4
| | | | Reported by: alc
* ANSIfy and tidy up comments. | andre | 2006-06-17 | 1 | -52/+23
| | | | Sponsored by: TCP/IP Optimization Fundraise 2005
* Add locking to TCP syncache and drop the global tcpinfo lock as early | andre | 2006-06-17 | 1 | -254/+285
| | | | | | | | | | | | | | | | | | as possible for the syncache_add() case. The syncache timer no longer acquires the tcpinfo lock and timeout/retransmit runs can happen in parallel with bucket granularity. On a P4 the additional locks cause a slight degradation of 0.7% in tcp connections per second. When IP and TCP input are deserialized and can run in parallel this little overhead can be neglected. The syncookie handling still leaves room for improvement and its random salts may be moved to the syncache bucket head structures to remove the second lock operation currently required for it. However this would be a more involved change from the way syncookies work at the moment. Reviewed by: rwatson Tested by: rwatson, ps (earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005
* Change soabort() from returning int to returning void, since all | rwatson | 2006-03-16 | 1 | -1/+1
| | | | | | consumers ignore the return value, soabort() is required to succeed, and protocols produce errors here to report multiple freeing of the pcb, which we hope to eliminate.
* Rework TCP window scaling (RFC1323) to properly scale the send window | andre | 2006-02-28 | 1 | -1/+1
| | | | | | | | | | | | | right from the beginning and partly clean up the differences in handling between SYN_SENT and SYN_RCVD (syncache). Further changes to this code to come. This is a first incremental step to a general overhaul and streamlining of the TCP code. PR: kern/15095 PR: kern/92690 (partly) Reviewed by: qingli (and tested with ANVL) Sponsored by: TCP/IP Optimization Fundraise 2005
* Set the M_ZERO flag when calling uma_zalloc() to allocate a syncache entry. | qingli | 2006-02-09 | 1 | -5/+4
| | | | | Reviewed by: andre, glebius MFC after: 3 days