path: root/sys/netinet
Commit message (author, age, files changed, lines removed/added)
* The proxy-arp code was broken and responded to ARP requests for addresses that are not proxied locally.
  (qingli, 2008-12-19, 1 file, -57/+52)
* Another step in assimilating the IPv[46] PCB code: normalize usage of the IN6P_* compat flags to their equivalent INP_* counterparts.
  Discussed with: rwatson
  Reviewed by: rwatson
  MFC after: 4 weeks
  (bz, 2008-12-17, 1 file, -1/+1)
* Use inc_flags instead of the inc_isipv6 alias, which so far had been the only flag with random usage patterns. Switch inc_flags to be used as a real bit field by using INC_ISIPV6 with bitops to check for the 'isipv6' condition. While here, fix a place or two where, in the v4 case, inc_flags was not properly initialized before. [1]
  Found by: rwatson during review [1]
  Discussed with: rwatson
  Reviewed by: rwatson
  MFC after: 4 weeks
  (bz, 2008-12-17, 8 files, -33/+35)
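A minimal C sketch of the bit-field check described in this commit, assuming a pared-down struct that carries only an inc_flags member. The struct layout, the helper names and the INC_ISIPV6 value are illustrative, not copied from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag value; the kernel header defines the real one. */
#define INC_ISIPV6 0x01

/* Hypothetical, heavily simplified stand-in for struct in_conninfo. */
struct in_conninfo {
    uint8_t inc_flags;
};

/* Always start from a known state so no stale bits leak in
   (the v4 initialization problem the commit mentions). */
static void inc_init(struct in_conninfo *inc, bool isipv6)
{
    inc->inc_flags = 0;
    if (isipv6)
        inc->inc_flags |= INC_ISIPV6;
}

/* Test the single bit instead of comparing the whole field. */
static bool inc_is_ipv6(const struct in_conninfo *inc)
{
    return (inc->inc_flags & INC_ISIPV6) != 0;
}
```

Because the check masks one bit, other flags added to inc_flags later do not disturb the IPv6 test.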
* Default to doing lla_lookup with a shared afdata lock and returning a shared lock on the lle, thus restoring parallel performance to the pre-arpv2 level.
  (kmacy, 2008-12-17, 1 file, -9/+10)
* IPFW's pfil hook/unhook code ignores the return values of pfil_add_hook() and pfil_remove_hook(), so cast them to (void).
  MFC after: pretty soon
  (rwatson, 2008-12-16, 1 file, -8/+16)
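A small illustration of the (void) cast, using a hypothetical stand-in for pfil_add_hook(); the real function takes different arguments. The cast does not change behaviour, it only records that the result is ignored on purpose.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for pfil_add_hook(): 0 on success, an errno
   value on failure. Not the kernel's signature. */
static int fake_pfil_add_hook(int *installed)
{
    if (*installed)
        return EEXIST;
    *installed = 1;
    return 0;
}

static int hook_installed;

/* The caller knowingly ignores the result; the (void) cast documents
   that decision for readers and lint-style checkers. */
static void ipfw_hook_sketch(void)
{
    (void)fake_pfil_add_hook(&hook_installed);
}
```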
* ipfw doesn't use the radix node head lock to protect the radix tree, so remove the acquisition.
  (kmacy, 2008-12-16, 1 file, -2/+0)
* Check pointer against NULL; add a newline after the declaration for style.
  (kmacy, 2008-12-16, 1 file, -2/+3)
* Don't unlock the lle if it is NULL.
  (kmacy, 2008-12-16, 1 file, -1/+2)
* Unlock and destroy an llentry's lock before freeing it.
  Found by: sam
  (kmacy, 2008-12-16, 1 file, -0/+2)
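A user-space sketch of the two llentry fixes above, assuming a pthread mutex in place of the kernel lock and a hypothetical two-field struct: the release path skips NULL entries, and the free path unlocks and destroys the lock before the memory that contains it is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct llentry. */
struct llentry {
    pthread_mutex_t lle_lock;
    int lle_data;
};

static struct llentry *lle_alloc(void)
{
    struct llentry *lle = calloc(1, sizeof(*lle));
    if (lle != NULL)
        pthread_mutex_init(&lle->lle_lock, NULL);
    return lle;
}

/* Caller holds lle_lock. Unlock and destroy the lock first, so no lock
   state outlives the object that embeds it. */
static void lle_free_locked(struct llentry *lle)
{
    pthread_mutex_unlock(&lle->lle_lock);
    pthread_mutex_destroy(&lle->lle_lock);
    free(lle);
}

/* A lookup may return NULL; only unlock an entry that exists. */
static void lle_release(struct llentry *lle)
{
    if (lle != NULL)
        pthread_mutex_unlock(&lle->lle_lock);
}
```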
* Another step in assimilating the IPv[46] PCB code: directly use the inpcb names rather than the following IPv6 compat macros: in6pcb, in6p_sp, in6p_ip6_nxt, in6p_flowinfo, in6p_vflag, in6p_flags, in6p_socket, in6p_lport, in6p_fport, in6p_ppcb and sotoin6pcb(). Apart from removing duplicate code in netipsec, this is a pure whitespace, not a functional change.
  Discussed with: rwatson
  Reviewed by: rwatson (version before review requested changes)
  MFC after: 4 weeks (set the timer and see then)
  (bz, 2008-12-15, 3 files, -8/+8)
* The main goals of this project are:
  1. separating the L2 tables (ARP, NDP) from the L3 routing tables;
  2. removing as many locking dependencies among these layers as possible, to allow for some parallelism in the search operations;
  3. simplifying the logic in the routing code.
  The most notable end result is the obsolescence of the route cloning (RTF_CLONING) concept, which translated into code reduction in both the IPv4 ARP and IPv6 NDP related modules, and a size reduction in struct rtentry{}. The change in design obsoletes the semantics of the RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. Userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries.
  Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently:
  - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle; Kip has also been conducting active functional testing
  - Sam Leffler has helped me improve and refactor the code, and provided valuable reviews
  - Julian Elischer set up the perforce tree for me and has helped me maintain that branch before the svn conversion
  (qingli, 2008-12-15, 15 files, -571/+491)
* Add a check that is currently under discussion for 8, but that we need to keep for 7-STABLE when MFCing in_pcbladdr(), so as not to change the behaviour there. With this, a destination route via a loopback interface is treated as a valid and reachable thing for IPv4 source address selection, even though nothing of that network is ever directly reachable; it is more like a blackhole route. The source address will thus be selected, and IPsec can grab the packets before we would discard them at a later point, encapsulate them, and send them out from a different tunnel endpoint IP.
  Discussed on: net
  Reported by: Frank Behrens <frank@harz.behrens.de>
  Tested by: Frank Behrens <frank@harz.behrens.de>
  MFC after: 4 weeks (just so that I get the mail)
  (bz, 2008-12-14, 1 file, -0/+4)
* De-virtualize the MD5 context for TCP initial sequence number generation and make it a function-local variable, as we do almost everywhere inside the kernel.
  Discussed with: rwatson, silby
  MFC after: 4 weeks
  (bz, 2008-12-13, 2 files, -12/+10)
* Version that will compile.
  (kmacy, 2008-12-13, 1 file, -2/+3)
* The radix node head lock needs to be held when calling rnh_addaddr.
  (kmacy, 2008-12-13, 1 file, -0/+2)
* Don't acquire the lock recursively.
  (kmacy, 2008-12-13, 1 file, -1/+1)
* Second round of putting global variables that were virtualized, but formerly missed, under VIMAGE_GLOBAL. Put the extern declarations of the virtualized globals under VIMAGE_GLOBAL, as the globals themselves already are. This will help when we remove the globals entirely.
  Sponsored by: The FreeBSD Foundation
  (bz, 2008-12-13, 6 files, -7/+23)
* Put global variables that were virtualized, but formerly missed, under VIMAGE_GLOBAL. Start putting the extern declarations of the virtualized globals under VIMAGE_GLOBAL, as the globals themselves already are. This will help when we remove the globals entirely. While there, garbage collect a few dead externs from ip6_var.h.
  Sponsored by: The FreeBSD Foundation
  (bz, 2008-12-11, 7 files, -7/+20)
* Use the correct INIT_VNET_INET(), as the virtualized variables here are in vinet.h, not in vinet6.h.
  Sponsored by: The FreeBSD Foundation
  (bz, 2008-12-11, 1 file, -1/+1)
* Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on the VIMAGE_GLOBALS compile-time option.
  Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instantiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) being effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.
  Update the VSYM() macro so that, depending on VIMAGE_GLOBALS, the V_ macros resolve either to the original globals or to fields inside container structures, i.e. effectively:
      #ifdef VIMAGE_GLOBALS
      #define V_rt_tables rt_tables
      #else
      #define V_rt_tables vnet_net_0._rt_tables
      #endif
  Update the SYSCTL_V_*() macros to operate either on globals or on fields inside container structs.
  Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(); currently this is done only in sys/net/if.c.
  Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS.
  De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after the next PF import.
  Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions.
  Discussed at: devsummit Strassburg
  Reviewed by: bz, julian
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  (zec, 2008-12-10, 18 files, -53/+116)
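The V_ indirection quoted in this commit can be tried in user space. This sketch assumes a single int in place of the real routing-table array and a one-member container; only vnet_net_0, rt_tables and V_rt_tables come from the commit, while the get/set helpers are hypothetical. Compiling with -DVIMAGE_GLOBALS selects the plain global; otherwise the container field is used.

```c
#include <assert.h>

/* Hypothetical one-member container; the real one holds many fields. */
struct vnet_net {
    int _rt_tables;
};

#ifdef VIMAGE_GLOBALS
/* Old style: a plain global variable. */
int rt_tables;
#define V_rt_tables rt_tables
#else
/* New default: the variable lives inside the "0th" container instance. */
struct vnet_net vnet_net_0;
#define V_rt_tables vnet_net_0._rt_tables
#endif

/* Code written against V_rt_tables compiles unchanged in either mode. */
static void set_rt_tables(int n) { V_rt_tables = n; }
static int get_rt_tables(void) { return V_rt_tables; }
```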
* Remove inconsistent white space from in_pcballoc().
  MFC after: pretty soon
  (rwatson, 2008-12-10, 1 file, -2/+0)
* Move syncache flag definitions below the data structure; compress some vertical whitespace.
  MFC after: pretty soon
  (rwatson, 2008-12-10, 1 file, -10/+12)
* Move flag definitions for t_flags and t_oobflags below the definition of struct tcpcb so that the structure definition is a bit more vertically compact. Can't yet fit it on one printed page, though.
  MFC after: pretty soon
  (rwatson, 2008-12-10, 1 file, -28/+36)
* Unlock when done.
  (kmacy, 2008-12-10, 1 file, -1/+1)
* Don't reference if_addr_mtx directly.
  (kmacy, 2008-12-10, 1 file, -2/+2)
* Update the comment on INP_TIMEWAIT to say what it's about, as we caution regarding the misplacement of flags in inp_vflag in an earlier comment.
  MFC after: pretty soon
  (rwatson, 2008-12-09, 1 file, -1/+1)
* Enhance one comment relating to recent TCP locking changes, and fix a typo in another.
  MFC after: 6 weeks
  (rwatson, 2008-12-09, 1 file, -6/+6)
* Move macros defining flags and shortcuts to nested structure fields in inpcbinfo below the structure definition, in order to make inpcbinfo fit on a single printed page; related style tweaks.
  MFC after: pretty soon
  (rwatson, 2008-12-09, 1 file, -26/+34)
* Move from solely write-locking the global tcbinfo in tcp_input() to read-locking in the TCP input path, allowing greater TCP input parallelism where multiple ithreads, or an ithread and the netisr, are able to run in parallel. Previously, most TCP input paths held a write lock on the global tcbinfo lock, effectively serializing TCP input.
  Before looking up the connection, acquire a write lock if a potentially state-changing flag is set in the TCP segment header (FIN, RST, SYN), and otherwise a read lock. We may later have to upgrade to a write lock in certain cases (ACKs received by the syncache or during TIMEWAIT) in order to support global state transitions, but this is never required for steady-state packets.
  Upgrading from a read lock to a write lock must be done as a trylock operation to avoid deadlocks, and actually violates the lock order, as the tcbinfo lock precedes the inpcb lock held at the time of the upgrade. If the trylock fails, we bump the refcount on the inpcb, drop both locks, and re-acquire them in order. If another thread has freed the connection while the locks are dropped, we free the inpcb and repeat the lookup (this should hardly ever or never happen in practice).
  For now, maintain a number of new counters measuring how many times various cases execute, in particular whether the optimistic assumptions about when read locks can be used hold, whether upgrades take the fast path, and whether connections actually close in practice in the race described above.
  MFC after: 6 weeks
  Discussed with: kmacy
  Reviewed by: bz, gnn, kmacy
  Tested by: kmacy
  (rwatson, 2008-12-08, 1 file, -59/+274)
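A sketch of the lock-selection rule from this commit: segments carrying FIN, RST or SYN take the write lock up front, and all others start with the read lock. Only the classification is modelled; the trylock upgrade and the refcount-and-retry fallback are not. The flag values are the standard TCP header bits from RFC 793; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TCP header flag bits (RFC 793). */
#define TH_FIN 0x01
#define TH_SYN 0x02
#define TH_RST 0x04
#define TH_ACK 0x10

/* True if the segment may change global connection state and should
   therefore take the tcbinfo write lock before the lookup. */
static bool tcp_input_needs_wlock(uint8_t th_flags)
{
    return (th_flags & (TH_FIN | TH_RST | TH_SYN)) != 0;
}
```

A plain data segment with only ACK set takes the read lock, which is what lets steady-state traffic on different connections proceed in parallel.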
* Add a reference count to struct inpcb, which may be explicitly incremented using in_pcbref() and decremented using in_pcbfree() or in_pcbrele(). Protocols using only the current in_pcballoc() and in_pcbfree() calls will see the same semantics, but it is now possible for TCP to call in_pcbref() and in_pcbrele() to prevent an inpcb from being freed when both the tcbinfo and per-inpcb locks are released. This makes it possible to safely transition from holding only the inpcb lock to holding both the tcbinfo and inpcb locks without re-looking up a connection in the input path, timer path, etc.
  Notice that in_pcbrele() does not unlock the connection after decrementing the refcount if the connection remains, so that the caller can continue to use it; in_pcbrele() returns a flag indicating whether or not the inpcb pointer is still valid, and in_pcbfree() is now a simple wrapper around in_pcbrele().
  MFC after: 1 month
  Discussed with: bz, kmacy
  Reviewed by: bz, gnn, kmacy
  Tested by: kmacy
  (rwatson, 2008-12-08, 2 files, -12/+85)
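A single-threaded sketch of the reference-count contract described in this commit, assuming a hypothetical struct pcb in place of struct inpcb and no locking: release reports whether the object is gone, and free is a thin wrapper around release. In this sketch true means "freed"; the kernel's exact return convention should be checked against in_pcb.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct inpcb: only the reference count. */
struct pcb {
    int refcount;
};

static struct pcb *pcb_alloc(void)
{
    struct pcb *p = malloc(sizeof(*p));
    if (p != NULL)
        p->refcount = 1; /* the allocation's own reference */
    return p;
}

/* Pin the object, e.g. before dropping locks to re-acquire them in order. */
static void pcb_ref(struct pcb *p)
{
    assert(p->refcount > 0);
    p->refcount++;
}

/* Drop one reference. Returns true if the object was freed, in which
   case the caller must not touch the pointer again. */
static bool pcb_rele(struct pcb *p)
{
    assert(p->refcount > 0);
    if (--p->refcount > 0)
        return false;
    free(p);
    return true;
}

/* Freeing is just releasing the last reference. */
static bool pcb_free(struct pcb *p)
{
    return pcb_rele(p);
}
```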
* in_rtalloc1(9) returns a locked route, so make sure that we use RTFREE_LOCKED() here. This macro makes sure the reference count on the route is managed properly. This eliminates another case which results in the following message being printed to the console:
      rtfree: 0xc841ee88 has 1 refs
  Reviewed by: bz
  MFC after: 2 weeks
  (csjp, 2008-12-06, 1 file, -4/+4)
* Code from the hack-session known as the IETF (and a bit of debugging afterwards):
  - Fix protection code for notification generation.
  - Decouple associd from vtag.
  - Allow vtags to have less stringent requirements in non-uniqueness.
    o Don't pre-hash them when you issue one in a cookie.
    o Allow duplicates and use addresses and ports to discriminate amongst the duplicates during lookup.
  - Add support for the NAT draft draft-ietf-behave-sctpnat-00; this is still experimental and needs more extensive testing with the Jason Butt ipfw changes.
  - Support for the SENDER_DRY event to get DTLS in OpenSSL working with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon).
  - Update the support of SCTP-AUTH by Peter Lei.
  - Use macros for refcounting.
  - Fix MTU for UDP encapsulation.
  - Fix reporting back of unsent data.
  - Update assoc send counter handling to be consistent with the endpoint sent counter.
  - Fix a bug in PR-SCTP.
  - Fix so we only send another FWD-TSN when a SACK arrives if, and only if, the adv-peer-ack point progressed. However, we still make sure a timer is running if we do have an adv_peer_ack point.
  - Fix a PR-SCTP bug where chunks were retransmitted if they were sent unreliably but not yet abandoned.
  With the help of: Michael Tuexen and Peter Lei :-)
  MFC after: 4 weeks
  (rrs, 2008-12-06, 24 files, -885/+6914)
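A sketch of lookup with non-unique verification tags: when several associations share a vtag, the addresses and ports pick the right one. A linear scan stands in for the real hash lookup, and the struct fields (one remote IPv4 address, two ports) are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified association record. */
struct assoc {
    uint32_t vtag;
    uint16_t lport;
    uint16_t rport;
    uint32_t raddr; /* remote IPv4 address, host byte order */
    int id;
};

/* Match on the tag first, then use ports and address to tell
   duplicates apart. Returns NULL if nothing matches. */
static const struct assoc *
assoc_lookup(const struct assoc *tab, size_t n, uint32_t vtag,
             uint16_t lport, uint16_t rport, uint32_t raddr)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].vtag == vtag && tab[i].lport == lport &&
            tab[i].rport == rport && tab[i].raddr == raddr)
            return &tab[i];
    }
    return NULL;
}
```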
* In case of a CARP status change, run through the if_link_state_change() routine, so that devd(8) and others are notified about the link state change.
  (glebius, 2008-12-05, 1 file, -4/+5)
* Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h.
  Reviewed by: brooks, gnn, des, zec, imp
  Sponsored by: The FreeBSD Foundation
  (bz, 2008-12-02, 35 files, -9/+53)
* MFp4: Bring in updated jail support from the bz_jail branch.
  This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with a restricted process view, no networking, etc.
  SCTP support was updated and supports IPv6 in jails as well.
  Cpuset support permits jails to be bound to specific processor sets after creation.
  Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as an audit token in the future.
  A DDB 'show jails' command was added to aid debugging.
  Proper compat support permits 32-bit jail binaries to be used on 64-bit systems to manage jails. Backward compatibility was also preserved where possible: for jail v1 syscalls, as well as with user space management utilities.
  Both the jail and the prison versions were updated for the new features. A gap was intentionally left, as the intermediate versions had been used by various patches floating around over the last years.
  Bump __FreeBSD_version for the aforementioned and in-kernel changes.
  Special thanks to:
  - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches.
  - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support.
  - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages.
  - John Baldwin (jhb) for his help.
  - Simon L. Nielsen (simon) as an early adopter testing changes on cluster machines, as well as all the testers and people who provided feedback over the last months on freebsd-jail and other channels.
  - My employer, CK Software GmbH, for the support so I could work on this.
  Reviewed by: (see above)
  MFC after: 3 months (this is just so that I get the mail)
  X-MFC Before: 7.2-RELEASE if possible
  (bz, 2008-11-29, 6 files, -125/+221)
* Add an essential .h file that was left out of the last commit (r185419). Pointy hat #1 on...
  Pointed out by: bz
  (zec, 2008-11-28, 1 file, -0/+82)
* Unhide declarations of network stack virtualization structs from underneath #ifdef VIMAGE blocks. This change introduces some churn in #include ordering and nesting throughout the network stack and drivers but is not expected to cause any additional issues. In the next step this will allow us to instantiate the virtualization container structures and switch from using global variables to their "containerized" counterparts.
  Reviewed by: bz, julian
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  (zec, 2008-11-28, 6 files, -61/+15)
* Missing V_.
  (des, 2008-11-28, 1 file, -1/+1)
* Replace most INP_CHECK_SOCKAF() uses that check whether it is an IPv6 socket by comparing a constant inp vflag instead. This is expected to help reduce extra locking.
  Suggested by: rwatson
  Reviewed by: rwatson
  MFC after: 6 weeks
  (bz, 2008-11-27, 3 files, -8/+4)
* Merge in6_pcbfree() into in_pcbfree(), which after the previous IPsec change in r185366 only differed in two additional IPv6 lines. Rather than splattering conditional code everywhere, add the v6 check centrally at this single place.
  Reviewed by: rwatson (as part of a larger changeset)
  MFC after: 6 weeks (*)
  (*) possibly need to leave a stub wrapper in 7 to keep the symbol.
  (bz, 2008-11-27, 3 files, -33/+15)
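A sketch of a single free routine with one central IPv6 check, assuming a hypothetical two-field options layout. It also uses a constant vflag test in the style of the INP_CHECK_SOCKAF() replacement listed just above. The INP_IPV4/INP_IPV6 values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative vflag bits. */
#define INP_IPV4 0x1
#define INP_IPV6 0x2

/* Hypothetical, pared-down pcb; not the kernel layout. */
struct pcb {
    unsigned char inp_vflag;
    void *v4_opts; /* stand-in for IPv4 options */
    void *v6_opts; /* stand-in for IPv6 options */
};

/* Test the constant vflag rather than inspecting the socket family. */
static bool pcb_is_ipv6(const struct pcb *inp)
{
    return (inp->inp_vflag & INP_IPV6) != 0;
}

/* One free path for both families; the IPv6-only cleanup sits behind a
   single check. free(NULL) is a no-op, so unset fields are safe. */
static void pcb_free(struct pcb *inp)
{
    free(inp->v4_opts);
    if (pcb_is_ipv6(inp))
        free(inp->v6_opts);
    free(inp);
}
```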
* Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy. Ignoring different names because of macros (in6pcb, in6p_sp) and inp vs. in6p variable names, both functions were entirely identical.
  Reviewed by: rwatson (as part of a larger changeset)
  MFC after: 6 weeks (*)
  (*) possibly need to leave stub wrappers in 7 to keep the symbols.
  (bz, 2008-11-27, 2 files, -2/+2)
* Merge more of the currently non-functional (i.e. resolving to whitespace) macros from the p4/vimage branch. Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks. De-virtualize and mark as const the saorder_state_alive and saorder_state_any arrays from the ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless.
  Reviewed by: bz, julian
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  (zec, 2008-11-26, 12 files, -28/+49)
* Remove in6_pcbdetach(), as it is exactly the same function as in_pcbdetach() and we don't need the code twice.
  Reviewed by: rwatson
  MFC after: 6 weeks (*)
  (*) possibly need to leave a stub wrapper in 7 to keep the symbol.
  (bz, 2008-11-26, 1 file, -32/+10)
* Unify the v4 and v6 versions of pcbdetach and pcbfree as much as possible so that they are easily diffable. No functional changes.
  Reviewed by: rwatson
  MFC after: 6 weeks
  (bz, 2008-11-26, 1 file, -3/+3)
* Fix a scope problem in the multiple routing table code that stopped the SO_SETFIB socket option from working correctly.
  Obtained from: Ironport
  MFC after: 3 days
  (julian, 2008-11-19, 3 files, -2/+16)
* Change the initialization methodology for global variables scheduled for virtualization. Instead of initializing the affected global variables at instantiation, assign initial values to them in initializer functions. As a rule, initialization at instantiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks.
  Essentially, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from the p4/vimage branch, the new initialization methodology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to initialize multiple instances of such container structures.
  Discussed at: devsummit Strassburg
  Reviewed by: bz, julian
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  (zec, 2008-11-19, 28 files, -117/+315)
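A sketch of the initializer-function pattern, assuming a hypothetical container with two fields; the default values are illustrative. Because the values are assigned in a function instead of in the variable's definition, every container instance can be initialized the same way.

```c
#include <assert.h>

/* Hypothetical per-stack container holding two virtualized variables.
   Before the change, each would have been a global with an initializer,
   e.g. "int tcp_mssdflt = 512;". */
struct vnet_inet {
    int _tcp_mssdflt;
    int _tcp_do_rfc1323;
};

/* Initial values live here, so any number of instances get them. */
static void vnet_inet_init(struct vnet_inet *vnet)
{
    vnet->_tcp_mssdflt = 512;  /* illustrative default */
    vnet->_tcp_do_rfc1323 = 1; /* illustrative default */
}
```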
* - Improvement: Add '\n' to debug output in sctp_lower_sosend().
  - Improvement: panic() on INVARIANTS kernels if memory allocation fails for a tagblock in sctp_add_vtag_to_timewait().
  - Bugfix: Protect code in sctp_is_in_timewait() by SCTP_INP_INFO_WLOCK/SCTP_INP_INFO_WUNLOCK.
  - Cleanup: Get rid of a now-unused variable in sctp_init_asoc().
  - Bugfix: Reuse the correct vtag in sctp_add_vtag_to_timewait().
  - Cleanup: Get rid of the unused constant SCTP_TIME_WAIT_SHORT in sctp_constants.h.
  - Improvement: Use all hash buckets of the vtag hash table.
  - Cleanup: Get rid of the then-unused constant SCTP_STACK_VTAG_HASH_SIZE_A.
  - Bugfix: Handle a SHUTDOWN;SACK packet correctly.
  - Bugfix: The last TSN in a gap ack block was not being "ack'd" in the internal scoreboard.
  Obtained from: (with help from Michael Tuexen)
  (rrs, 2008-11-12, 6 files, -24/+22)
* For consistency, work on the local object passed into the function for the lock operation instead of using the global name.
  Submitted by: ganbold
  MFC after: 2 months
  (bz, 2008-11-09, 1 file, -3/+3)
* Fix a typo and, while here, another one.
  Reviewed by: keramida
  Reported by: keramida
  MFC after: 2 months (with r184720)
  (bz, 2008-11-06, 1 file, -2/+2)
* Fix a bug introduced with r182851, which split tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could reuse the same code. Move the TSO logic back to tcp_mss() and out of tcp_mss_update(). We tried to avoid that initially, but if we were called from tcp_output() with EMSGSIZE, we cleared the TSO flag on the tcpcb there and called into tcp_mtudisc() and tcp_mss_update(), which would then re-enable TSO on the tcpcb based on the TSO capabilities of the interface as learnt in tcp_maxmtu/6(). So if TSO was enabled on the (possibly new) outgoing interface, it was turned back on, which led to an endless loop between tcp_output() and tcp_mtudisc() until we overflowed the stack.
  Reported by: kmacy
  MFC after: 2 months (along with r182851)
  (bz, 2008-11-06, 3 files, -11/+12)
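A simplified model of this fix, assuming a hypothetical two-field tcpcb and an illustrative TF_TSO value: the shared tcp_mss_update() only recomputes the segment size, and only tcp_mss() decides on TSO, so a TSO flag cleared before tcp_mtudisc() stays cleared. The kernel's real signatures and MSS arithmetic differ.

```c
#include <assert.h>
#include <stdbool.h>

#define TF_TSO 0x1 /* illustrative flag value */

/* Hypothetical, pared-down connection state. */
struct tcpcb {
    int t_flags;
    int t_maxseg;
};

/* Shared MSS computation: touches the segment size only, never TF_TSO. */
static void tcp_mss_update(struct tcpcb *tp, int mtu)
{
    tp->t_maxseg = mtu - 40; /* 20-byte IPv4 + 20-byte TCP header */
}

/* Connection setup: compute the MSS, then decide on TSO exactly once. */
static void tcp_mss(struct tcpcb *tp, int mtu, bool ifp_can_tso)
{
    tcp_mss_update(tp, mtu);
    if (ifp_can_tso)
        tp->t_flags |= TF_TSO;
}

/* Path-MTU discovery after EMSGSIZE: only the MSS is recomputed. */
static void tcp_mtudisc(struct tcpcb *tp, int mtu)
{
    tcp_mss_update(tp, mtu);
}
```

Keeping the TSO decision out of the shared helper is what breaks the tcp_output()/tcp_mtudisc() loop in this model.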