summaryrefslogtreecommitdiffstats
path: root/sys/net/flowtable.c
Commit message (Collapse)AuthorAgeFilesLines
* The llentry_update() is used only by flowtable and the latterglebius2012-08-021-2/+2
| | | | | always passes NULL pointer to it. Thus, code can be simplified and function renamed to llentry_alloc() to match rtalloc().
* When ip_output()/ip6_output() is supplied a struct route *ro argument,glebius2012-07-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning here: it may be supplied to provide route, and it may be supplied to store and return to caller the route that ip_output()/ip6_output() finds. In the latter case skipping FLOWTABLE lookup is pessimisation. The difference between struct route filled by FLOWTABLE and filled by rtalloc() family is that the former doesn't hold a reference on its rtentry. Reference is hold by flow entry, and it is about to be released in future. Thus, route filled by FLOWTABLE shouldn't be passed to RTFREE() macro. - Introduce new flag for struct route/route_in6, that marks route not holding a reference on rtentry. - Introduce new macro RO_RTFREE() that cleans up a struct route depending on its kind. - All callers to ip_output()/ip6_output() that do supply non-NULL but empty route should use RO_RTFREE() to free results of lookup. - ip_output()/ip6_output() now do FLOWTABLE lookup always when ro->ro_rt == NULL. Tested by: tuexen (SCTP part)
* Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:bz2012-02-171-2/+2
| | | | | | | | | | | | Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity. This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat. Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days
* A flowtable entry can continue referencing an llentry indefinitely if the ↵kmacy2012-01-261-1/+3
| | | | | | | | | | | entry is repeatedly referenced within its timeout window. This change clears the LLE_VALID flag when an llentry is removed from an interface's hash table and adds an extra check to the flowtable code for the LLE_VALID flag in llentry to avoid retaining and using a stale reference. Reviewed by: qingli@ MFC after: 2 weeks
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.ed2011-11-071-1/+2
| | | | | | The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
* - Restore dropping the priority of syncer down to PPAUSE when it is idle.jhb2011-01-061-0/+5
| | | | | | | | | This was lost when it was converted to using a condition variable instead of lbolt. - Drop the priority of flowtable down to PPAUSE when it is idle as well since it is a similar background task. MFC after: 2 weeks
* Print the vnet pointer under DDB when iterating over flowtables of eachbz2010-12-311-0/+3
| | | | | | | | | | virtual network stack instance. Sponsored by: ISPsystem [1] Reviewed by: julian [1] MFC after: 1 week [1] Early 2010.
* Move the increment operation under the lock and split the conditionbz2010-12-311-8/+10
| | | | | | | | | | | | | | | variable into two so that we can see on which one we are waiting. This might also more properly propagate the update of the flowclean_cycles flag and avoid "hangs" people were seeing. Suggested by: rwatson [1] Sponsored by: ISPsystem [1] Reviewed by: julian [1] Updated by: Mikolaj Golub (to.my.trociny gmail.com) Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC After: 1 week [1] Early 2010, initial version.
* After some off-list discussion, revert a number of changes to thedim2010-11-221-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
* Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughoutdim2010-11-141-11/+11
| | | | the tree.
* Update several places that iterate over CPUs to use CPU_FOREACH().jhb2010-06-111-14/+4
|
* allocate ipv6 flows from the ipv6 flow zonekmacy2010-05-161-1/+0
| | | | | | reported by: rrs@ MFC after: 3 days
* workaround bug with ipv6 where a flow can have a null rtentrykmacy2010-05-121-2/+4
|
* need to initialize the lock before it is usedkmacy2010-04-271-1/+1
| | | | MFC after: 3 days
* - boot-time size the ipv4 flowtable and the maximum number of flowskmacy2010-03-221-24/+82
| | | | | | | | | - increase flow cleaning frequency and decrease flow caching time when near the flow limit - stop allocating new flows when within 3% of maxflows don't start allocating again until below 12.5% MFC after: 7 days
* flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDBkmacy2010-03-121-14/+13
| | | | pointed out by jkim@
* re-update copyright to 2010kmacy2010-03-121-1/+1
| | | | pointed out by danfe@
* The flow-table module retrieves the destination and sourceqingli2010-03-121-0/+13
| | | | | | | | | | | | | | address as well as the transport protocol port information from the outbound packets. The routing code is generic and compares every byte in the given sockaddr object. Therefore the temporary sockaddr objects must be cleared due to padding bytes. In addition, the port information must be stripped or the route search will either fail or return the incorrect route entry. Unit testing is done using OpenVPN over the if_tun interface. MFC after: 7 days
* fix stats reporting sysctlkmacy2010-03-121-17/+17
|
* - restructure flowtable to support ipv6kmacy2010-03-121-155/+715
| | | | | | | | | | | | | | - add a name argument to flowtable_alloc for printing with ddb commands - extend ddb commands to print destination address or 4-tuples - don't parse ports in ulp header if FL_HASH_ALL is not passed - add kern_flowtable_insert to enable more generic use of flowtable (e.g. system calls for adding entries) - don't hash loopback addresses - cleanup whitespace - keep statistics per-cpu for per-cpu flowtables to avoid cache line contention - add sysctls to accumulate stats and report aggregate MFC after: 7 days
* One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is toqingli2010-03-091-1/+2
| | | | | | | | | | | | | | | | | | | | | allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible. The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole. Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed. MFC after: 5 days
* Remove extraneous semicolons, no functional changes.mbr2010-01-071-1/+1
| | | | | Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week
* Verify "smp_started" is true before callingqingli2009-10-221-6/+10
| | | | | | | sched_bind() and sched_unbind(). Reviewed by: kmacy MFC after: 3 days
* The flow-table function flowtable_route_flush() may be calledqingli2009-10-201-7/+11
| | | | | | | | | | during system initialization time. Since the flow-table is designed to maintain per CPU flow cache, the existing code did not check whether "smp_started" is true before calling sched_bind() and sched_unbind(), which triggers a page fault. Reviewed by: jeff MFC after: immediately
* The flow-table associates TCP/UDP flows and IP destinations withqingli2009-10-011-5/+35
| | | | | | | | | | | | | | | | | specific routes. When the routing table changes, for example, when a new route with a more specific prefix is inserted into the routing table, the flow-table is not updated to reflect that change. As such existing connections cannot take advantage of the new path. In some cases the path is broken. This patch will update the affected flow-table entries when a more specific route is added. The route entry is properly marked when a route is deleted from the table. In this case, when the flow-table performs a search, the stale entry is updated automatically. Therefore this patch is not necessary for route deletion. Submitted by: simon, phk Reviewed by: bz, kmacy MFC after: 3 days
* In ip_output(), the flow-table module must not try to cache L2/L3qingli2009-08-281-0/+6
| | | | | | | | | | | | | | | | | | | | information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type. Since the L2 information (rt_lle) is invalid for these interface types, accidental caching attempt will trigger panic when the invalid rt_lle reference is accessed. When installing a new route, or when updating an existing route, the user supplied gateway address may be an interface address (this is particularly true for point-to-point interface related modules such as ppp, if_tun, if_gif). Currently the routing command handler always set the RTF_GATEWAY flag if the gateway address is given as part of the command paramters. Therefore the gateway address must be verified against interface addresses or else the route would be treated as an indirect route, thus making that route unusable. Reviewed by: kmacy, julia, rwatson Verified by: marcus MFC after: 3 days
* Don't allow access to the internals until it has all been set up.julian2009-08-211-1/+2
| | | | | | | | | Specifically, not until the per-vnet parts have been set up. Submitted by: kmacy@ Reviewed by: julian@, zec@ Approved by: re(rwatson) MFC after: immediately
* This change fixes a comment and addresses a complaint by kib@ bykmacy2009-08-191-2/+6
| | | | | | | | moving a frequently executed flowtable syslog statement from being conditional on bootverbose to conditional on a per-vnet flowtable sysctl. Approved by: re@
* - change the interface to flowtable_lookup so that we don't rely onkmacy2009-08-181-41/+194
| | | | | | | | | | | | | | | | | the mbuf for obtaining the fib index - check that a cached flow corresponds to the same fib index as the packet for which we are doing the lookup - at interface detach time flush any flows referencing stale rtentrys associated with the interface that is going away (fixes reported panics) - reduce the time between cleans in case the cleaner is running at the time the eventhandler is called and the wakeup is missed less time will elapse before the eventhandler returns - separate per-vnet initialization from global initialization (pointed out by jeli@) Reviewed by: sam@ Approved by: re@
* fix netboot issue by disabling flowtable lookups until initialization has ↵kmacy2009-08-171-1/+4
| | | | | | | been run Reviewed by: rwatson@ Approved by: re@
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andrwatson2009-08-011-1/+0
| | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
* Introduce and use a sysinit-based initialization scheme for virtualrwatson2009-07-231-30/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | network stacks, VNET_SYSINIT: - Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events. Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)
* Garbage collect vnet module registrations that have neither constructorsrwatson2009-07-201-1/+0
| | | | | | | | | | | | | | | nor destructors, as there's no actual work to do. In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added. Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6. Reviewed by: bz Approved by: re (vimage blanket)
* Remove unused VNET_SET() and related macros; only VNET_GET() isrwatson2009-07-161-17/+17
| | | | | | | | | ever actually used. Rename VNET_GET() to VNET() to shorten variable references. Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorrwatson2009-07-141-78/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
* V_irtualize flowtable state.zec2009-06-221-100/+190
| | | | | | | | | | | | This change should make options VIMAGE kernel builds usable again, to some extent at least. Note that the size of struct vnet_inet has changed, though in accordance with one-bump-per-day policy we didn't update the __FreeBSD_version number, given that it has already been touched by r194640 a few hours ago. Reviewed by: bz Approved by: julian (mentor)
* revert to opt-in flowtablekmacy2009-06-091-3/+2
|
* make flowtable opt-outkmacy2009-06-091-1/+2
|
* move jenkins hash to its own header in libkernkmacy2009-06-091-145/+2
|
* Remove one INET dependency by calling the generalbz2009-06-091-1/+1
| | | | | | AF agnostic version for doing the routing lookup. Reviewed by: kmacy
* remove gratuitous memory barrier, a remnant of unified L2 / L3kmacy2009-04-271-1/+0
|
* simplify code by removing bit_fns and replacing with the use of a temporary maskkmacy2009-04-201-56/+20
|
* update TODO listkmacy2009-04-191-1/+4
|
* - put larger flowtable members at the endkmacy2009-04-191-13/+17
| | | | | | - fix bug where tail pointer of the free list would not get advanced - clear entry's next pointer when it is added to the freelist to avoid freeing an entry that it still points to
* - Import infrastructure for caching flows as a means of accelerating L3 and ↵kmacy2009-04-191-0/+1105
L2 lookups as well as providing stateful load balancing when used with RADIX_MPATH. - Currently compiled in to i386 and amd64 but disabled by default, it can be enabled at runtime with 'sysctl net.inet.flowtable.enable=1'. - Embedded users can remove it entirely from the kernel by adding 'nooption FLOWTABLE' to their kernel config files. - A minimal hookup will be added to ip_output in a subsequent commit. I would like to see more review before bringing in changes that require more churn. Supported by: Bitgravity Inc.
OpenPOWER on IntegriCloud