path: root/sys/net
Commit message [Author, Age, Files, Lines]
* Merge SVN r295220 (bz) from projects/vnet/ [dteske, 2016-02-11, 1 file, -0/+14]
  Fix a panic that occurs when a vnet interface is unavailable at the time the vnet jail referencing said interface is stopped.
  Sponsored by: FIS Global, Inc.
* These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h [glebius, 2016-02-01, 5 files, -0/+6]
* Provide TCPSTAT_DEC() and TCPSTAT_FETCH() macros. [glebius, 2016-01-27, 1 file, -0/+3]
* Prune a definition which is / was never used. [zec, 2016-01-25, 1 file, -1/+0]
* Fix flowtable part missed in r294706. [melifaro, 2016-01-25, 1 file, -1/+1]
* MFP r287070,r287073: split radix implementation and route table structure. [melifaro, 2016-01-25, 8 files, -200/+332]
  There are a number of radix consumers in kernel land (pf, ipfw, nfs, route) with different requirements. In fact, the first 3 don't have _any_ requirements and the first 2 do not use radix locking. On the other hand, the routing structure does have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not know anything about its consumers' internals.

  So, radix code now uses a tiny 'struct radix_head' structure along with an internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still use the same 'struct radix_node_head' with slight modifications: they need to pass a pointer to the (embedded) 'struct radix_head' to all radix callbacks.

  Routing code now uses the new 'struct rib_head' with a different locking macro: the RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base).

  New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.
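  The embedding described in the commit above can be illustrated with a minimal user-space sketch. It is not the kernel code: the `_sketch` structures, their fields other than `rnh_gen`, and the functions `radix_touch()`, `rib_from_radix()` and `rib_insert()` are hypothetical stand-ins chosen only to show a generic head embedded in a consumer-specific structure.

```c
#include <stddef.h>

/* Hypothetical stand-in; the real 'struct radix_head' has other fields. */
struct radix_head_sketch {
	int	rnh_nodes;		/* placeholder for generic tree state */
};

/* Hypothetical stand-in for a routing-specific head such as 'struct rib_head'. */
struct rib_head_sketch {
	struct radix_head_sketch head;	/* embedded generic radix part */
	unsigned int	rnh_gen;	/* routing-only generation counter */
};

/* A radix callback sees only the generic head, never the consumer. */
static void
radix_touch(struct radix_head_sketch *rh)
{
	rh->rnh_nodes++;
}

/* The consumer recovers its own structure from the embedded head. */
static struct rib_head_sketch *
rib_from_radix(struct radix_head_sketch *rh)
{
	return ((struct rib_head_sketch *)
	    ((char *)rh - offsetof(struct rib_head_sketch, head)));
}

/* Routing code passes a pointer to the embedded head to the radix layer. */
static void
rib_insert(struct rib_head_sketch *rib)
{
	radix_touch(&rib->head);
	rib->rnh_gen++;
}
```

  The point of the layout is that the radix layer needs no knowledge of routing-only fields, while routing code keeps them next to the tree it owns.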
* Remove unused radix_mpath definitions. [melifaro, 2016-01-25, 1 file, -3/+0]
* Add an IOCTL rr_limit to let users fine-tune the number of packets to be sent using the roundrobin protocol and set a better granularity and distribution among the interfaces. [araujo, 2016-01-23, 2 files, -2/+26]
  Tuning the number of packets sent per interface can increase throughput and reduce unordered packets as well as reduce SACK.

  Example of usage:
    # ifconfig bge0 up
    # ifconfig bge1 up
    # ifconfig lagg0 create
    # ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \
        192.168.1.1 netmask 255.255.255.0
    # ifconfig lagg0 rr_limit 500

  Reviewed by: thompsa, glebius, adrian (old patch)
  Approved by: bapt (mentor)
  Relnotes: Yes
  Differential Revision: https://reviews.freebsd.org/D540
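  The effect of a per-port packet limit on round-robin selection can be sketched as follows. This is an assumption-laden illustration, not the if_lagg implementation: `struct rr_state` and `rr_pick_port()` are hypothetical names, and ports are represented by plain indices.

```c
#include <stdint.h>

/* Hypothetical round-robin state; not the lagg(4) softc. */
struct rr_state {
	uint32_t	nports;		/* number of member ports */
	uint32_t	limit;		/* packets sent per port before rotating */
	uint32_t	cur;		/* index of the port currently in use */
	uint32_t	sent;		/* packets already sent on 'cur' */
};

/*
 * Return the port index for the next packet.  After 'limit' packets on
 * one port, move to the next one.  A limit of 0 or 1 degenerates into
 * plain per-packet round-robin.
 */
static uint32_t
rr_pick_port(struct rr_state *rr)
{
	uint32_t port = rr->cur;

	if (++rr->sent >= rr->limit) {
		rr->sent = 0;
		rr->cur = (rr->cur + 1) % rr->nports;
	}
	return (port);
}
```

  Sending several consecutive packets on the same port keeps short bursts in order, which is the motivation the commit message gives for reducing unordered packets.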
* Clean up original route path selection logic a bit. [melifaro, 2016-01-15, 1 file, -5/+6]
  NULL pointer dereference claimed by Coverity was possible if one (or several) next-hops had their weights set to 0.
  CID: 1348482
* Fix panic in IP redirect. Panic was introduced in r293466. [melifaro, 2016-01-14, 1 file, -2/+2]
  Found by: Yamagi Burmeister <lists at yamagi.org>
* Remove now-unused wrappers for various routing functions. [melifaro, 2016-01-14, 2 files, -72/+0]
* Remove RTF_RNH_LOCKED support from rtalloc1_fib(). [melifaro, 2016-01-13, 2 files, -17/+6]
  Last caller using it was eliminated in r293471.
  Sponsored by: Yandex LLC
* Bring RADIX_MPATH support to new routing KPI to ease migration. [melifaro, 2016-01-11, 2 files, -21/+41]
  Move actual rte selection process from rtalloc_mpath_fib() to the rt_path_selectrte() function. Add public rt_mpath_select() to use in fibX_lookup_ functions.
* Do not rewrite all ro_flags. [melifaro, 2016-01-11, 1 file, -1/+1]
* Fix userland build broken by r293470. [melifaro, 2016-01-09, 1 file, -0/+2]
  Pointy hat to: melifaro
* Finish r275196: do not dereference rtentry in if_output() routines. [melifaro, 2016-01-09, 7 files, -31/+31]
  The only piece of information that is required is a subset of rt_flags.

  In particular, if_loop() requires the RTF_REJECT and RTF_BLACKHOLE flags to check if this particular mbuf needs to be dropped (and what error should be returned). Note that if_loop() will always return EHOSTUNREACH for "reject" routes regardless of RTF_HOST flag existence. This is due to upcoming routing changes where the RTF_HOST value won't be available as a lookup result.

  All other functions require the RTF_GATEWAY flag to check if they need to return EHOSTUNREACH instead of EHOSTDOWN.

  There are 11 places where a non-zero 'struct route' is passed to if_output(). Most of the callers (forwarding, bpf, arp) do not care about the exact error value. In fact, the only place where this result is propagated is ip_output(). (ip6_output() passes a NULL route to nd6_output_ifp()).

  Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and RT_IS_GW) and an inline function (rt_update_ro_flags()) to copy the necessary rte flags to ro_flags. Call this function in ip_output() after looking up / verifying the rte.

  Reviewed by: ae
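  The flag-copying idea from the commit above might look roughly like the sketch below. The `SK_` constants use illustrative bit values rather than the definitions from net/route.h, and `struct route_sketch`, `sk_update_ro_flags()` and `sk_unreachable_errno()` are hypothetical names; only the flag names and the EHOSTUNREACH / EHOSTDOWN distinction come from the commit message.

```c
#include <errno.h>

/* Illustrative bit values only; the real constants live in net/route.h. */
#define	SK_RTF_GATEWAY		0x0002u
#define	SK_RTF_REJECT		0x0008u
#define	SK_RTF_BLACKHOLE	0x1000u

#define	SK_RT_REJECT		0x0020u
#define	SK_RT_BLACKHOLE		0x0040u
#define	SK_RT_IS_GW		0x0080u
#define	SK_RT_MASK	(SK_RT_REJECT | SK_RT_BLACKHOLE | SK_RT_IS_GW)

/* Hypothetical stand-in for 'struct route'; only the flags matter here. */
struct route_sketch {
	unsigned int	ro_flags;
};

/* Copy the needed subset of rte flags, leaving unrelated ro_flags intact. */
static void
sk_update_ro_flags(struct route_sketch *ro, unsigned int rt_flags)
{
	ro->ro_flags &= ~SK_RT_MASK;
	if (rt_flags & SK_RTF_REJECT)
		ro->ro_flags |= SK_RT_REJECT;
	if (rt_flags & SK_RTF_BLACKHOLE)
		ro->ro_flags |= SK_RT_BLACKHOLE;
	if (rt_flags & SK_RTF_GATEWAY)
		ro->ro_flags |= SK_RT_IS_GW;
}

/* An output routine can pick the error without touching the rtentry. */
static int
sk_unreachable_errno(const struct route_sketch *ro)
{
	return ((ro->ro_flags & SK_RT_IS_GW) ? EHOSTUNREACH : EHOSTDOWN);
}
```

  Clearing only the three copied bits, instead of overwriting the whole field, keeps any other per-route flags the caller has set.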
* Remove sys/eventhandler.h from net/route.h [melifaro, 2016-01-09, 1 file, -1/+0]
  Reviewed by: ae
* (Temporarily) remove route_redirect_event eventhandler. [melifaro, 2016-01-09, 2 files, -15/+2]
  Such a handler should pass a different set of variables, instead of directly providing 2 locked route entries. Given that it hasn't really been used since at least 2012, remove the current code. Will re-add it after finishing most major routing-related changes.
  Discussed with: np
* Please Coverity by removing unnecessary check (rt_key() is always set). [melifaro, 2016-01-09, 1 file, -1/+1]
  Coverity CID: 1347797
* Do more fine-grained locking in rtrequest1_fib(). [melifaro, 2016-01-08, 1 file, -30/+22]
  Last consumer using RTF_RNH_LOCKED flag was eliminated in r291643. Restrict passing RTF_RNH_LOCKED to rtrequest1_fib() and do better locking for RTM_ADD / RTM_DELETE cases.
* Add rib_lookup_info() to provide API for retrieving individual route entries data in unified format. [melifaro, 2016-01-04, 3 files, -11/+165]
  There are control plane functions that require information other than just next-hop data (e.g. individual rtentry fields like flags or prefix/mask). Given that the goal is to avoid rte reference/refcounting, re-use the rt_addrinfo structure to store most rte fields. If the caller wants to retrieve key/mask or gateway (which are sockaddrs and are allocated separately), it needs to provide sufficiently sized sockaddr structures with their pointers saved in the passed rt_addrinfo.

  Convert:
  * lltable new records checks (in_lltable_rtcheck(), nd6_is_new_addr_neighbor()).
  * rtsock pre-add/change route check.
  * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because 1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should not be multiple host routes for such hosts 2) if we have multiple routes we should inspect them (which is not done). 3) the entire idea of abusing KRT as storage for ND proxy seems odd. Userland programs should be used for that purpose).
* Handle IPV6_PATHMTU option by splitting ip6_getpmtu_ctl() from ip6_getpmtu(). [melifaro, 2016-01-03, 1 file, -4/+5]
  Add ro_mtu field to 'struct route' to be able to pass lookup MTU back to the caller.

  Currently, ip6_getpmtu() has 2 totally different use cases:
  1) control plane (IPV6_PATHMTU req), where we just need to calculate MTU and return it, w/o any reusability.
  2) Actual ip6_output() data path where we (nearly) always use the provided route lookup data. If this data is not 'valid' we need to perform another lookup and save the result (which cannot be re-used by ip6_output()).

  Given that, handle 1) by calling a separate function doing the rte lookup itself. The resulting MTU is calculated by (newly-added) ip6_calcmtu() used by both ip6_getpmtu_ctl() and ip6_getpmtu(). For 2) instead of storing ref'ed rte, store mtu (the only needed data from the lookup result) inside the newly-added ro_mtu field.

  'struct route' was shrunk by 8 (or 4) bytes in r292978. Grow it again by 4 bytes. New ro_mtu field will be used in other places like ip/tcp_output (EMSGSIZE handling from output routines).

  Reviewed by: ae
* Remove second EVENTHANDLER_REGISTER slipped in r292978. [melifaro, 2016-01-01, 1 file, -0/+4]
  Describe the reason for doing unconditional M_PREPEND in ether_output().
* Clean up unused-but-set-variable spotted by gcc4.9. [araujo, 2015-12-31, 1 file, -2/+0]
  Reviewed by: ngie
  Approved by: rodrigc (mentor)
  Differential Revision: https://reviews.freebsd.org/D4719
* Implement interface link header precomputation API. [melifaro, 2015-12-31, 8 files, -141/+402]
  Add if_requestencap() interface method which is capable of calculating various link headers for a given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc.).

  Reshape 'struct route' to be able to pass additional data (with its length) to prepend to mbuf.

  These two changes permit routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack, eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes the layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers.

  ARP/ND changes:
  Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address changes are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces, thus simplifying if_output(). Move these lookups to a separate ether_resolve_addr() function which either returns an error or a fully-prepared link header. Add <arp|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data.

  BPF changes:
  Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of them have their header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional.

  Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq, mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

  Flowtable changes:
  Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle.

  Differential Revision: https://reviews.freebsd.org/D4102
* Wrap using #ifdef 'notyet' those variables and statements not yet implemented to lower the compiler warnings. [araujo, 2015-12-31, 1 file, -4/+18]
  It fixes the case of unused-but-set-variable spotted by gcc4.9.
  Reviewed by: ngie, ae
  Approved by: bapt (mentor)
  Differential Revision: https://reviews.freebsd.org/D4720
* Add SFF-8024 Extended Specification Compliance [melifaro, 2015-12-28, 1 file, -1/+1]
  Submitted by: markb_mellanox.com
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D4666
* If vnets are torn down while ifconfig runs an ioctl to, say, destroy an epair(4), we may hit if_detach_internal() without holding a lock and by the time we acquire it the interface might be gone. [bz, 2015-12-22, 1 file, -3/+17]
  We should not panic() in this case as it is our fault for not holding the lock all the way. It is not ideal to return silently without error to user space, but other callers will all ignore the return values so do not change the entire KPI for little benefit for now. The ifp will be dealt with one way or another still.
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Reviewed by: gnn
  Differential Revision: https://reviews.freebsd.org/D4529
* If bootverbose is enabled every vnet startup and virtual interface creation will print extra lines on the console. [bz, 2015-12-22, 1 file, -1/+1]
  We are generally not interested in this (repeated) information for each VNET. Thus only print it for the default VNET. Virtual interfaces on the base system will remain printing information, but e.g. each loopback in each vnet will no longer cause a "bpf attached" line.
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Reviewed by: gnn
  Differential Revision: https://reviews.freebsd.org/D4531
* Simplify bringup order by removing a SYSINIT making it a static list initialization. [bz, 2015-12-22, 1 file, -12/+2]
  Mfp4 @180384,180385:
  There is no need for a dedicated SYSINIT here. The list can be initialized statically.
  Sponsored by: CK Software GmbH
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Reviewed by: gnn
  Differential Revision: https://reviews.freebsd.org/D4528
* Revert r292275 & r292379 [smh, 2015-12-17, 4 files, -30/+10]
  glebius has concerns about these changes, so reverting until those can be discussed and addressed.
  Sponsored by: Multiplay
* Provide additional lle data in IPv6 lltable dump used by ndp(8). [melifaro, 2015-12-16, 1 file, -0/+3]
  Before the change, things like lle state were queried via SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump. This ioctl was added in 1999, probably to avoid touching rtsock code.

  This change maps SIOCGNBRINFO_IN6 data to standard rtsock dump the following way:
    expire (already) maps to rtm_rmx.rmx_expire
    isrouter -> rtm_flags & RTF_GATEWAY
    asked -> rtm_rmx.rmx_pksent
    state -> rtm_rmx.rmx_state (maps to rmx_weight via define)

  Reviewed by: ae
* Convert if_stf(4) to new routing api. [melifaro, 2015-12-16, 1 file, -20/+8]
* Fix lagg failover due to missing notifications [smh, 2015-12-15, 4 files, -10/+30]
  When using lagg failover mode neither Gratuitous ARP (IPv4) nor Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved.

  This results in slow failover, dropped packets and network outages for the lagg interface when the primary link goes down.

  We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire an ifnet_link_event, which is now monitored by rip and nd6.

  Upon receiving these events each protocol triggers the relevant notifications:
  * inet4 => Gratuitous ARP
  * inet6 => Unsolicited Neighbour Announce

  This also fixes the carp IPv6 NA's that stopped working after r251584, which added the ipv6_route__llma route.

  The new behaviour can be controlled using the sysctls:
  * net.link.ether.inet.arp_on_link
  * net.inet6.icmp6.nd6_on_link

  Also removed unused param from lagg_port_state and added descriptions for the sysctls while here.

  PR: 156226
  MFC after: 1 month
  Sponsored by: Multiplay
  Differential Revision: https://reviews.freebsd.org/D4111
* Fix PINNED routes handling. [melifaro, 2015-12-13, 1 file, -0/+3]
  Before r291643, adding a new interface prefix had the following logic:
    try_add: EEXIST && (PINNED) {
        try_del(w/o PINNED flag)
        if (OK)
            try_add(PINNED)
    }

  In r291643, deletion was performed w/ PINNED flag held, which led to new interface prefixes (like ::1) overriding older ones. Fix this by requesting deletion w/o RTF_PINNED.

  PR: kern/205285
  Submitted by: Fabian Keil <fk at fabiankeil.de>
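  The add/delete/add sequence from the commit above can be modelled with a single-slot table. Everything in this sketch is hypothetical (`struct slot_sketch`, `sk_add()`, `sk_del()`, `sk_add_prefix()`), and the assumption that deleting a pinned entry without the pinned flag fails with EADDRINUSE is an illustration of the intended behaviour, not a statement about the exact kernel code path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical one-entry "routing table". */
struct slot_sketch {
	bool	used;
	bool	pinned;
	int	owner;		/* which interface installed the prefix */
};

/* Add fails with EEXIST when the prefix is already present. */
static int
sk_add(struct slot_sketch *s, int owner, bool pinned)
{
	if (s->used)
		return (EEXIST);
	s->used = true;
	s->pinned = pinned;
	s->owner = owner;
	return (0);
}

/* Assumed rule: a pinned entry goes away only if the request is pinned too. */
static int
sk_del(struct slot_sketch *s, bool req_pinned)
{
	if (!s->used)
		return (ESRCH);
	if (s->pinned && !req_pinned)
		return (EADDRINUSE);
	s->used = false;
	return (0);
}

/* Interface prefix add: replace unpinned entries, keep older pinned ones. */
static int
sk_add_prefix(struct slot_sketch *s, int owner)
{
	int error;

	error = sk_add(s, owner, true);
	if (error == EEXIST) {
		/* Delete WITHOUT the pinned flag so a pinned route survives. */
		error = sk_del(s, false);
		if (error == 0)
			error = sk_add(s, owner, true);
	}
	return (error);
}
```

  Passing `true` to `sk_del()` in `sk_add_prefix()` would reproduce the regression described in the commit: the newer prefix would always displace the older pinned one.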
* Remove LLE read lock from IPv6 fast path. [melifaro, 2015-12-13, 2 files, -0/+45]
  LLE structure is mostly unchanged during its lifecycle: there are only 2 things relevant for fast path lookup code:
  1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK.
  2) Some sort of feedback indicating that this particular entry is used so we send NS to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no.

  The latter is solved by the following changes:

  Special r_skip_req (introduced in D3688) value is used for fast path feedback. It is read lockless by fast path, but updated under req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0.

  After transitioning to STALE state, callout timer is armed to run each V_nd6_delay seconds to make sure that if packet was transmitted at the start of given interval, we would be able to switch to PROBE state in V_nd6_delay seconds as user expects.

  (in STALE state) timer is rescheduled until original V_nd6_gctimer expires, keeping lle in STALE state (remaining timer value stored in lle_remtime).

  (in STALE state) timer is rescheduled if packet was transmitted less than V_nd6_delay seconds ago to make sure we transition to PROBE state exactly after V_nd6_delay seconds.

  As a result, all packets towards lle in REACHABLE/STALE/PROBE states are handled by fast path without acquiring lle read lock.

  Differential Revision: https://reviews.freebsd.org/D3780
* Merge helper fib* functions used for basic lookups. [melifaro, 2015-12-08, 1 file, -0/+31]
  Vast majority of rtalloc(9) users require only basic info from route table (e.g. "does the rtentry interface match with the interface I have?", "what is the MTU?", "Give me the IPv4 source address to use", etc.). Instead of hand-rolling lookups, checking if rtentry is up, valid, dealing with IPv6 mtu, finding "address" ifp (almost never done right), provide easy-to-use API hiding all the complexity and returning the needed info into small on-stack structure.

  This change also helps hiding route subsystem internals (locking, direct rtentry accesses). Additionally, using this API improves lookup performance since rtentry is not locked. (This is safe, since all the rtentry changes happen under both radix WLOCK and rtentry WLOCK).

  Sponsored by: Yandex LLC
* Remove LLE read lock from IPv4 fast path. [melifaro, 2015-12-05, 2 files, -1/+16]
  LLE structure is mostly unchanged during its lifecycle. To be more specific, there are 2 things relevant for fast path lookup code:
  1) link-level address change. Since r286722, these updates are performed under AFDATA WLOCK.
  2) Some sort of feedback indicating that this particular entry is used so we re-send arp request to perform reachability verification instead of expiring entry. The only signal that is needed from fast path is something like binary yes/no.

  The latter is solved by the following changes:
  1) introduce special r_skip_req field which is read lockless by fast path, but updated under (new) req_mutex mutex. If this field is non-zero, then fast path will acquire lock and set it back to 0.
  2) introduce simple state machine: incomplete->reachable<->verify->deleted. Before that we implicitly had incomplete->reachable->deleted state machine, with V_arpt_keep between "reachable" and "deleted". Verification was performed in runtime 5 seconds before V_arpt_keep expire. This is changed to "change state to verify 5 seconds before V_arpt_keep, set r_skip_req to non-zero value and check it every second". If the value is zero - then send arp verification probe.

  These changes do not introduce any significant control plane overhead: typically lle callout timer would fire 1 time more each V_arpt_keep (1200s) for used lles and up to arp_maxtries (5) for dead lles.

  As a result, all packets towards "reachable" lle are handled by fast path without acquiring lle read lock.

  Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler might keep LLE lock for significant amount of time, which might not be feasible for fast path locking (e.g. having rmlock as either AFDATA or lltable own lock).

  Differential Revision: https://reviews.freebsd.org/D3688
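  The r_skip_req feedback handshake described in the commit above can be sketched in user space with C11 atomics and a pthread mutex standing in for the kernel primitives. The structure and the three functions are hypothetical; only the field names r_skip_req and req_mutex and the "read lockless, update under the mutex" rule come from the commit message.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the relevant part of an lle. */
struct lle_sketch {
	atomic_int	r_skip_req;	/* non-zero: timer wants usage feedback */
	pthread_mutex_t	req_mutex;	/* serializes updates of r_skip_req */
};

/* Fast path: one lockless read; take the mutex only if feedback is wanted. */
static void
sk_fastpath_use(struct lle_sketch *lle)
{
	if (atomic_load_explicit(&lle->r_skip_req,
	    memory_order_relaxed) != 0) {
		pthread_mutex_lock(&lle->req_mutex);
		atomic_store(&lle->r_skip_req, 0);
		pthread_mutex_unlock(&lle->req_mutex);
	}
}

/* Timer, entering the "verify" state: ask the fast path for feedback. */
static void
sk_timer_arm(struct lle_sketch *lle)
{
	pthread_mutex_lock(&lle->req_mutex);
	atomic_store(&lle->r_skip_req, 1);
	pthread_mutex_unlock(&lle->req_mutex);
}

/* Timer tick: a zero value means the entry was used, so a probe is due. */
static bool
sk_timer_used_since_arm(struct lle_sketch *lle)
{
	bool used;

	pthread_mutex_lock(&lle->req_mutex);
	used = (atomic_load(&lle->r_skip_req) == 0);
	pthread_mutex_unlock(&lle->req_mutex);
	return (used);
}
```

  In the common case the flag is zero and the fast path pays for a single relaxed load; the mutex is touched at most once per verification interval.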
* Move RTF_PINNED handling to generic route code. [melifaro, 2015-12-02, 1 file, -29/+27]
  This eliminates last RTF_RNH_LOCKED rtrequest1_fib() user.
* Fix LINT-NOIP kernels after r291467 [ngie, 2015-12-01, 1 file, -0/+2]
  rn is only used if INET or INET6 are defined
  Sponsored by: EMC / Isilon Storage Division
* Move flowtable rte checks to separate function. [melifaro, 2015-11-30, 1 file, -50/+63]
* Add new rt_foreach_fib_walk_del() function for deleting route entries by filter function instead of digging into routing table details in each consumer. [melifaro, 2015-11-30, 2 files, -205/+259]
  Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED user). This simplifies future nexthops/multipath changes and rtrequest1_fib() locking refactoring.

  Actual changes:
  Add "rt_chain" field to permit rte grouping while doing batched delete from routing table (thus growing rte 200->208 on amd64).
  Add "rti_filter" / "rti_filterdata" / "rti_spare" fields to rt_addrinfo to pass filter function to various routing subsystems in standard way.
  Convert all rt_expunge() consumers to new rt_addrinfo-based api and eliminate rt_expunge().
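  Filter-driven batched deletion, as described in the commit above, can be sketched over a simple linked list. This is only an illustration: the real routing table is a radix tree, and `struct rte_sketch`, `sk_filter_f`, `sk_walk_del()` and `sk_match_if()` are hypothetical names. Only the idea of a caller-supplied filter and an "rt_chain" link for grouping unlinked entries comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical route entry; the real table is a radix tree, not a list. */
struct rte_sketch {
	int			ifindex;
	struct rte_sketch	*next;		/* table linkage */
	struct rte_sketch	*rt_chain;	/* groups entries being deleted */
};

/* Filter callback: return non-zero if the entry should be deleted. */
typedef int sk_filter_f(const struct rte_sketch *, void *);

/*
 * Unlink every entry accepted by the filter and return the victims
 * chained through rt_chain, so the caller can free them in one batch
 * after the table is no longer being walked.
 */
static struct rte_sketch *
sk_walk_del(struct rte_sketch **head, sk_filter_f *filter, void *arg)
{
	struct rte_sketch *victims = NULL, **pp = head, *rt;

	while ((rt = *pp) != NULL) {
		if (filter(rt, arg)) {
			*pp = rt->next;
			rt->next = NULL;
			rt->rt_chain = victims;
			victims = rt;
		} else
			pp = &rt->next;
	}
	return (victims);
}

/* Example filter: match all routes through one interface index. */
static int
sk_match_if(const struct rte_sketch *rt, void *arg)
{
	return (rt->ifindex == *(int *)arg);
}
```

  Consumers supply only the predicate; the walk and the unlinking stay in one place, which is the encapsulation the commit is after.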
* Fix building sys/modules/if_enc by adding missing headers [ngie, 2015-11-25, 1 file, -0/+2]
  X-MFC with: r291292, r291299 (if that ever happens)
  Pointyhat to: ae
* Fix the build. [ae, 2015-11-25, 1 file, -1/+0]
* Overhaul if_enc(4) and make it loadable in run-time. [ae, 2015-11-25, 4 files, -197/+249]
  Use the hhook(9) framework to make it possible to load and unload the if_enc(4) kernel module. INET and INET6 code registers two helper hook points in the kernel on initialization. The if_enc(4) module uses these helper hook points and registers its hooks. IPSEC code uses these hhook points to call the helper hooks implemented in if_enc(4).
* Implement the sadb_x_policy_priority field as it is done in Linux: lower priority policies are inserted first. [fabient, 2015-11-17, 1 file, -1/+1]
  Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu>
  Reviewed by: ae
  Sponsored by: Stormshield
* Pass provided af instead of AF_UNSPEC to setwa_f callback. [melifaro, 2015-11-14, 1 file, -1/+1]
* Move iflladdr_event eventhandler invocation to if_setlladdr. [melifaro, 2015-11-14, 3 files, -13/+4]
  Suggested by: glebius
* This fixes several places where callout_stop's return value is examined. [rrs, 2015-11-13, 1 file, -1/+1]
  The new return codes of -1 were mistakenly being considered "true". Callout_stop now returns -1 to indicate the callout had either already completed or was not running and 0 to indicate it could not be stopped. Also update the manual page to make it more consistent no non-zero in the callout_stop or callout_reset descriptions.
  MFC after: 1 Month with associated callout change.
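  The class of bug fixed by the commit above can be shown with a small stand-in. `sk_callout_stop()` is a hypothetical model, not the kernel function; it follows the convention stated in the commit message (-1 for not running or already completed, 0 for could not be stopped) and assumes a positive value means the callout was actually stopped.

```c
#include <stdbool.h>

enum sk_callout_state { SK_IDLE, SK_PENDING, SK_RUNNING };

/*
 * Hypothetical model of the return convention:
 *   1  the pending callout was stopped (assumed, not stated in the commit)
 *   0  the callout is executing and could not be stopped
 *  -1  the callout was not running or had already completed
 */
static int
sk_callout_stop(enum sk_callout_state *st)
{
	switch (*st) {
	case SK_PENDING:
		*st = SK_IDLE;
		return (1);
	case SK_RUNNING:
		return (0);
	default:
		return (-1);
	}
}

/* Buggy check: treats -1 as "stopped" because it is non-zero. */
static bool
sk_stopped_buggy(int ret)
{
	return (ret != 0);
}

/* Fixed check: only a positive return means the callout was stopped. */
static bool
sk_stopped(int ret)
{
	return (ret > 0);
}
```

  With a three-valued return, any plain truth test on the result silently folds two different outcomes together, so callers have to compare against the specific value they care about.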
* Use lladdr_event to propagate gratuitous arp. [melifaro, 2015-11-09, 2 files, -13/+7]
  Differential Revision: https://reviews.freebsd.org/D4019