summaryrefslogtreecommitdiffstats
path: root/sys/netinet/ip_carp.c
Commit message (Collapse)AuthorAgeFilesLines
* Importing pfSense patch redmine_4607.diffRenato Botelho2015-08-171-34/+36
|
* Importing pfSense patch if_pfsync.diffRenato Botelho2015-08-171-2/+4
|
* Importing pfSense patch carp_replay_protection.diffRenato Botelho2015-08-171-14/+17
|
* MFC r276751:loos2015-02-021-6/+4
| | | | | | | | | | | | | | | Remove the check that prevent carp(4) advskew to be set to '0'. CARP devices are created with advskew set to '0' and once you set it to any other value in the valid range (0..254) you can't set it back to zero. The code in question is also used to prevent that zeroed values overwrite the CARP defaults when a new CARP device is created. Since advskew already defaults to '0' for newly created devices and the new value is guaranteed to be within the valid range, it is safe to overwrite it here. PR: 194672 Reported by: cmb@pfsense.org
* Merge r262341:glebius2014-04-041-38/+33
| | | | | - Improve logging of send errors, reporting error code and interface. - Reduce code duplication between INET and INET6.
* Relese the interface in the last.trociny2013-09-081-1/+1
| | | | | Reviewed by: glebius Approved by: re (kib)
* Virtualize carp(4) variables to have per vnet control.trociny2013-08-131-53/+61
| | | | Reviewed by: ae, glebius
* Migrate struct carpstats to PCPU counters.ae2013-07-091-3/+24
|
* Add const qualifier to the dst parameter of the ifnet if_output method.glebius2013-04-261-1/+1
|
* Use m_get/m_gethdr instead of compat macros.glebius2013-03-151-2/+2
| | | | Sponsored by: Nginx, Inc.
* Resolve source address selection in presense of CARP. Add a coupleglebius2013-02-111-0/+10
| | | | | | | | | | | | | | | | | | of helper functions: - carp_master() - boolean function which is true if an address is in the MASTER state. - ifa_preferred() - boolean function that compares two addresses, and is aware of CARP. Utilize ifa_preferred() in ifa_ifwithnet(). The previous version of patch also changed source address selection logic in jails using carp_master(), but we failed to negotiate this part with Bjoern. May be we will approach this problem again later. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc
* Garbage collect carp_cksum().glebius2012-12-251-10/+4
|
* Change net.inet.carp.demotion sysctl to add the supplied valueglebius2012-12-251-3/+20
| | | | | | | to the current demotion factor instead of assigning it. This allows external scripts to control demotion factor together with kernel in a raceless manner.
* Switch the entire IPv4 stack to keep the IP packet headerglebius2012-10-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtructs header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
* Revert previous commit...kevlo2012-10-101-1/+1
| | | | Pointyhat to: kevlo (myself)
* Prefer NULL over 0 for pointerskevlo2012-10-091-1/+1
|
* carp_send_ad() should never return without rescheduling next run.glebius2012-09-291-11/+6
|
* Fix a problem when CARP is enabled on the interface for IPv4bz2012-07-251-8/+16
| | | | | | | | | | | | | | | | but not for IPv6. The current checks in nd6_nbr.c along with the old version will result in ifa being NULL and subsequently the packet will be dropped. This prevented NS/NA, from working and with that IPv6. Now return the ifa from the carp lookup function in two cases: 1) if the address matches, is a carp address, and we are MASTER (as before), 2) if the address matches but it is not a carp address at all (new). Reported by: Peter Wemm (new Y! FreeBSD cluster, eating our own dogfood) Tested on: New Y! FreeBSD cluster machines Reviewed by: glebius
* Improve style(9) of bcopy() to and from mbuf tag.glebius2012-05-301-4/+3
| | | | Submitted by: bde
* After r228571 carp_output() expects carp_softc * pointer in the mtag.glebius2012-05-301-3/+3
| | | | Noticed by: thompsa
* It is a logical error that in carp_multicast_cleanup()glebius2012-04-111-24/+59
| | | | | | | | we look at count of addresses on a particular vhid, we should account number of addresses on cif. To achieve this we need to run carp_attach() and carp_detach() under appropriate cif lock.
* CARP should be capable to run on if_bridge(4). Unfortunately,glebius2012-04-101-0/+2
| | | | | | | | this commit is not enough to enable CARP operation on if_bridge(4), because the latter doesn't handle or even initialize its ifp->if_link_state. Reported by: Alexander Lunev <sol289 gmail.com>
* Set vnet context in callouts and taskqueues.glebius2012-02-081-0/+8
| | | | PR: 164696
* o Provide functions carp_ifa_addroute()/carp_ifa_delroute()glebius2012-02-011-24/+41
| | | | | | | | | | to cleanup routes from a single ifa. o Implement carp_addroute()/carp_delroute() via above functions. o Call carp_ifa_delroute() in the carp_detach() to avoid junk routes left in routing table, in case if user removes an address in a MASTER state. [1] Reported by: az [1]
* Convert all users of IF_ADDR_LOCK to use new locking macros that specifyjhb2012-01-051-12/+12
| | | | | | | either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks
* Use a better log message for master down event.glebius2011-12-221-1/+1
|
* Restore a feature that was present in 5.x and 6.x, and was cleared inglebius2011-12-201-56/+70
| | | | | | | | | | | | | | | | | | | | | | | 7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP preemption, while it is running its bulk update. However, reimplement the feature in more elegant manner, that is partially inspired by newer OpenBSD: - Rename term "suppression" to "demotion", to match with OpenBSD. - Keep a global demotion factor, that can be raised by several conditions, for now these are: - interface goes down - carp(4) has problems with ip_output() or ip6_output() - pfsync performs bulk update - Unlike in OpenBSD the demotion factor isn't a counter, but is actual value added to advskew. The adjustment values for particular error conditions are also configurable, and their defaults are maximum advskew value, so a single failure bumps demotion to maximum. This is for POLA compatibility, and should satisfy most users. - Demotion factor is a writable sysctl, so user can do foot shooting, if he desires to.
* A major overhaul of the CARP implementation. The ip_carp.c was startedglebius2011-12-161-1379/+1001
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from scratch, copying needed functionality from the old implemenation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on a Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]
* In r191367 the need for if_free_type() was removed and a new memberbrooks2011-11-111-1/+1
| | | | | | | | if_alloctype was used to store the origional interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free(). MFC after: 1 Month
* Never switch directly from INIT to MASTER, since this producesglebius2011-10-141-18/+4
| | | | | | | | nasty status flaps. PR: kern/161123 Submitted by: Damien Fleuriot <dam my.gd> OpenBSD: ip_carp.c, rev. 1.115
* Make various (pseudo) interfaces compile without INET in the kernelbz2011-04-271-7/+46
| | | | | | | | | | adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
* Redo r166423. It is important not only skip freeing multicastglebius2010-11-241-14/+17
| | | | | entires when underlying interface is detached, but also purge pointers to them, to avoid double-free in future.
* Do not convert some meaningful error value to EINVAL.glebius2010-09-201-4/+4
| | | | Reviewed by: will
* Fix CARP in backup mode by properly registering its hooks for INET and INET6will2010-09-061-0/+15
| | | | | | | | | using ipproto_{un,}register() and the newly created ip6proto_{un,}register() so that it can again receive IPPROTO_CARP packets allowing its state machine to work. Reviewed by: bz Approved by: ken (mentor)
* Fix static kernel builds with carp(4) by changing its SYSINIT order so thatwill2010-09-061-1/+1
| | | | | | | | it is initialized after basic protocol initialization, which allows it to register via pf_proto_register(). Reviewed by: bz Approved by: ken (mentor)
* Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, withwill2010-08-111-20/+0
| | | | | | | the appropriate ifdefs. Reviewed by: bz Approved by: ken (mentor)
* Allow carp(4) to be loaded as a kernel module. Follow precedent set bywill2010-08-111-24/+152
| | | | | | | | | | | | | | | bridge(4), lagg(4) etc. and make use of function pointers and pf_proto_register() to hook carp into the network stack. Currently, because of the uncertainty about whether the unload path is free of race condition panics, unloads are disallowed by default. Compiling with CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure. This commit requires IP6PROTOSPACER, introduced in r211115. Reviewed by: bz, simon Approved by: ken (mentor) MFC after: 2 weeks
* Address an edge condition that we found at work, where the carp(4)delphij2010-08-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | interface goes to issue LINK_UP, then LINK_DOWN, then LINK_UP at cold boot. This behavior is not observed when carp(4) interface is created slightly later, when the underlying interface is fully up. Before this change what happen at boot is roughly: - ifconfig creates em0 interface; - ifconfig clones a carp device using em0; (em0's link state is DOWN at this point) - carp state: INIT -> BACKUP [*] - carp state: BACKUP -> MASTER - [Some negotiate between em0 and switch] - em0 kicks up link state change event (em0's link state is now up DOWN at this point) - do_link_state_change() -> carp_carpdev_state() - carp state: MASTER -> INIT (via carp_set_state(sc, INIT)) [+] - carp state: INIT -> BACKUP - carp state: BACKUP -> MASTER At the [*] stage, em0 did not received any broadcast message from other node, and assume our node is the master, thus carp(4) sets the link state to "UP" after becoming a master. At [+], the master status is forcely set to "INIT", then an election is casted, after which our node would actually become a master. We believe that at the [*] stage, the master status should remain as "INIT" since the underlying parent interface's link state is not up. Obtained from: iXsystems, Inc. Reported by: jpaetzel MFC after: 2 months
* Complete the swap of carp(4) log levels and document the change.ru2010-01-081-2/+2
| | | | MFC after: 3 days
* Until this moment carp(4) used a strange logging priority. It used debugglebius2009-12-021-16/+16
| | | | | | | | | priority for such important information as MASTER/BACKUP state change, and used a normal logging priority for such innocent messages as receiving short packet (which is a normal VRRP packet between some other routers) or receving a CARP packet on non-carp interface (someone else running CARP). This commit shifts message logging priorities to a more sane default.
* Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs whenwill2009-08-201-3/+3
| | | | | | | | CARP tries to free them using M_IFADDR after the last address for a virtual host is removed and when detaching from the parent interface. Reviewed by: mlaier Approved by: re (kib), ken (mentor)
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c andrwatson2009-08-011-1/+1
| | | | | | | | | | vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
* Show interface name which received short CARP packet (e.g. a VRRP packet),delphij2009-07-301-2/+3
| | | | | | | | in order to match other codepaths nearby. This makes troubleshooting easier. Approved by: re (kib) MFC after: 1 month
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocatorrwatson2009-07-141-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
* Add address list locking for in6_ifaddrhead/ia_link: as with lockingrwatson2009-06-251-2/+11
| | | | | | | | | | | for in_ifaddrhead, we stick with an rwlock for the time being, which we will revisit in the future with a possible move to rmlocks. Some pieces of code require significant further reworking to be safe from all classes of writer-writer races. Reviewed by: bz MFC after: 6 weeks
* Add a new global rwlock, in_ifaddr_lock, which will synchronize use of therwatson2009-06-251-3/+16
| | | | | | | | | | | | | | | | | | | in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz
* Fix CARP build.rwatson2009-06-241-1/+1
| | | | Reported by: bz
* Convert netinet6 to using queue(9) rather than hand-crafted linked listsrwatson2009-06-241-1/+1
| | | | | | | | for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt the code styles and conventions present in netinet where possible. Reviewed by: gnn, bz MFC after: 6 weeks (possibly not MFCable?)
* Modify most routines returning 'struct ifaddr *' to return referencesrwatson2009-06-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)
* Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:bms2009-04-291-31/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.
OpenPOWER on IntegriCloud