path: root/sys/netinet/ip_carp.c
Commit message / Author / Age / Files / Lines
* Revert r292275 & r292379smh2015-12-171-7/+10
  glebius has concerns about these changes, so they are being reverted until those concerns can be discussed and addressed.

  Sponsored by: Multiplay
* Fix issues introduced by r292275smh2015-12-161-1/+1
  * Fix panic for etherswitches which don't have a LLADDR.
  * Disabled DELAY in unsolicited NDA, which needs further work.
  * Fixed missing DELAY in carp_send_na.
  * style(9) fix.

  Reported by: kp & melifaro
  X-MFC-With: r292275
  MFC after: 1 month
  Sponsored by: Multiplay
* Fix lagg failover due to missing notificationssmh2015-12-151-10/+7
  When using lagg failover mode, neither Gratuitous ARP (IPv4) nor Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved. This results in slow failover, dropped packets and network outages for the lagg interface when the primary link goes down.

  We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire an ifnet_link_event, which is now monitored by rip and nd6.

  Upon receiving these events each protocol triggers the relevant notifications:
  * inet4 => Gratuitous ARP
  * inet6 => Unsolicited Neighbour Announce

  This also fixes the carp IPv6 NAs that stopped working after r251584, which added the ipv6_route__llma route.

  The new behaviour can be controlled using the sysctls:
  * net.link.ether.inet.arp_on_link
  * net.inet6.icmp6.nd6_on_link

  Also removed unused param from lagg_port_state and added descriptions for the sysctls while here.

  PR: 156226
  MFC after: 1 month
  Sponsored by: Multiplay
  Differential Revision: https://reviews.freebsd.org/D4111
* Revert r290403smh2015-11-131-7/+0
  CARP rework invalidated this change.
* Decompose arp_ifinit() into arp_add_ifa_lle() and arp_announce_ifaddr().melifaro2015-11-091-3/+7
  Rename arp_ifinit2() into arp_announce_ifaddr().

  Eliminate zeroing ifa_rtrequest: it was used for calling arp_rtrequest() which was responsible for handling route cloning requests. It became obsolete since r186119 (L2/L3 split).
* Add MTU support to carp interfacessmh2015-11-051-0/+7
  MFC after: 2 weeks
  Sponsored by: Multiplay
* * Do more fine-grained locking: call eventhandlers/free_entrymelifaro2015-09-141-1/+1
  without holding afdata wlock
  * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback
  * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D3573
* Improve carp(4) locking:glebius2015-04-211-62/+39
  - Use the carp_sx to serialize not only CARP ioctls, but also carp_attach() and carp_detach().
  - Use cif_mtx to lock only access to the linked list.
  - These locking changes allow us to do some memory allocations with M_WAITOK and also properly call callout_drain() in carp_destroy().
  - In carp_attach() assert that ifaddr isn't attached. We always come here with a pristine address from in[6]_control().

  Reviewed by: oleg
  Sponsored by: Nginx, Inc.
* Add sleepable lock to protect at least against two parallel SIOCSVHs.glebius2015-04-061-3/+5
  Sponsored by: Nginx, Inc.
* o Use new function ip_fillid() in all places throughout the kernel,glebius2015-04-011-1/+1
  where we want to create a new IP datagram.
  o Add support for RFC6864, which allows setting the IP ID for atomic IP datagrams to any value, to improve performance. The behaviour is controlled by the net.inet.ip.rfc6864 sysctl knob, which is enabled by default.
  o When we do generate an IP ID, use counter(9) to improve performance.
  o Gather all code related to IP ID into ip_id.c.

  Differential Revision: https://reviews.freebsd.org/D2177
  Reviewed by: adrian, cy, rpaulo
  Tested by: Emeric POUPON <emeric.poupon stormshield.eu>
  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Relnotes: yes
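  Note on the entry above: a minimal sketch of the RFC 6864 idea, not the code in ip_id.c. It assumes the RFC's definition of an atomic datagram (DF set, MF clear, fragment offset zero). The ex_* names and the plain counter standing in for counter(9) are hypothetical; the flag values are the standard IPv4 header bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard IPv4 fragment field bits, in host byte order. */
#define EX_IP_DF	0x4000	/* don't fragment */
#define EX_IP_MF	0x2000	/* more fragments */
#define EX_IP_OFFMASK	0x1fff	/* fragment offset */

/* RFC 6864: atomic means DF set, MF clear and offset zero. */
static bool
ex_is_atomic(uint16_t ip_off)
{
	return ((ip_off & EX_IP_DF) != 0 &&
	    (ip_off & (EX_IP_MF | EX_IP_OFFMASK)) == 0);
}

/*
 * Hypothetical helper: pick an IP ID for a new datagram.  With the knob
 * on, atomic datagrams get a fixed ID (0 here) and the counter is left
 * alone; everything else takes the next counter value.
 */
static uint16_t
ex_fill_id(uint16_t ip_off, bool rfc6864_enabled, uint16_t *next_id)
{
	if (rfc6864_enabled && ex_is_atomic(ip_off))
		return (0);
	return ((*next_id)++);
}
```

  The performance gain in this sketch comes from skipping the shared counter for the common DF-set case.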
* Log hardware interface up/down as "hardware" rather than just "hw".will2015-01-231-2/+2
  Suggested by: glebius
  MFC after: 1 week
  MFC with: 277530
* When a CARP state change is caused by an ifconfig request, log it accordingly.will2015-01-231-2/+4
  Suggested by: glebius
  MFC after: 1 week
  MFC with: 277530
* Improve CARP logging so that all state transitions are logged.will2015-01-221-34/+23
  sys/netinet/ip_carp.c:
    Add a "reason" string parameter to carp_set_state() and carp_master_down_locked() allowing more specific logging information to be passed into these APIs.

    Refactor existing state transition logging into a single log call in carp_set_state().

    Update all calls to carp_set_state() and carp_master_down_locked() to pass an appropriate reason string. For state transitions that were previously logged, the output should be unchanged.

  Submitted by: gibbs (original), asomers (updated)
  MFC after: 1 week
  Sponsored by: Spectra Logic
  MFSpectraBSD: 1039697 on 2014/02/11 (original)
                1049992 on 2014/03/21 (updated)
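  Note on the entry above: a rough user-space sketch of the pattern it describes, where every state change goes through one function that takes a reason string and emits a single log line. The struct, the ex_* names and the exact message format are assumptions for illustration, not the kernel's carp_set_state().

```c
#include <stddef.h>
#include <stdio.h>

enum ex_state { EX_INIT, EX_BACKUP, EX_MASTER };

static const char *const ex_state_names[] = { "INIT", "BACKUP", "MASTER" };

/* Hypothetical stand-in for the per-vhid softc. */
struct ex_softc {
	int		vhid;
	const char	*ifname;
	enum ex_state	state;
};

/*
 * Single place where transitions happen and get logged.  The caller
 * supplies a non-NULL buffer; it is left empty when the state does not
 * actually change.
 */
static void
ex_set_state(struct ex_softc *sc, enum ex_state next, const char *reason,
    char *log, size_t loglen)
{
	log[0] = '\0';
	if (sc->state == next)
		return;
	snprintf(log, loglen, "carp: %d@%s: %s -> %s (%s)", sc->vhid,
	    sc->ifname, ex_state_names[sc->state], ex_state_names[next],
	    reason);
	sc->state = next;
}
```

  Callers then only differ in the reason they pass, for example "master timed out" versus "ifconfig request".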
* Remove the check that prevents carp(4) advskew from being set to '0'.loos2015-01-061-6/+4
  CARP devices are created with advskew set to '0', and once you set it to any other value in the valid range (0..254) you can't set it back to zero.

  The code in question is also used to prevent zeroed values from overwriting the CARP defaults when a new CARP device is created. Since advskew already defaults to '0' for newly created devices and the new value is guaranteed to be within the valid range, it is safe to overwrite it here.

  PR: 194672
  Reported by: cmb@pfsense.org
  In collaboration with: garga
  Tested by: garga
  MFC after: 2 weeks
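  Note on the entry above: a small sketch of the resulting behaviour, assuming only what the message states (valid range 0..254, zero now accepted). The function name and signature are hypothetical and do not mirror the ioctl handler.

```c
#include <errno.h>

#define EX_ADVSKEW_MAX	254	/* upper bound quoted in the commit message */

/*
 * Store a requested advskew.  Before the change a request of 0 was
 * silently ignored; here 0 is stored like any other in-range value.
 */
static int
ex_set_advskew(int *advskew, int requested)
{
	if (requested < 0 || requested > EX_ADVSKEW_MAX)
		return (EINVAL);
	*advskew = requested;
	return (0);
}
```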
* To ease changes to underlying mbuf structure and the mbuf allocator, reducerwatson2015-01-051-2/+2
  the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN()) in a single M_ALIGN() macro, implemented by a now-inlined m_align() function:

  - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline.
  - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align().
  - Update consumers around the tree to simply use M_ALIGN().

  This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, and also removes assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs.

  Differential Revision: https://reviews.freebsd.org/D1436
  Reviewed by: glebius, trasz, gnn, bz
  Sponsored by: EMC / Isilon Storage Division
* Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.glebius2014-11-071-6/+9
  Sponsored by: Nginx, Inc.
* Change pr_output's prototype to avoid the need for explicit casts.kevlo2014-08-151-2/+2
  This is a follow up to r269699.

  Phabric: D564
  Reviewed by: jhb
* Merge 'struct ip6protosw' and 'struct protosw' into one. Now we havekevlo2014-08-081-12/+16
  only one protocol switch structure that is shared between ipv4 and ipv6.

  Phabric: D476
  Reviewed by: jhb
* Improve logging of send errors, reporting error code and interface.glebius2014-02-221-38/+33
  Reduce code duplication between INET and INET6.

  Tested by: Lytochkin Boris <lytboris gmail.com>
* Further rework netinet6 address handling code:melifaro2014-01-191-2/+2
  * Set ia address/mask values BEFORE attaching to address lists. Inet6 address assignment is not atomic, so the simplest way to do this atomically is to fill in ia before attach.
  * Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code).
  * Do some renamings:
      in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here)
      in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code)
      in6_ifaddloop -> nd6_add_ifa_lle
      in6_ifremloop -> nd6_rem_ifa_lle
  * Split working with LLE and route announce code for last two. Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour.
  * Call device SIOCSIFADDR handler IFF we're adding first address. In IPv4 we have to call it on every address change since the ARP record is installed by arp_ifinit(), which is called by the given handler. The IPv6 stack, by contrast, is responsible for calling nd6_add_ifa_lle(), so there is no reason to call SIOCSIFADDR often.
* Make failure of ifpromisc() a non-fatal error. This makes it possible toglebius2014-01-031-17/+11
  run carp(4) on vtnet(4).

  Sponsored by: Nginx, Inc.
* The r48589 promised to remove implicit inclusion of if_var.h soon. Prepareglebius2013-10-261-0/+1
  for this event by adding if_var.h to files that do need it. Also, add all includes that until now were pulled in via implicit pollution through if_var.h.

  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
* Release the interface last.trociny2013-09-081-1/+1
  Reviewed by: glebius
  Approved by: re (kib)
* Virtualize carp(4) variables to have per vnet control.trociny2013-08-131-53/+61
  Reviewed by: ae, glebius
* Migrate struct carpstats to PCPU counters.ae2013-07-091-3/+24
* Add const qualifier to the dst parameter of the ifnet if_output method.glebius2013-04-261-1/+1
* Use m_get/m_gethdr instead of compat macros.glebius2013-03-151-2/+2
  Sponsored by: Nginx, Inc.
* Resolve source address selection in presence of CARP. Add a coupleglebius2013-02-111-0/+10
  of helper functions:
  - carp_master() - boolean function which is true if an address is in the MASTER state.
  - ifa_preferred() - boolean function that compares two addresses, and is aware of CARP.

  Utilize ifa_preferred() in ifa_ifwithnet().

  The previous version of the patch also changed source address selection logic in jails using carp_master(), but we failed to negotiate this part with Bjoern. Maybe we will approach this problem again later.

  Reported & tested by: Anton Yuzhaninov <citrin citrin.ru>
  Sponsored by: Nginx, Inc
* Garbage collect carp_cksum().glebius2012-12-251-10/+4
* Change net.inet.carp.demotion sysctl to add the supplied valueglebius2012-12-251-3/+20
  to the current demotion factor instead of assigning it. This allows external scripts to control demotion factor together with kernel in a raceless manner.
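  Note on the entry above: a user-space sketch of why "add" semantics avoid the race. With assignment, a script would have to read the current value, compute a new one and write it back, and could overwrite a kernel-side change made in between. With an atomic add, each party only contributes its own delta. The function name is hypothetical and C11 atomics stand in for whatever the kernel actually uses.

```c
#include <stdatomic.h>

/*
 * Apply a signed adjustment to a shared demotion factor and return the
 * resulting value.  Each caller later undoes its own contribution by
 * passing the negated delta.
 */
static int
ex_demotion_adjust(atomic_int *demotion, int delta)
{
	/* atomic_fetch_add returns the previous value. */
	return (atomic_fetch_add(demotion, delta) + delta);
}
```

  A script that raised the factor by some amount can later pass the same amount negated, without needing to know what other sources added in the meantime.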
* Switch the entire IPv4 stack to keep the IP packet headerglebius2012-10-221-2/+2
  in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet.

  After this change a packet processed by the stack isn't modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to scratch his head trying to figure out what the byte order is at a given place in the stack.

  [1] One exception still remains. The raw sockets convert to host byte order before passing a packet to an application. This will probably remain for ages for compatibility.

  [2] The ip_input() still subtracts header len from ip->ip_len, but this is planned to be fixed soon.

  Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
  Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
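  Note on the entry above: a minimal illustration of the convention it describes, where the header field stays in network byte order and any host-order arithmetic happens in a local variable. The one-field struct and the function name are hypothetical; ntohs() is the standard POSIX conversion.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical cut-down header: ip_len is kept in network byte order. */
struct ex_ip_hdr {
	uint16_t ip_len;
};

/*
 * Compute the payload length without modifying the packet: convert
 * into a local, do the arithmetic there, and never store it back.
 */
static uint16_t
ex_payload_len(const struct ex_ip_hdr *ip, uint16_t hdr_len)
{
	uint16_t total = ntohs(ip->ip_len);

	if (hdr_len > total)
		return (0);
	return ((uint16_t)(total - hdr_len));
}
```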
* Revert previous commit...kevlo2012-10-101-1/+1
  Pointyhat to: kevlo (myself)
* Prefer NULL over 0 for pointerskevlo2012-10-091-1/+1
* carp_send_ad() should never return without rescheduling next run.glebius2012-09-291-11/+6
* Fix a problem when CARP is enabled on the interface for IPv4bz2012-07-251-8/+16
  but not for IPv6. The current checks in nd6_nbr.c along with the old version will result in ifa being NULL and subsequently the packet will be dropped. This prevented NS/NA from working, and with that IPv6.

  Now return the ifa from the carp lookup function in two cases:
  1) if the address matches, is a carp address, and we are MASTER (as before),
  2) if the address matches but it is not a carp address at all (new).

  Reported by: Peter Wemm (new Y! FreeBSD cluster, eating our own dogfood)
  Tested on: New Y! FreeBSD cluster machines
  Reviewed by: glebius
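  Note on the entry above: the two return cases can be sketched as a simple lookup. The types are hypothetical stand-ins (an integer replaces the IPv6 address, a flag marks CARP addresses) and this is not the kernel's lookup function.

```c
#include <stdbool.h>
#include <stddef.h>

enum ex_state { EX_INIT, EX_BACKUP, EX_MASTER };

/* Hypothetical simplified interface address. */
struct ex_ifaddr {
	unsigned	addr;		/* stand-in for the IPv6 address */
	bool		is_carp;	/* address has a vhid attached */
	enum ex_state	state;		/* meaningful only when is_carp */
};

/* Return the matching ifa, or NULL if ND should not answer for it. */
static const struct ex_ifaddr *
ex_carp_lookup(const struct ex_ifaddr *ifas, size_t n, unsigned addr)
{
	for (size_t i = 0; i < n; i++) {
		const struct ex_ifaddr *ifa = &ifas[i];

		if (ifa->addr != addr)
			continue;
		if (!ifa->is_carp)
			return (ifa);	/* case 2: plain address */
		if (ifa->state == EX_MASTER)
			return (ifa);	/* case 1: carp and MASTER */
	}
	return (NULL);
}
```

  A matching CARP address in BACKUP state still yields NULL, which is the behaviour that keeps a backup node silent.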
* Improve style(9) of bcopy() to and from mbuf tag.glebius2012-05-301-4/+3
  Submitted by: bde
* After r228571 carp_output() expects carp_softc * pointer in the mtag.glebius2012-05-301-3/+3
  Noticed by: thompsa
* It is a logical error that in carp_multicast_cleanup()glebius2012-04-111-24/+59
  we look at the count of addresses on a particular vhid; we should account for the number of addresses on the cif.

  To achieve this we need to run carp_attach() and carp_detach() under the appropriate cif lock.
* CARP should be capable to run on if_bridge(4). Unfortunately,glebius2012-04-101-0/+2
  this commit is not enough to enable CARP operation on if_bridge(4), because the latter doesn't handle or even initialize its ifp->if_link_state.

  Reported by: Alexander Lunev <sol289 gmail.com>
* Set vnet context in callouts and taskqueues.glebius2012-02-081-0/+8
  PR: 164696
* o Provide functions carp_ifa_addroute()/carp_ifa_delroute()glebius2012-02-011-24/+41
  to clean up routes from a single ifa.
  o Implement carp_addroute()/carp_delroute() via above functions.
  o Call carp_ifa_delroute() in carp_detach() to avoid junk routes left in the routing table in case the user removes an address in the MASTER state. [1]

  Reported by: az [1]
* Convert all users of IF_ADDR_LOCK to use new locking macros that specifyjhb2012-01-051-12/+12
  either a read lock or write lock.

  Reviewed by: bz
  MFC after: 2 weeks
* Use a better log message for master down event.glebius2011-12-221-1/+1
* Restore a feature that was present in 5.x and 6.x, and was cleared inglebius2011-12-201-56/+70
  7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP preemption while it is running its bulk update.

  However, reimplement the feature in a more elegant manner, partially inspired by newer OpenBSD:

  - Rename the term "suppression" to "demotion", to match OpenBSD.
  - Keep a global demotion factor that can be raised by several conditions; for now these are:
    - interface goes down
    - carp(4) has problems with ip_output() or ip6_output()
    - pfsync performs bulk update
  - Unlike in OpenBSD, the demotion factor isn't a counter, but is an actual value added to advskew. The adjustment values for particular error conditions are also configurable, and their defaults are the maximum advskew value, so a single failure bumps demotion to maximum. This is for POLA compatibility, and should satisfy most users.
  - The demotion factor is a writable sysctl, so the user can do foot shooting if he desires to.
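  Note on the entry above: a sketch of "demotion is a value added to advskew". The cap of 240 and the ex_* names are assumptions made for the example; the message itself only says the defaults equal the maximum advskew value.

```c
#include <stdint.h>

/* Assumed cap on the advertised skew; not taken from the commit message. */
#define EX_MAXSKEW	240

/*
 * Effective skew announced by a node: its configured advskew plus the
 * global demotion factor, clamped to the range [0, EX_MAXSKEW].
 */
static uint8_t
ex_effective_advskew(uint8_t advskew, int demotion)
{
	int skew = (int)advskew + demotion;

	if (skew < 0)
		skew = 0;
	if (skew > EX_MAXSKEW)
		skew = EX_MAXSKEW;
	return ((uint8_t)skew);
}
```

  With a default adjustment equal to the cap, one failure is enough to push any node to the largest skew, which matches the "single failure bumps demotion to maximum" remark.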
* A major overhaul of the CARP implementation. The ip_carp.c was startedglebius2011-12-161-1379/+1001
  from scratch, copying needed functionality from the old implementation on demand, with a thorough review of all code. The main change is that the interface layer has been removed from CARP. Now redundant addresses are configured exactly on the interfaces they run on.

  The CARP configuration itself is, as before, configured and read via the SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant.

  ifconfig(8) semantics have been changed too: now one doesn't need to clone a carpXX interface; he/she should directly configure a vhid on an Ethernet interface.

  To supply vhid data from the kernel to an application, the getifaddrs(3) function has been changed to pass ifam_data with each address. [1]

  The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows running a single redundant IP per interface.

  Big thanks to Bjoern Zeeb for his help with the inet6 part of the patch, for the idea of using ifam_data and for several rounds of reviewing!

  PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
  Reviewed by: bz
  Submitted by: bz [1]
* In r191367 the need for if_free_type() was removed and a new memberbrooks2011-11-111-1/+1
  if_alloctype was used to store the original interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free().

  MFC after: 1 Month
* Never switch directly from INIT to MASTER, since this producesglebius2011-10-141-18/+4
  nasty status flaps.

  PR: kern/161123
  Submitted by: Damien Fleuriot <dam my.gd>
  OpenBSD: ip_carp.c, rev. 1.115
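  Note on the entry above: one way to picture the rule is a transition filter that routes any INIT to MASTER request through BACKUP first, so that a node listens for an existing master before claiming the role. This is an assumed illustration with hypothetical names, not the logic of the referenced revision.

```c
enum ex_state { EX_INIT, EX_BACKUP, EX_MASTER };

/*
 * Decide the state actually entered when `wanted` is requested from
 * `cur`.  A direct INIT -> MASTER jump is replaced by INIT -> BACKUP;
 * promotion to MASTER then happens later from BACKUP.
 */
static enum ex_state
ex_next_state(enum ex_state cur, enum ex_state wanted)
{
	if (cur == EX_INIT && wanted == EX_MASTER)
		return (EX_BACKUP);
	return (wanted);
}
```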
* Make various (pseudo) interfaces compile without INET in the kernelbz2011-04-271-7/+46
  adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp.

  Reviewed by: gnn
  Sponsored by: The FreeBSD Foundation
  Sponsored by: iXsystems
  MFC after: 4 days
* Redo r166423. It is important not only skip freeing multicastglebius2010-11-241-14/+17
  entries when underlying interface is detached, but also purge pointers to them, to avoid double-free in future.
* Do not convert some meaningful error value to EINVAL.glebius2010-09-201-4/+4
  Reviewed by: will