path: root/sys/net/if_lagg.c
Commit log, newest first. Each entry: author, date (files changed, lines removed/added), then the commit message.
* asomers, 2017-02-02 (1 file, -1/+1)
  MFC r310180, r310327

  r310180: Fix panic during lagg destruction with simultaneous status check

  If you run "ifconfig lagg0 destroy" and "ifconfig lagg0" at the same time, a page fault may result. The first process will destroy ifp->if_lagg in lagg_clone_destroy (called by if_clone_destroy). Then the second process will observe that ifp->if_lagg is NULL at the top of lagg_port_ioctl and jump to the fallback: label, where it will promptly dereference ifp->if_lagg anyway.

  The solution is to repeat the NULL check for ifp->if_lagg.

  MFC after: 4 weeks
  Sponsored by: Spectra Logic Corp
  Differential Revision: https://reviews.freebsd.org/D8512

  r310327: Remove stray debugging code from r310180

  Reported by: rstone
  Pointy hat to: asomers
  MFC after: 3 weeks
  X-MFC-with: 310180
  Sponsored by: Spectra Logic Corp
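The r310180 fix can be illustrated with a small userland model. The types, `port_ioctl_model()`, `CMD_LAGG_SPECIFIC`, `echo_ioctl()`, and the `EINVAL` return are hypothetical stand-ins, not the kernel's `lagg_port_ioctl()`; the point is only that code reached through `goto fallback` must re-check the pointer, because that jump is taken exactly when it is NULL.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct ifnet / struct lagg_port. */
struct port_model;
typedef int (*port_ioctl_fn)(struct port_model *lp, unsigned long cmd);

struct port_model {
    port_ioctl_fn lp_ioctl;   /* original ioctl handler of the member */
    int is_member;            /* nonzero while attached to a lagg */
};

struct ifnet_model {
    struct port_model *if_lagg;   /* may be cleared by a concurrent destroy */
};

#define CMD_LAGG_SPECIFIC 1UL     /* hypothetical lagg-handled command */

static int port_ioctl_model(struct ifnet_model *ifp, unsigned long cmd)
{
    struct port_model *lp = ifp->if_lagg;

    /* Destroy may already have cleared if_lagg: take the fallback path. */
    if (lp == NULL)
        goto fallback;

    if (cmd == CMD_LAGG_SPECIFIC && lp->is_member)
        return 0;                 /* handled by the lagg itself */

fallback:
    /* The fix: check again before dereferencing, since the goto above
     * is taken precisely when the pointer is NULL. */
    if (lp != NULL && lp->lp_ioctl != NULL)
        return lp->lp_ioctl(lp, cmd);
    return EINVAL;                /* error code chosen for illustration */
}

/* Example member handler used to exercise the fallback path. */
static int echo_ioctl(struct port_model *lp, unsigned long cmd)
{
    (void)lp;
    return (int)cmd + 100;
}
```

Without the second check, the `fallback:` path would dereference the same NULL pointer that sent control there.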
* araujo, 2016-02-25 (1 file, -2/+3)
  MFH 295796 (based on)

  Fix regression introduced in r272446.

  lagg(4) supports the protocol none, where it disables any traffic without disabling the lagg(4) interface itself.

  PR: 206478
  Submitted by: Erin Clark <erin.clark.ix@gmail.com>
  Reviewed by: rpokala, bapt
  Approved by: re (glebius)
  Differential Revision: https://reviews.freebsd.org/D5188
* rpokala, 2016-01-15 (1 file, -2/+18)
  [PR 206219] Kernel panic from lagg_ioctl and lagg_port_ioctl

  r287723 removed some cleanup from lagg(4), which leads to panics when changing configuration. Restore the spirit of the code which was removed.

  This issue has been refactored out of existence in -HEAD, so this patch is directly against stable/10.

  PR: 206219
  Submitted by: Fred Lewis < flewis @ panasas.com >
  Reviewed by: hiren, Daniel O'Connor < darius @ dons.net.au >
  Approved by: jhb
  Sponsored by: Panasas, Inc.
  Differential Revision: https://reviews.freebsd.org/D4929
* hrs, 2015-09-21 (1 file, -4/+2)
  Fix a panic in SIOCSLAGG and SIOCGLAGGOPTS. This was caused by a wrongly-MFC'd patch in r287723.

  Pointy hat to: hrs
* hiren, 2015-09-15 (1 file, -0/+19)
  MFC r286700

  Make LAG LACP fast timeout tunable through IOCTL.
* hrs, 2015-09-12 (1 file, -195/+267)
  MFC 272159,272161,272386,272446,272547,272548,273210:

  - Make lagg protos an enum.
  - When reconfiguring the protocol on a lagg, first set it to LAGG_PROTO_NONE, then drop the lock, run the attach routines, and then set it to the specific proto. This removes tons of WITNESS warnings.
  - Make lagg protocol attach handlers unable to fail, and allocate memory with M_WAITOK.
  - Virtualize the lagg(4) cloner. This change fixes a panic when tearing down if_lagg(4) interfaces which were cloned in a vnet jail. Sysctl nodes which are dynamically generated for each cloned interface (net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift ifconfig(8) parameters have been added instead. Flags and per-interface statistics counters are displayed in "ifconfig -v".
  - Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for backward compatibility with old ifconfig(8).
  - Move L2 addr configuration for the primary port to a taskqueue. This fixes a LOR of the softc rmlock in iflladdr_event handlers.
  - Call if_delmulti_ifma() after LACP_UNLOCK(). This fixes another LOR.
  - Fix a panic in lacp_transit_expire().
  - Fix a panic in lagg_input() upon shutting down a port.
  - Use printb() for boolean flags in ro_opts and actor_state for LACP.
  - Fix lladdr configuration which could prevent LACP mode from working.
  - Fix LORs when a laggport interface has an IPv6 LLA.
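The reconfiguration order in the second bullet can be sketched as a single-threaded model in which the softc lock is reduced to a flag, so `assert()` can check that the (possibly sleeping) attach step never runs with the lock held. The enum values and all function names are illustrative assumptions, not the definitions in if_lagg.h; the sketch also assumes configuration changes are already serialized by the caller.

```c
#include <assert.h>

/* Illustrative protocol ids; the real enum in if_lagg.h differs. */
enum proto_model { PROTO_NONE, PROTO_FAILOVER, PROTO_LOADBALANCE, PROTO_LACP };

/* Single-threaded model: the softc lock is a flag so the ordering of
 * lock/unlock steps can be checked with assert(). */
struct softc_model {
    int locked;
    enum proto_model proto;
    int attach_count;             /* how many attach routines ran */
};

static void sc_lock(struct softc_model *sc)   { assert(!sc->locked); sc->locked = 1; }
static void sc_unlock(struct softc_model *sc) { assert(sc->locked);  sc->locked = 0; }

/* While proto is NONE, transmit drops everything. */
static int tx_model(struct softc_model *sc)
{
    int sent;
    sc_lock(sc);
    sent = (sc->proto != PROTO_NONE);
    sc_unlock(sc);
    return sent;
}

/* Attach may sleep (M_WAITOK allocations), so it must run unlocked. */
static void proto_attach_model(struct softc_model *sc, enum proto_model p)
{
    assert(!sc->locked);
    (void)p;
    sc->attach_count++;
}

static void set_proto_model(struct softc_model *sc, enum proto_model p)
{
    sc_lock(sc);
    sc->proto = PROTO_NONE;       /* step 1: stop passing traffic */
    sc_unlock(sc);

    proto_attach_model(sc, p);    /* step 2: attach without the lock */

    sc_lock(sc);
    sc->proto = p;                /* step 3: switch to the new protocol */
    sc_unlock(sc);
}
```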
* hiren, 2015-04-24 (1 file, -3/+4)
  MFC r275358 r275483 r276982 - Removing M_FLOWID by hps@

  r275358:
  Start the process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but no longer has any users. Instead, the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file.

  This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows.

  "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before.

  r275483:
  Remove M_FLOWID from SCTP code.

  r276982:
  Remove no longer used "M_FLOWID" flag from mbuf.h and update the netisr manpage.

  Note: The FreeBSD version has been bumped.

  Reviewed by: hps, tuexen
  Sponsored by: Limelight Networks
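The flag-to-hash-type change can be modeled as follows. `struct pkthdr_model`, the `HASHTYPE_*` values, and the helper functions are hypothetical stand-ins for `m_pkthdr` and the `M_HASHTYPE_*` macros in sys/mbuf.h; they show only the idea that "flowid is valid" is now derived from the hash type rather than from a separate `M_FLOWID` bit.

```c
#include <stdint.h>

/* Hypothetical packet-header model; not the real sys/mbuf.h layout. */
enum hashtype_model {
    HASHTYPE_NONE = 0,      /* flowid not valid */
    HASHTYPE_OPAQUE,        /* flowid valid, no particular type */
    HASHTYPE_RSS_TCP_IPV4   /* example of a typed hash */
};

struct pkthdr_model {
    uint32_t flowid;
    enum hashtype_model rsstype;
};

/* Validity comes from the hash type, not from a separate flag bit. */
static int flowid_valid(const struct pkthdr_model *h)
{
    return h->rsstype != HASHTYPE_NONE;
}

/* Driver-side queue choice: use flowid only when it is valid.
 * nqueues must be nonzero. */
static unsigned pick_queue(const struct pkthdr_model *h, unsigned nqueues,
                           unsigned fallback)
{
    if (flowid_valid(h))
        return h->flowid % nqueues;
    return fallback % nqueues;
}

/* Transmit side: no "if" needed, assign the flowid and mark it opaque. */
static void set_flow(struct pkthdr_model *h, uint32_t flowid)
{
    h->flowid = flowid;
    h->rsstype = HASHTYPE_OPAQUE;
}
```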
* ae, 2015-01-25 (1 file, -2/+7)
  MFC r277295:

  Fix condition and really sort ports. Also add comment describing the intent of this code.
* hselasky, 2014-11-03 (1 file, -11/+5)
  MFC r271946 and r272595:

  Improve the TCP segmentation offload (TSO) algorithm in general. This change allows all HCAs from Mellanox Technologies to function properly when TSO is enabled. See r271946 and r272595 for more details about this commit.

  Sponsored by: Mellanox Technologies
* hselasky, 2014-10-27 (1 file, -4/+4)
  MFC r263710, r273377, r273378, r273423 and r273455:

  - De-vnet hash sizes and hash masks.
  - Fix multiple issues related to arguments passed to SYSCTL macros.

  Sponsored by: Mellanox Technologies
* ae, 2014-10-07 (1 file, -3/+13)
  MFC r272176:

  Keep list of lagg ports sorted by if_index.
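A minimal sketch of keeping a singly linked port list ordered by `if_index` on insertion. `struct port_node` and `ports_insert_sorted()` are hypothetical; the real code works on `struct lagg_port` entries in the softc's list. With sorted insertion, the final order does not depend on the order in which ports were added.

```c
#include <stddef.h>

/* Hypothetical port record with only the sort key and a link. */
struct port_node {
    unsigned if_index;
    struct port_node *next;
};

/* Insert lp before the first entry with a larger-or-equal if_index,
 * so the list stays sorted in ascending order. */
static void ports_insert_sorted(struct port_node **head, struct port_node *lp)
{
    struct port_node **pp = head;

    while (*pp != NULL && (*pp)->if_index < lp->if_index)
        pp = &(*pp)->next;
    lp->next = *pp;
    *pp = lp;
}
```

Walking a pointer-to-pointer avoids a special case for inserting at the head of the list.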
* mav, 2014-08-18 (1 file, -18/+18)
  MFC r269492:

  Improve locking of multicast addresses in VLAN and LAGG interfaces. This fixes several scenarios of reproducible panics caused by races between multicast address changes and interface destruction.
* rmacklem, 2014-05-06 (1 file, -2/+13)
  MFC: r264469, r264498

  Lagg did not set the value of if_hw_tsomax, so when lagg was stacked on top of network interfaces that set if_hw_tsomax, tcp_output() would see the default value instead of the value set by the network interface(s). This patch modifies lagg so that it sets if_hw_tsomax to the minimum of the value(s) for the underlying network interfaces.
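The "minimum of the members' values" rule can be sketched as below. Treating 0 as "not set by the driver" and falling back to a caller-supplied default are assumptions of this sketch, and `struct member_model` and `lagg_tsomax_model()` are hypothetical. The minimum is the conservative choice: a TSO burst sized for the lagg then fits whichever member ends up transmitting it.

```c
#include <stddef.h>

/* Hypothetical member view: only the field this sketch needs. */
struct member_model {
    unsigned hw_tsomax;   /* 0 = not set by the driver (assumption) */
};

/* Smallest TSO limit among members that set one, else default_max. */
static unsigned lagg_tsomax_model(const struct member_model *m, size_t n,
                                  unsigned default_max)
{
    unsigned best = 0;

    for (size_t i = 0; i < n; i++) {
        if (m[i].hw_tsomax == 0)
            continue;
        if (best == 0 || m[i].hw_tsomax < best)
            best = m[i].hw_tsomax;
    }
    return best != 0 ? best : default_max;
}
```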
* scottl, 2014-01-02 (1 file, -3/+13)
  MFC r260070

  Multi-queue NIC drivers and multi-port lagg tend to use the same lower bits of the flowid as each other, resulting in a poor distribution of packets among queues in certain cases. Work around this by adding a set of sysctls for controlling a bit-shift on the flowid when doing multi-port aggregation in lagg and lacp. By default, lagg/lacp will now use bits 16 and higher instead of 0 and higher.

  Obtained from: Netflix
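The following sketch shows why sharing the low flowid bits hurts and how a shift helps. It assumes, for illustration only, a NIC that picks its queue as `flowid % 4` and a two-port lagg that picks a port as `(flowid >> shift) % 2`; `port_for_flow()` and `spread_of_queue0()` are hypothetical helpers, not the lagg/lacp code.

```c
#include <stdint.h>

/* Pick an output port after discarding the low `shift` bits of the
 * flow id. nports must be nonzero. */
static unsigned port_for_flow(uint32_t flowid, unsigned shift, unsigned nports)
{
    return (flowid >> shift) % nports;
}

/* Count how the flows that a 4-queue NIC puts on queue 0
 * (flowid % 4 == 0) are spread over 2 lagg ports for a given shift. */
static void spread_of_queue0(unsigned shift, unsigned counts[2])
{
    counts[0] = counts[1] = 0;
    for (uint32_t f = 0; f < (1u << 20); f += 4)
        counts[port_for_flow(f, shift, 2)]++;
}
```

With `shift = 0`, every flow the NIC placed on queue 0 also lands on port 0; with `shift = 16`, those flows split evenly between both ports.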
* glebius, 2013-10-09 (1 file, -0/+1)
  There are some high performance NICs that count statistics in hardware, and there are ifnets that do that via counter(9). Provide a flag that would skip the cache-line-thrashing '+=' operation in ether_input().

  Sponsored by: Netflix
  Sponsored by: Nginx, Inc.
  Reviewed by: melifaro, adrian
  Approved by: re (marius)
* adrian, 2013-08-29 (1 file, -24/+41)
  Convert the if_lagg rwlock to an rmlock.

  We've been seeing lots of cache line contention (but not lock contention!) in our workloads between the various TX and RX threads going on.

  The write lock is only grabbed when configuration changes are made, which are infrequent.

  With this patch, the contention and cycles spent waiting for updates disappear.

  Sponsored by: Netflix, Inc.
* adrian, 2013-07-26 (1 file, -1/+2)
  Break out the static, global LACP debug options into a per-lagg unit sysctl tree.

  * Create a net.link.lagg.X.lacp node
  * Add a debug node under that for tx_test and rx_test
  * Add lacp_strict_mode, defaulting to 1

  tx_test and rx_test are still a bitmap of unit numbers for now.

  At some point it would be nice to create child nodes of the lagg bundle for each sub-interface, and then populate those with various knobs and statistics.

  Sponsored by: Netflix
* adrian, 2013-07-13 (1 file, -1/+29)
  Bring over some link aggregation / LACP protocol improvements and debugging additions.

  * Add some new tracing events to aid in debugging.
  * Add in a debugging mode to drop transmit and received frames, specifically to test whether seeing or hearing heartbeats correctly cause LACP to drop the port.
  * Add in (and make default) a strict LACP mode, which requires the heartbeat on a port to be heard before it's used. Sometimes vendor ports will hang but the link layer stays up, resulting in hung traffic.
  * Add logging of the number of link status flaps, again to aid in debugging badly behaving switch ports.
  * Calculate the lagg interface port speed as the multiple of the configured ports, rather than the largest.

  Obtained from: Netflix
  MFC after: 2 weeks
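For the last bullet, one reading of "the multiple of the configured ports" is the sum of the member link speeds (equal to port speed times port count when all links match); that interpretation is an assumption here. The sketch compares it with the previous "largest member" rule; both helpers are hypothetical and speeds are in bits per second.

```c
#include <stddef.h>
#include <stdint.h>

/* New rule (as interpreted here): aggregate speed = sum of members. */
static uint64_t lagg_speed_sum(const uint64_t *speeds, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += speeds[i];
    return total;
}

/* Old rule: report the fastest single member. */
static uint64_t lagg_speed_max(const uint64_t *speeds, size_t n)
{
    uint64_t best = 0;

    for (size_t i = 0; i < n; i++)
        if (speeds[i] > best)
            best = speeds[i];
    return best;
}
```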
* hrs, 2013-07-02 (1 file, -0/+30)
  - Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal. To configure an autoconfigured link-local address (RFC 4862), the following rc.conf(5) configuration can be used:

      ifconfig_bridge0_ipv6="inet6 auto_linklocal"

  - if_bridge(4) now removes IPv6 addresses on a member interface to be added when the parent interface or one of the existing member interfaces has an IPv6 address. if_bridge(4) merges each link-local scope zone which the member interfaces form respectively, so it causes address scope violation. Removal of the IPv6 addresses prevents it.

  - if_lagg(4) now removes IPv6 addresses on member interfaces unconditionally.

  - Set reasonable flags to non-IPv6-capable interfaces. [*]

  Submitted by: rpaulo [*]
  MFC after: 1 week
* delphij, 2013-06-17 (1 file, -3/+3)
  Return ENETDOWN instead of ENOENT when all lagg(4) links are inactive when the upper layer tries to transmit a packet. This gives better feedback and more meaningful errors to applications.

  MFC after: 2 weeks
  Reviewed by: thompsa
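A hypothetical round-robin transmit model that returns `ENETDOWN` when no member link is active, as this commit describes. `struct link_model` and `lagg_tx_model()` are illustrative names, not the if_lagg.c functions; an empty port list is treated the same as all links being down.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical member state for a round-robin style transmit sketch. */
struct link_model {
    int active;          /* nonzero if the link can carry traffic */
    unsigned sent;       /* packets handed to this link */
};

/* Send one packet, starting the search at *next. Returns 0 on success
 * or ENETDOWN when no member is active (including n == 0). */
static int lagg_tx_model(struct link_model *links, size_t n, size_t *next)
{
    for (size_t tries = 0; tries < n; tries++) {
        size_t i = (*next + tries) % n;

        if (links[i].active) {
            links[i].sent++;
            *next = (i + 1) % n;
            return 0;
        }
    }
    return ENETDOWN;
}
```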
* trociny, 2013-06-07 (1 file, -0/+2)
  Properly set curvnet context in lagg_port_setlladdr() task handler.

  Reported by: Nikos Vassiliadis <nvass gmx.com>
  Submitted by: zec
  Tested by: Nikos Vassiliadis <nvass gmx.com>
  MFC after: 1 week
* glebius, 2013-04-26 (1 file, -2/+2)
  Add const qualifier to the dst parameter of the ifnet if_output method.
* glebius, 2013-04-15 (1 file, -4/+34)
  Switch lagg(4) statistics to counter(9).

  The lagg(4) is often used to bond high speed links, so basic per-packet += on statistics cause cache misses and statistics loss.

  A perfect solution would be to convert ifnet(9) to counter(9), but this requires much more work and, unfortunately, an ABI change, so temporarily patch lagg(4) manually.

  We store counters in the softc, and once per second push their values to legacy ifnet counters.

  Sponsored by: Nginx, Inc.
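The counter scheme can be modeled with per-CPU slots that are summed on demand and copied into a legacy field by a periodic push. This is a userland sketch: `NCPU_MODEL`, the structs, and the functions are hypothetical, not the counter(9) API, and the slots are not padded to separate cache lines as a real per-CPU counter implementation would need in order to avoid false sharing.

```c
#include <stdint.h>

#define NCPU_MODEL 4   /* hypothetical CPU count for the sketch */

/* One slot per CPU; each CPU only ever adds to its own slot. */
struct pcpu_counter_model {
    uint64_t slot[NCPU_MODEL];
};

static void counter_add_model(struct pcpu_counter_model *c, unsigned cpu,
                              uint64_t n)
{
    c->slot[cpu % NCPU_MODEL] += n;
}

/* Reading sums all slots; this is the comparatively rare, slow path. */
static uint64_t counter_fetch_model(const struct pcpu_counter_model *c)
{
    uint64_t sum = 0;

    for (unsigned i = 0; i < NCPU_MODEL; i++)
        sum += c->slot[i];
    return sum;
}

/* Legacy ifnet-style field, refreshed periodically (once per second
 * in the commit above) rather than on every packet. */
struct legacy_stats_model {
    uint64_t if_opackets;
};

static void push_stats_model(struct legacy_stats_model *ifs,
                             const struct pcpu_counter_model *opackets)
{
    ifs->if_opackets = counter_fetch_model(opackets);
}
```

Between pushes the legacy field lags behind the real count, which is the trade-off accepted in exchange for cheap per-packet updates.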
* glebius, 2013-03-22 (1 file, -6/+0)
  Remove __FreeBSD_version ifdefs.
* glebius, 2013-01-21 (1 file, -2/+2)
  If lagg(4) can't forward a packet due to underlying port problems, return the much more meaningful ENETDOWN to the stack, instead of EBUSY.
* delphij, 2012-10-17 (1 file, -1/+1)
  Fix build.
* emax, 2012-10-16 (1 file, -0/+3)
  Report the total number of ports for each lagg(4) interface via the net.link.lagg.X.count sysctl.

  MFC after: 1 week
* glebius, 2012-10-16 (1 file, -5/+7)
  Make the "struct if_clone" opaque to users of the cloning API. Users now use function calls:

      if_clone_simple()
      if_clone_advanced()

  to initialize a cloner, instead of macros that initialize the if_clone structure.

  Discussed with: brooks, bz, 1 year ago
* kevlo, 2012-10-10 (1 file, -1/+1)
  Revert previous commit...

  Pointyhat to: kevlo (myself)
* kevlo, 2012-10-09 (1 file, -1/+1)
  Prefer NULL over 0 for pointers
* glebius, 2012-09-20 (1 file, -24/+32)
  Convert lagg(4) to use if_transmit instead of if_start.

  In collaboration with: thompsa, sbruno, fabient
* thompsa, 2012-06-30 (1 file, -0/+3)
  Add the same check as vlan(4) where we ignore the ifnet departure event if the interface is just being renamed.

  PR: kern/169557
  Submitted by: Mark Johnston
  MFC after: 3 days
* rea, 2012-05-28 (1 file, -1/+6)
  if_lagg: allow SIOCSLAGGPORT to be invoked multiple times in a row

  Currently, 'ifconfig laggX down' does not remove members from this lagg(4) interface. So, 'service netif stop laggX' followed by 'service netif start laggX' will choke, because "stop" will leave interfaces attached to laggX and ifconfig from the "start" will refuse to add already-existing interfaces.

  The real-world case is when I am bundling together my Ethernet and WiFi interfaces and using multiple profiles for accessing the network in different places: the system is booted up with one profile, but later this profile is exchanged for another one, followed by 'service netif restart', which will not add the WiFi interface back to the lagg. The "stop" action from 'service netif restart' will shut down my main WiFi interface, so the wlan0 that exists in lagg0 will be destroyed and purged from lagg0; the "start" action will try to re-add both interfaces, but since the Ethernet one is already in lagg0, ifconfig will refuse to add the wlan0 from the WiFi interface.

  Adding an interface to the lagg(4) when it is already there should be an idempotent action: we're really not changing anything, so this fix doesn't change the semantics of interface addition.

  Approved by: thompsa
  Reviewed by: emaste
  MFC after: 1 week
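The idempotent behavior can be sketched as a membership check before insertion. `struct lagg_model`, `MAX_PORTS_MODEL`, `lagg_add_port_model()`, and the `ENOSPC` error are hypothetical; the sketch only shows that re-adding an existing member returns success without modifying the port list.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS_MODEL 8   /* hypothetical capacity */

struct lagg_model {
    const char *ports[MAX_PORTS_MODEL];
    size_t nports;
};

/* Add a member by name. Re-adding an existing member is a no-op that
 * reports success; the capacity error is illustrative. */
static int lagg_add_port_model(struct lagg_model *sc, const char *name)
{
    for (size_t i = 0; i < sc->nports; i++)
        if (strcmp(sc->ports[i], name) == 0)
            return 0;                    /* already a member */
    if (sc->nports == MAX_PORTS_MODEL)
        return ENOSPC;
    sc->ports[sc->nports++] = name;
    return 0;
}
```

In the restart scenario above, em0 is still a member when "start" runs, so re-adding it is a no-op and the freshly created wlan0 is then added normally.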
* emaste, 2012-05-03 (1 file, -13/+3)
  Relax restriction on direct tx to child ports

  lagg(4) restricts the type of packet that may be sent directly to a child port, to avoid undesired output from accidental misconfiguration. Previously only ETHERTYPE_PAE was permitted.

  BPF writes to a lagg(4) child port are presumably intentional, so just allow them, while still blocking other packets that should take the aggregation path.

  PR: kern/138620
  Approved by: thompsa@
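A sketch of the relaxed policy: a frame may bypass the aggregation path and go straight out a child port if it is 802.1X (`ETHERTYPE_PAE`, 0x888E) or was written through BPF. The struct and function names are hypothetical, and how the kernel actually recognizes a BPF write is not modeled here.

```c
#include <stdint.h>

#define ETHERTYPE_PAE_MODEL 0x888e   /* IEEE 802.1X (EAPOL) EtherType */

/* Hypothetical description of a frame handed directly to a child port. */
struct direct_tx_model {
    uint16_t ether_type;
    int from_bpf;        /* nonzero if written through a BPF descriptor */
};

/* 802.1X frames and BPF writes may use the child port directly;
 * everything else must go through the lagg interface. */
static int child_direct_tx_allowed(const struct direct_tx_model *f)
{
    if (f->from_bpf)
        return 1;
    return f->ether_type == ETHERTYPE_PAE_MODEL;
}
```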
* thompsa, 2012-04-12 (1 file, -6/+10)
  Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets are discarded; this is an issue because lacp drops the lock, which may allow network threads to access freed memory. Expand the lock coverage so the detach/attach happen atomically.

  Submitted by: Andrew Boyer (earlier version)
* thompsa, 2012-03-07 (1 file, -2/+2)
  Move the vlan buffer space into the union, which also fixes an unused variable warning with !INET & !INET6.

  Spotted by: pluknet
* thompsa, 2012-03-06 (1 file, -13/+65)
  Add the ability to set which packet layers are used for the load balance hash calculation.
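The idea of choosing which layers feed the load-balancing hash can be sketched like this; ifconfig(8) exposes the choice for lagg through its `lagghash` option. The `HASH_L2`/`HASH_L3`/`HASH_L4` flags, `struct flow_model`, and the use of FNV-1a are assumptions for illustration; the kernel's lagg hash function and flag definitions differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layer-selection flags. */
#define HASH_L2 0x1u
#define HASH_L3 0x2u
#define HASH_L4 0x4u

struct flow_model {
    uint8_t  src_mac[6], dst_mac[6];
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* FNV-1a over a byte range; a stand-in hash, not lagg's own. */
static uint32_t fnv1a(uint32_t h, const void *p, size_t len)
{
    const uint8_t *b = p;

    for (size_t i = 0; i < len; i++) {
        h ^= b[i];
        h *= 16777619u;
    }
    return h;
}

/* Mix only the header fields of the selected layers into the hash. */
static uint32_t flow_hash_model(const struct flow_model *f, unsigned layers)
{
    uint32_t h = 2166136261u;   /* FNV offset basis */

    if (layers & HASH_L2) {
        h = fnv1a(h, f->src_mac, sizeof f->src_mac);
        h = fnv1a(h, f->dst_mac, sizeof f->dst_mac);
    }
    if (layers & HASH_L3) {
        h = fnv1a(h, &f->src_ip, sizeof f->src_ip);
        h = fnv1a(h, &f->dst_ip, sizeof f->dst_ip);
    }
    if (layers & HASH_L4) {
        h = fnv1a(h, &f->src_port, sizeof f->src_port);
        h = fnv1a(h, &f->dst_port, sizeof f->dst_port);
    }
    return h;
}
```

With `HASH_L2 | HASH_L3`, two connections between the same hosts that differ only in TCP port hash identically and stay on one link; adding `HASH_L4` lets them spread across links.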
* thompsa, 2012-02-23 (1 file, -1/+6)
  Add a sysctl/tunable default value for the use_flowid sysctl in r232008.
* thompsa, 2012-02-22 (1 file, -1/+13)
  Using the flowid in the mbuf assumes the network card is giving a good hash for the traffic flow; this may not be the case, giving poor traffic distribution. Add a sysctl which allows us to fall back to our own flow hash code.

  PR: kern/164901
  Submitted by: Eugene Grosbein
  MFC after: 1 week
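A sketch of the fallback: when the use_flowid knob is off (or the packet carries no flowid), port selection uses a software hash instead of the NIC-supplied value. `select_port_model()` and `sw_hash_model()` (a generic 32-bit integer mixer) are hypothetical, not the lagg hash code.

```c
#include <stdint.h>

/* Stand-in software hash: a 32-bit integer mixing function. */
static uint32_t sw_hash_model(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Use the NIC's flowid only when trusted and present; otherwise hash
 * the packet ourselves. nports must be nonzero. */
static unsigned select_port_model(int use_flowid, int has_flowid,
                                  uint32_t nic_flowid, uint32_t pkt_key,
                                  unsigned nports)
{
    uint32_t key = (use_flowid && has_flowid) ? nic_flowid
                                              : sw_hash_model(pkt_key);
    return key % nports;
}
```

A NIC that reports the same flowid for every packet (0 in the check below) pins all traffic to one port while the flowid is trusted; the software hash spreads the same flows across both ports.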
* brooks, 2011-11-11 (1 file, -4/+3)
  In r191367 the need for if_free_type() was removed and a new member if_alloctype was used to store the original interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free().

  MFC after: 1 Month
* ed, 2011-11-07 (1 file, -1/+2)
  Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

  The SYSCTL_NODE macro defines a list that stores all child elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
* pluknet, 2011-08-01 (1 file, -0/+1)
  Add missing MODULE_VERSION() definition to protect against duplicating module loads.

  PR: kern/159345
  Reported by: Eugene Grosbein <egrosbein att rdtc ru>
  Tested by: Eugene Grosbein <egrosbein att rdtc ru>
  Approved by: re (kib)
  MFC after: 1 week
* thompsa, 2011-07-07 (1 file, -1/+2)
  Grab the rlock before checking if our interface is enabled; otherwise it could be possible to hit a dead pointer when changing interfaces.

  PR: kern/156978
  Submitted by: Andrew Boyer
  MFC after: 1 week
* thompsa, 2011-04-30 (1 file, -1/+1)
  LACP frames must not be sent VLAN-tagged; check for that before processing.

  PR: kern/156743
  Submitted by: Dmitrij Tejblum
  MFC after: 1 week
* bz, 2011-04-27 (1 file, -1/+3)
  Make various (pseudo) interfaces compile without INET in the kernel, adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp.

  Reviewed by: gnn
  Sponsored by: The FreeBSD Foundation
  Sponsored by: iXsystems
  MFC after: 4 days
* eri, 2011-03-04 (1 file, -1/+2)
  Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none.

  Approved by: thompsa(mentor)
  MFC after: 1 week
* emaste, 2010-09-01 (1 file, -1/+9)
  Add a sysctl knob to accept input packets on any link in a failover lagg.
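A minimal model of the failover receive rule, assuming that by default only frames arriving on the active port are accepted and that the new knob lifts that restriction. The struct, field, and function names are hypothetical, not the actual sysctl or softc fields.

```c
/* Hypothetical failover state. */
struct failover_model {
    int primary_port;     /* index of the currently active port */
    int accept_any_port;  /* hypothetical knob: nonzero accepts from any member */
};

/* Accept a received frame from in_port? */
static int failover_accept_model(const struct failover_model *sc, int in_port)
{
    if (sc->accept_any_port)
        return 1;
    return in_port == sc->primary_port;
}
```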
* delphij, 2010-03-09 (1 file, -4/+0)
  Remove the check for IFF_DRV_OACTIVE right before adding a port into a lagg interface. The check itself seems to come from OpenBSD but does not seem to be useful for our code.

  Discussed with: thomasa
  MFC after: 1 month
* eri, 2010-02-06 (1 file, -0/+57)
  Propagate the vlan events to the underlying interfaces/members so they can do initialization of hw related features.

  PR: kern/141646
  Reviewed by: thompsa
  Approved by: thompsa(co-mentor)
  MFC after: 2 weeks
* thompsa, 2010-01-18 (1 file, -0/+1)
  Declare a new EVENTHANDLER called iflladdr_event which signals that the L2 address on an interface has changed. This lets stacked interfaces such as vlan(4) detect that their lower interface has changed and adjust things in order to keep working. Previously this situation broke at least vlan(4) and lagg(4) configurations.

  The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the risk of a loop.

  PR: kern/142927
  Submitted by: Nikolay Denev
OpenPOWER on IntegriCloud