summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller2014-01-1343-768/+1652
|\ | | | | | | | | | | | | | | | | | | | | Included changes: - drop dependency against CRC16 - move to new release version - add size check at compile time for packet structs - update copyright years in every file - implement new bonding/interface alternation feature Signed-off-by: David S. Miller <davem@davemloft.net>
| * batman-adv: drop dependency against CRC16Antonio Quartulli2014-01-121-1/+0
| | | | | | | | | | | | | | The crc16 functionality is not used anymore, therefore we can safely remove the dependency in the Kbuild file. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: Start new development cycleSimon Wunderlich2014-01-121-1/+1
| | | | | | | | | | Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: update copyright years for 2014Simon Wunderlich2014-01-1241-41/+41
| | | | | | | | | | | | Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: add build checks for packet sizesSimon Wunderlich2014-01-121-7/+17
| | | | | | | | | | | | | | | | | | | | | | With unrolling the batadv_header into the respective structures, the offsetof checks are now useless. Instead, add build checks for all packet types which go over the wire to avoid problems with wrong sizes or compatibility issues on some architectures which don't use every day. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: add missing sysfs attributes to READMEAntonio Quartulli2014-01-121-5/+4
| | | | | | | | | | | | | | Add missing sysfs attributes in the proper section of the README Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
| * batman-adv: remove returns at the end of void functionsAntonio Quartulli2014-01-121-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | Return at the end of void functions is not needed. Since most of the void functions in the code do not do so, make all the others consistent by removing the useless returns. Actually all the functions to be "fixed" are in network-coding.h only. Signed-off-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
| * batman-adv: add debugfs support to view multiif tablesSimon Wunderlich2014-01-125-7/+73
| | | | | | | | | | | | | | | | | | | | | | | | Show tables for the multi interface operation. Originator tables are added per hard interface. This patch also changes the API by adding the interface to the bat_orig_print() parameters. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: add debugfs structure for information per interfaceSimon Wunderlich2014-01-124-0/+86
| | | | | | | | | | | | | | | | | | | | To show information per interface, add a debugfs hardif structure similar to the system in sysfs. Hard interface folders will be created in "$debugfs/batman-adv/". Files are not yet added. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: add bonding againSimon Wunderlich2014-01-124-4/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | With the new interface alternating, the first hop may send packets in a round robin fashion to it's neighbors because it has multiple valid routes built by the multi interface optimization. This patch enables the feature if bonding is selected. Note that unlike the bonding implemented before, this version is much simpler and may even enable multi path routing to a certain degree. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: consider outgoing interface in OGM sendingSimon Wunderlich2014-01-123-64/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current OGM sending an aggregation functionality decides on which interfaces a packet should be sent when it parses the forward packet struct. However, with the network wide multi interface optimization the outgoing interface is decided by the OGM processing function. This is reflected by moving the decision in the OGM processing function and add the outgoing interface in the forwarding packet struct. This practically implies that an OGM may be added multiple times (once per outgoing interface), and this also affects aggregation which needs to consider the outgoing interface as well. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: add WiFi penaltySimon Wunderlich2014-01-123-6/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | If the same interface is used for sending and receiving, there might be throughput degradation on half-duplex interfaces such as WiFi. Add a penalty if the same interface is used to reflect this problem in the metric. At the same time, change the hop penalty from 30 to 15 so there will be no change for single wifi mesh network. the effective hop penalty will stay at 30 due to the new wifi penalty for these networks. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: split out router from orig_nodeSimon Wunderlich2014-01-1211-219/+605
| | | | | | | | | | | | | | | | | | | | | | | | | | | | For the network wide multi interface optimization there are different routers for each outgoing interface (outgoing from the OGM perspective, incoming for payload traffic). To reflect this, change the router and associated data to a list of routers. While at it, rename batadv_orig_node_get_router() to batadv_orig_router_get() to follow the new naming scheme. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: split tq information in neigh_node structSimon Wunderlich2014-01-129-114/+588
| | | | | | | | | | | | | | | | | | | | | | | | | | | | For the network wide multi interface optimization it is required to save metrics per outgoing interface in one neighbor. Therefore a new type is introduced to keep interface-specific information. This also requires some changes in access and list management. The compare and equiv_or_better API calls are changed to take the outgoing interface into consideration. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * batman-adv: remove bonding and interface alternatingSimon Wunderlich2014-01-125-327/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | Remove bonding and interface alternating code - it will be replaced by a new, network-wide multi interface optimization which enables both bonding and interface alternating in a better way. Keep the sysfs and find router function though, this will be needed later. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* | net: resort some Kbuild files to hopefully help avoid some conflictsStephen Rothwell2014-01-136-6/+6
| | | | | | | | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'qlcnic'David S. Miller2014-01-1312-63/+184
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shahed Shaikh says: ==================== This series includes following changes: o SRIOV and VLAN filtering related enhancements which includes - Do MAC learning for PF - Restrict VF from configuring any VLAN mode - Enable flooding on PF - Turn on promiscuous mode for PF o Bug fix in qlcnic_sriov_cleanup() introduced by commit 154d0c81("qlcnic: VLAN enhancement for 84XX adapters") o Beaconing support for 83xx and 84xx series adapters o Allow 82xx adapter to perform IPv6 LRO even if destination IP address is not programmed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: Update version to 5.3.54Shahed Shaikh2014-01-131-2/+2
| | | | | | | | | | | | | | | Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: Enable IPv6 LRO even if IP address is not programmedShahed Shaikh2014-01-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | o Enabling BIT_9 while configuring hardware LRO allows adapter to perform LRO even if destination IP address is not programmed in adapter. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: Fix SR-IOV cleanup code pathManish Chopra2014-01-131-3/+1
| | | | | | | | | | | | | | | | | | | | | o Add __QLCNIC_SRIOV_ENABLE bit check before doing SRIOV cleanup Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: Enable beaconing for 83xx/84xx Series adapter.Himanshu Madhani2014-01-137-24/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | o Refactored code to handle beaconing test for all adapters. o Use GET_LED_CONFIG mailbox command for 83xx/84xx series adapter to detect current beaconing state of the adapter. Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: Do MAC learning for SRIOV PF.Sucheta Chakraborty2014-01-136-31/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | o MAC learning will be done for SRIOV PF to help program VLAN filters onto adapter. This will help VNIC traffic to flow through without flooding traffic. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: Turn on promiscous mode for SRIOV PF.Sucheta Chakraborty2014-01-131-0/+4
| | | | | | | | | | | | | | | | | | | | | o By default, SRIOV PF will have promiscous mode on. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: Enable VF flood bit on PF.Sucheta Chakraborty2014-01-131-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | o On enabling VF flood bit, PF driver will be able to receive traffic from all its VFs. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | qlcnic: Restrict VF from configuring any VLAN mode.Sucheta Chakraborty2014-01-131-1/+11
|/ / | | | | | | | | | | | | | | | | o Adapter should allow vlan traffic only for vlans configured on a VF. On configuring any vlan mode from VF, adapter will allow any vlan traffic to pass for that VF. Do not allow VF to configure this mode. Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: make dev_set_mtu() honor notification return codeVeaceslav Falico2014-01-131-9/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, after changing the MTU for a device, dev_set_mtu() calls NETDEV_CHANGEMTU notification, however doesn't verify it's return code - which can be NOTIFY_BAD - i.e. some of the net notifier blocks refused this change, and continues nevertheless. To fix this, verify the return code, and if it's an error - then revert the MTU to the original one, notify again and pass the error code. CC: Jiri Pirko <jiri@resnulli.us> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* | packet: doc: describe PACKET_MMAP with one packet socket for rx and txNorbert van Bolhuis2014-01-131-0/+18
| | | | | | | | | | | | | | | | Document how to use one AF_PACKET mmap socket for RX and TX. Signed-off-by: Norbert van Bolhuis <nvbolhuis@aimvalley.nl> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: make sctp_addto_chunk_fixed localstephen hemminger2014-01-132-3/+4
| | | | | | | | | | | | Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | phylib: Add of_phy_attachAndy Fleming2014-01-132-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | 10G PHYs don't currently support running the state machine, which is implicitly setup via of_phy_connect(). Therefore, it is necessary to implement an OF version of phy_attach(), which does everything except start the state machine. Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | phylib: Support attaching to generic 10g driverAndy Fleming2014-01-132-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | phy_attach_direct() may now attach to a generic 10G driver. It can also be used exactly as phy_connect_direct(), which will be useful when using of_mdio, as phy_connect (and therefore of_phy_connect) start the PHY state machine, which is currently irrelevant for 10G PHYs. Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | phylib: Add generic 10G driverAndy Fleming2014-01-131-0/+81
| | | | | | | | | | | | | | | | | | Very incomplete, but will allow for binding an ethernet controller to it. Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | phylib: turn genphy_driver to an arrayShaohui Xie2014-01-131-8/+21
| | | | | | | | | | | | | | Then other generic phy driver such as generic 10g phy driver can join it. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | phylib: introduce PHY_INTERFACE_MODE_XGMII for 10G PHYAndy Fleming2014-01-132-0/+2
| | | | | | | | | | | | | | Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | phylib: Add Clause 45 read/write functionsAndy Fleming2014-01-131-0/+39
| | | | | | | | | | | | | | | | | | Need an extra parameter to read or write Clause 45 PHYs, so need a different API with the extra parameter. Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | l2tp: make local functions staticstephen hemminger2014-01-132-6/+2
| | | | | | | | | | | | | | | | Avoid needless export of local functions Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnx2x: namespace and dead code cleanupsstephen hemminger2014-01-1311-524/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a bunch of whole lot of namespace issues with the Broadcom bnx2x driver found by running 'make namespacecheck' * global variables must be prefixed with bnx2x_ naming a variable int_mode, or num_queue is invitation to disaster * make local functions static * move some inline's used in one file out of header (this driver has a bad case of inline-itis) * remove resulting dead code fallout bnx2x_pfc_statistic, bnx2x_emac_get_pfc_stat bnx2x_init_vlan_mac_obj, Looks like vlan mac support in this driver was a botch from day one either never worked, or not implemented or missing support functions Compile tested only. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | drivers: net: silence compiler warning in smc91x.cPankaj Dubey2014-01-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If used 64 bit compiler GCC warns that: drivers/net/ethernet/smsc/smc91x.c:1897:7: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] This patch fixes this by changing typecast from "unsigned int" to "unsigned long" CC: "David S. Miller" <davem@davemloft.net> CC: Jingoo Han <jg1.han@samsung.com> CC: netdev@vger.kernel.org Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | gre_offload: simplify GRE header length calculation in gre_gso_segment()Neal Cardwell2014-01-131-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simplify the GRE header length calculation in gre_gso_segment(). Switch to an approach that is simpler, faster, and more general. The new approach will continue to be correct even if we add support for the optional variable-length routing info that may be present in a GRE header. Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: act: remove struct tcf_act_hdrWANG Cong2014-01-132-12/+8
| | | | | | | | | | | | | | | | | | | | It is not necessary at all. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: avoid casting void pointerWANG Cong2014-01-135-26/+23
| | | | | | | | | | | | | | | | | | | | tp->root is a void* pointer, no need to cast it. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: optimize tcf_match_indev()WANG Cong2014-01-133-29/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tcf_match_indev() is called in fast path, it is not wise to search for a netdev by ifindex and then compare by its name, just compare the ifindex. Also, dev->name could be changed by user-space, therefore the match would be always fail, but dev->ifindex could be consistent. BTW, this will also save some bytes from the core struct of u32. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: add struct net pointer to tcf_proto_ops->dumpWANG Cong2014-01-1311-15/+16
| | | | | | | | | | | | | | | | | | | | It will be needed by the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: act: clean up notification functionsWANG Cong2014-01-131-55/+40
| | | | | | | | | | | | | | | | | | | | Refactor tcf_add_notify() and factor out tcf_del_notify(). Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: act: move idx_gen into struct tcf_hashinfoWANG Cong2014-01-1311-26/+18
| | | | | | | | | | | | | | | | | | | | | | There is no need to store the index separatedly since tcf_hashinfo is allocated statically too. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: gro: change GRO overflow strategyEric Dumazet2014-01-131-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GRO layer has a limit of 8 flows being held in GRO list, for performance reason. When a packet comes for a flow not yet in the list, and list is full, we immediately give it to upper stacks, lowering aggregation performance. With TSO auto sizing and FQ packet scheduler, this situation happens more often. This patch changes strategy to simply evict the oldest flow of the list. This works better because of the nature of packet trains for which GRO is efficient. This also has the effect of lowering the GRO latency if many flows are competing. Tested : Used a 40Gbps NIC, with 4 RX queues, and 200 concurrent TCP_STREAM netperf. Before patch, aggregate rate is 11Gbps (while a single flow can reach 30Gbps) After patch, line rate is reached. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: call gro handler for encapsulated framesEric Dumazet2014-01-132-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to use the native GRO handling of encapsulated protocols on mlx4, we need to call napi_gro_receive() instead of netif_receive_skb() unless busy polling is in action. While we are at it, rename mlx4_en_cq_ll_polling() to mlx4_en_cq_busy_polling() Tested with GRE tunnel : GRO aggregation is now performed on the ethernet device instead of being done later on gre device. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Jerry Chu <hkchu@google.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | gre_offload: fix sparse non static symbol warningWei Yongjun2014-01-131-1/+1
| | | | | | | | | | | | | | | | | | | | Fixes the following sparse warning: net/ipv4/gre_offload.c:253:5: warning: symbol 'gre_gro_complete' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'ip_forward_pmtu'David S. Miller2014-01-1314-31/+134
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hannes Frederic Sowa says: ==================== path mtu hardening patches After a lot of back and forth I want to propose these changes regarding path mtu hardening and give an outline why I think this is the best way how to proceed: This set contains the following patches: * ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing * ipv6: introduce ip6_dst_mtu_forward and protect forwarding path with it * ipv4: introduce hardened ip_no_pmtu_disc mode The first one switches the forwarding path of IPv4 to use the interface mtu by default and ignore a possible discovered path mtu. It provides a sysctl to switch back to the original behavior (see discussion below). The second patch does the same thing unconditionally for IPv6. I don't provide a knob for IPv6 to switch to original behavior (please see below). The third patch introduces a hardened pmtu mode, where only pmtu information are accepted where the protocol is able to do more stringent checks on the icmp piggyback payload (please see the patch commit msg for further details). Why is this change necessary? First of all, RFC 1191 4. Router specification says: "When a router is unable to forward a datagram because it exceeds the MTU of the next-hop network and its Don't Fragment bit is set, the router is required to return an ICMP Destination Unreachable message to the source of the datagram, with the Code indicating "fragmentation needed and DF set". ..." For some time now fragmentation has been considered problematic, e.g.: * http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-87-3.pdf * http://tools.ietf.org/search/rfc4963 Most of them seem to agree that fragmentation should be avoided because of efficiency, data corruption or security concerns. Recently it was shown possible that correctly guessing IP ids could lead to data injection on DNS packets: <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf> While we can try to completly stop fragmentation on the end host (this is e.g. implemented via IP_PMTUDISC_INTERFACE), we cannot stop fragmentation completly on the forwarding path. On the end host the application has to deal with MTUs and has to choose fallback methods if fragmentation could be an attack vector. This is already the case for most DNS software, where a maximum UDP packet size can be configured. But until recently they had no control over local fragmentation and could thus emit fragmented packets. On the forwarding path we can just try to delay the fragmentation to the last hop where this is really necessary. Current kernel already does that but only because routers don't receive feedback of path mtus, these are only send back to the end host system. But it is possible to maliciously insert path mtu inforamtion via ICMP packets which have an icmp echo_reply payload, because we cannot validate those notifications against local sockets. DHCP clients which establish an any-bound RAW-socket could also start processing unwanted fragmentation-needed packets. Why does IPv4 has a knob to revert to old behavior while IPv6 doesn't? IPv4 does fragmentation on the path while IPv6 does always respond with packet-too-big errors. The interface MTU will always be greater than the path MTU information. So we would discard packets we could actually forward because of malicious information. After this change we would let the hop, which really could not forward the packet, notify the host of this problem. IPv4 allowes fragmentation mid-path. In case someone does use a software which tries to discover such paths and assumes that the kernel is handling the discovered pmtu information automatically. This should be an extremly rare case, but because I could not exclude the possibility this knob is provided. Also this software could insert non-locked mtu information into the kernel. We cannot distinguish that from path mtu information currently. Premature fragmentation could solve some problems in wrongly configured networks, thus this switch is provided. One frag-needed packet could reduce the path mtu down to 522 bytes (route/min_pmtu). Misc: IPv6 neighbor discovery could advertise mtu information for an interface. These information update the ipv6-specific interface mtu and thus get used by the forwarding path. Tunnel and xfrm output path will still honour path mtu and also respond with Packet-too-Big or fragmentation-needed errors if needed. Changelog for all patches: v2) * enabled ip_forward_use_pmtu by default * reworded v3) * disabled ip_forward_use_pmtu by default * reworded v4) * renamed ip_dst_mtu_secure to ip_dst_mtu_maybe_forward * updated changelog accordingly * removed unneeded !!(... & ...) double negations v2) * by default we honour pmtu information 3) * only honor interface mtu * rewritten and simplified * no knob to fall back to old mode any more v2) * reworded Documentation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: introduce hardened ip_no_pmtu_disc modeHannes Frederic Sowa2014-01-136-6/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors to be honored by protocols which do more stringent validation on the ICMP's packet payload. This knob is useful for people who e.g. want to run an unmodified DNS server in a namespace where they need to use pmtu for TCP connections (as they are used for zone transfers or fallback for requests) but don't want to use possibly spoofed UDP pmtu information. Currently the whitelisted protocols are TCP, SCTP and DCCP as they check if the returned packet is in the window or if the association is valid. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Suggested-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: introduce ip6_dst_mtu_forward and protect forwarding path with itHannes Frederic Sowa2014-01-131-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the IPv6 forwarding path we are only concerend about the outgoing interface MTU, but also respect locked MTUs on routes. Tunnel provider or IPSEC already have to recheck and if needed send PtB notifications to the sending host in case the data does not fit into the packet with added headers (we only know the final header sizes there, while also using path MTU information). The reason for this change is, that path MTU information can be injected into the kernel via e.g. icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv6 forwarding path to wrongfully emit Packet-too-Big errors and drop IPv6 packets. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud