summaryrefslogtreecommitdiffstats
path: root/drivers/net/vxlan.c
Commit message (Collapse)AuthorAgeFilesLines
* vxlan: fix too large pskb_may_pull with remote checksumJiri Benc2016-03-211-4/+2
| | | | | | | | | vxlan_remcsum is called after iptunnel_pull_header and thus the skb has vxlan header already pulled. Don't include vxlan header again in the calculation. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vxlan: fix populating tclass in vxlan6_get_routeDaniel Borkmann2016-03-201-2/+1
| | | | | | | | | | | | | | | | Jiri mentioned that flowi6_tos of struct flowi6 is never used/read anywhere. In fact, rest of the kernel uses the flowi6's flowlabel, where the traffic class _and_ the flowlabel (aka flowinfo) is encoded. For example, for policy routing, fib6_rule_match() uses ip6_tclass() that is applied on the flowlabel member for matching on tclass. Similar fix is needed for geneve, where flowi6_tos is set as well. Installing a v6 blackhole rule that f.e. matches on tos is now working with vxlan. Fixes: 1400615d64cf ("vxlan: allow setting ipv6 traffic class") Reported-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* gro: Defer clearing of flush bit in tunnel pathsAlexander Duyck2016-03-131-2/+1
| | | | | | | | | | | | | | This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that we do not clear the flush bit until after we have called the next level GRO handler. Previously this was being cleared before parsing through the list of frames, however this resulted in several paths where either the bit needed to be reset but wasn't as in the case of FOU, or cases where it was being set as in GENEVE. By just deferring the clearing of the bit until after the next level protocol has been parsed we can avoid any unnecessary bit twiddling and avoid bugs. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vxlan: support setting IPv6 flow labelDaniel Borkmann2016-03-111-5/+21
| | | | | | | | | | This work adds support for setting the IPv6 flow label for vxlan per device and through collect metadata (ip_tunnel_key) frontends. The vxlan dst cache does not need any special considerations here, for the cases where caches can be used, the label is static per cache. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* ip_tunnel: add support for setting flow label via collect metadataDaniel Borkmann2016-03-111-1/+1
| | | | | | | | | | | This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label from call sites. Currently, there's no such option and it's always set to zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so that flow-based tunnels via collect metadata frontends can make use of it. vxlan and geneve will be converted to add flow label support separately. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* vxlan: allow setting ipv6 traffic classDaniel Borkmann2016-03-081-5/+9
| | | | | | | | We can already do that for IPv4, but IPv6 support was missing. Add it for vxlan, so it can be used with collect metadata frontends. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf, vxlan, geneve, gre: fix usage of dst_cache on xmitDaniel Borkmann2016-03-081-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assumptions from commit 0c1d70af924b ("net: use dst_cache for vxlan device"), 468dfffcd762 ("geneve: add dst caching support") and 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage when ip_tunnel_info is used is unfortunately not always valid as assumed. While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre with different remote dsts, tos, etc, therefore they cannot be assumed as packet independent. Right now vxlan, geneve, gre would cache the dst for eBPF and every packet would reuse the same entry that was first created on the initial route lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have a different one. Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test in vxlan needs to be handeled differently in this context as it is currently inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable() helper is added for the three tunnel cases, which checks if we can use dst cache. Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device") Fixes: 468dfffcd762 ("geneve: add dst caching support") Fixes: 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-03-081-3/+6
|\ | | | | | | | | | | | | | | Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * vxlan: fix missing options_len update on RX with collect metadataDaniel Borkmann2016-03-031-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When signalling to metadata consumers that the metadata_dst entry carries additional GBP extension data for vxlan (TUNNEL_VXLAN_OPT), the dst's vxlan_metadata information is populated, but options_len is left to zero. F.e. in ovs, ovs_flow_key_extract() checks for options_len before extracting the data through ip_tunnel_info_opts_get(). Geneve uses ip_tunnel_info_opts_set() helper in receive path, which sets options_len internally, vxlan however uses ip_tunnel_info_opts(), so when filling vxlan_metadata, we do need to update options_len. Fixes: 4c22279848c5 ("ip-tunnel: Use API to access tunnel metadata options.") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump.MINOURA Makoto / 箕浦 真2016-02-261-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | When the send skbuff reaches the end, nlmsg_put and friends returns -EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called within a for_each_netdev loop and the first fdb entry of a following netdev could fit in the remaining skbuff. This breaks the mechanism of cb->args[0] and idx to keep track of the entries that are already dumped, which results missing entries in bridge fdb show command. Signed-off-by: Minoura Makoto <minoura@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: use reset to set header pointersZhang Shengju2016-03-041-3/+3
| | | | | | | | | | | | | | | | | | Since offset is zero, it's not necessary to use set function. Reset function is straightforward, and will remove the unnecessary add operation in set function. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: simplify metadata_dst usage in vxlan_rcvJiri Benc2016-02-251-12/+7
| | | | | | | | | | | | | | | | | | | | Now when the packet is scrubbed early, the metadata_dst can be assigned to the skb as soon as it is allocated. This simplifies the error cleanup path, as the dst will be freed by kfree_skb. It is also not necessary to pass it as a parameter to functions anymore. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: consolidate rx handling to a single functionJiri Benc2016-02-251-44/+28
| | | | | | | | | | | | | | | | Now when both vxlan_udp_encap_recv and vxlan_rcv are much shorter, combine them into a single function. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: move ECN decapsulation to a separate functionJiri Benc2016-02-251-31/+31
| | | | | | | | | | | | | | It simplifies the vxlan_rcv function. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: move inner L2 header processing to a separate functionJiri Benc2016-02-251-16/+33
| | | | | | | | | | | | | | | | This code will be different for VXLAN-GPE, so move it to a separate function. It will also make the rx path less spaghetti-like. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: consolidate GBP handling even moreJiri Benc2016-02-251-4/+5
| | | | | | | | | | | | | | | | Now when the packet is scrubbed early, skb->mark can be set in the GBP handling code. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-02-231-17/+39
|\ \ | |/ | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * vxlan: do not use fdb in metadata modeJiri Benc2016-02-181-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | In metadata mode, the vxlan interface is not supposed to use the fdb control plane but an external one (openvswitch or static routes). With the current code, packets may leak into the fdb handling code which usually causes them to be dropped anyway but may have strange side effects. Just drop the packets directly when in metadata mode if the destination data are not correctly provided on egress. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * vxlan: clear IFF_TX_SKB_SHARINGJiri Benc2016-02-181-0/+1
| | | | | | | | | | | | | | | | ether_setup sets IFF_TX_SKB_SHARING but this is not supported by vxlan as it modifies the skb on xmit. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devicesDavid Wragg2016-02-101-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default MTU values are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg <david@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net>
| * vxlan: Relax MTU constraintsDavid Wragg2016-02-101-11/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the MTU of vxlan devices without an underlying device to be set to larger values (up to a maximum based on IP packet limits and vxlan overhead). Previously, their MTUs could not be set to higher than the conventional ethernet value of 1500. This is a very arbitrary value in the context of vxlan, and prevented vxlan devices from being able to take advantage of jumbo frames etc. The default MTU remains 1500, for compatibility. Signed-off-by: David Wragg <david@weave.works> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | VXLAN: Support outer IPv4 Tx checksums by defaultAlexander Duyck2016-02-211-10/+9
| | | | | | | | | | | | | | | | | | | | | | This change makes it so that if UDP CSUM is not specified we will default to enabling it. The main motivation behind this is the fact that with the use of outer checksum we can greatly improve the performance for VXLAN tunnels on devices that don't know how to parse tunnel headers. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | iptunnel: scrub packet in iptunnel_pull_headerJiri Benc2016-02-181-2/+2
| | | | | | | | | | | | | | | | Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call skb_scrub_packet directly instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: move vxlan device lookup before iptunnel_pull_headerJiri Benc2016-02-181-12/+11
| | | | | | | | | | | | | | This is in preparation for iptunnel_pull_header calling skb_scrub_packet. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: tun_id is 64bit, not 32bitJiri Benc2016-02-181-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | The tun_id field in struct ip_tunnel_key is __be64, not __be32. We need to convert the vni to tun_id correctly. Fixes: 54bfd872bf16 ("vxlan: keep flags and vni in network byte order") Reported-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: treat vni in metadata based tunnels consistentlyJiri Benc2016-02-171-4/+4
| | | | | | | | | | | | | | | | | | | | For metadata based tunnels, VNI is ignored when doing vxlan device lookups (because such tunnel receives all VNIs). However, this was not honored by vxlan_xmit_one when doing encapsulation bypass. Move the check for metadata based tunnel to the common place where it belongs. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: clean up rx error pathJiri Benc2016-02-171-21/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there are unrecognized flags present in the vxlan header, it doesn't make much sense to return the packet for further UDP processing, especially considering that for other invalid flag combinations we drop the packet because of previous checks. This means we return positive value only at the beginning of the function where tun_dst is not yet allocated. This allows us to get rid of the bad_flags and error jump labels. When we're dropping packet, we need to free tun_dst now. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: clean up extension handling on rxJiri Benc2016-02-171-30/+32
| | | | | | | | | | | | | | | | Bring the extension handling to a single place and move the actual handling logic out of vxlan_udp_encap_recv as much as possible. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: move GBP header parsing to a separate functionJiri Benc2016-02-171-14/+19
| | | | | | | | | | | | | | To make vxlan_udp_encap_recv shorter and more comprehensible. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: simplify vxlan_remcsumJiri Benc2016-02-171-14/+8
| | | | | | | | | | | | | | | | Part of the parameters is not needed. Simplify the caller of this function in preparation of making vxlan rx more comprehensible. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: keep flags and vni in network byte orderJiri Benc2016-02-171-58/+57
| | | | | | | | | | | | | | | | | | | | | | | | Prevent repeated conversions from and to network order in the fast path. To achieve this, define all flag constants in big endian order and store VNI as __be32. To prevent confusion between the actual VNI value and the VNI field from the header (which contains additional reserved byte), strictly distinguish between "vni" and "vni_field". Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: introduce vxlan_hdrJiri Benc2016-02-171-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | Currently, pointer to the vxlan header is kept in a local variable. It has to be reloaded whenever the pskb pull operations are performed which usually happens somewhere deep in called functions. Create a vxlan_hdr function and use it to reference the vxlan header instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: add dst_cache to ovs vxlan lwtunnelPaolo Abeni2016-02-161-7/+8
| | | | | | | | | | | | | | | | | | | | | | In case of UDP traffic with datagram length below MTU this give about 2% performance increase when tunneling over ipv4 and about 60% when tunneling over ipv6 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: use dst_cache for vxlan devicePaolo Abeni2016-02-161-8/+47
| | | | | | | | | | | | | | | | | | | | | | In case of UDP traffic with datagram length below MTU this give about 3% performance increase when tunneling over ipv4 and about 70% when tunneling over ipv6. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloadsEdward Cree2016-02-121-1/+1
| | | | | | | | | | | | | | | | All users now pass false, so we can remove it, and remove the code that was conditional upon it. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: vxlan: enable local checksum offloadEdward Cree2016-02-121-4/+2
| | | | | | | | | | Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: udp_tunnel duplicate include net/udp_tunnel.hstephen hemminger2016-02-111-1/+1
| | | | | | | | | | Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: consolidate vxlan_xmit_skb and vxlan6_xmit_skbJiri Benc2016-02-071-111/+26
| | | | | | | | | | | | | | | | There's a lot of code duplication. Factor out the duplicate code to a new function shared between IPv4 and IPv6 xmit path. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: consolidate csum flag handlingJiri Benc2016-02-071-21/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The flag for tx checksumming for tunneling over IPv4 and IPv6 is different. Decide whether to do tx checksumming in vxlan_xmit_one and pass it on as a separate flag. This will allow for tx path consolidation in the next patch. Unfortunately, gcc is not clever enough to see that udp_sum is always initialized and gives an uninitialized variable warning. Set it to false to silence the warning. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vxlan: consolidate output route calculationJiri Benc2016-02-071-40/+37
|/ | | | | | | | The code for output route lookup is duplicated for ndo_start_xmit and ndo_fill_metadata_dst. Move it to a common function. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vxlan: fix a out of bounds access in __vxlan_find_macLi RongQing2016-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The size of all_zeros_mac is 6 byte, but eth_hash() will access the 8 byte, and KASan reported the below bug: [ 8596.479031] BUG: KASan: out of bounds access in __vxlan_find_mac+0x24/0x100 at addr ffffffff841514c0 [ 8596.487647] Read of size 8 by task ip/52820 [ 8596.490818] Address belongs to variable all_zeros_mac+0x0/0x40 [ 8596.496051] CPU: 0 PID: 52820 Comm: ip Tainted: G WC 4.1.15 #1 [ 8596.503520] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014 [ 8596.509365] ffffffff841514c0 ffff88007450f0b8 ffffffff822fa5e1 0000000000000032 [ 8596.516112] ffff88007450f150 ffff88007450f138 ffffffff812dd58c ffff88007450f1d8 [ 8596.522856] ffffffff81113b80 0000000000000282 0000000000000001 ffffffff8101ee4d [ 8596.529599] Call Trace: [ 8596.530858] [<ffffffff822fa5e1>] dump_stack+0x4f/0x7b [ 8596.535080] [<ffffffff812dd58c>] kasan_report_error+0x3bc/0x3f0 [ 8596.540258] [<ffffffff81113b80>] ? __lock_acquire+0x90/0x2140 [ 8596.545245] [<ffffffff8101ee4d>] ? save_stack_trace+0x2d/0x80 [ 8596.550234] [<ffffffff812dda70>] kasan_report+0x40/0x50 [ 8596.554647] [<ffffffff81b211e4>] ? __vxlan_find_mac+0x24/0x100 [ 8596.559729] [<ffffffff812dc399>] __asan_load8+0x69/0xa0 [ 8596.564141] [<ffffffff81b211e4>] __vxlan_find_mac+0x24/0x100 [ 8596.569033] [<ffffffff81b2683d>] vxlan_fdb_create+0x9d/0x570 it can be fixed by enlarging the all_zeros_mac to 8 byte, although it is harmless; eth_hash() will be called in other place with the memory which is larger and equal to 8 byte. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tunnels: Allow IPv6 UDP checksums to be correctly controlled.Jesse Gross2016-01-211-7/+16
| | | | | | | | | | | | | | When configuring checksums on UDP tunnels, the flags are different for IPv4 vs. IPv6 (and reversed). However, when lightweight tunnels are enabled the flags used are always the IPv4 versions, which are ignored in the IPv6 code paths. This uses the correct IPv6 flags, so checksums can be controlled appropriately. Fixes: a725e514 ("vxlan: metadata based tunneling for IPv6") Fixes: abe492b4 ("geneve: UDP checksum configuration via netlink") Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-01-111-4/+10
|\ | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/bonding/bond_main.c drivers/net/ethernet/mellanox/mlxsw/spectrum.h drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c The bond_main.c and mellanox switch conflicts were cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * udp: restrict offloads to one namespaceHannes Frederic Sowa2016-01-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | udp tunnel offloads tend to aggregate datagrams based on inner headers. gro engine gets notified by tunnel implementations about possible offloads. The match is solely based on the port number. Imagine a tunnel bound to port 53, the offloading will look into all DNS packets and tries to aggregate them based on the inner data found within. This could lead to data corruption and malformed DNS packets. While this patch minimizes the problem and helps an administrator to find the issue by querying ip tunnel/fou, a better way would be to match on the specific destination ip address so if a user space socket is bound to the same address it will conflict. Cc: Tom Herbert <tom@herbertland.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * vxlan: fix test which detect duplicate vxlan ifaceNicolas Dichtel2016-01-091-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a vxlan interface is created, the driver checks that there is not another vxlan interface with the same properties. To do this, it checks the existing vxlan udp socket. Since commit 1c51a9159dde, the creation of the vxlan socket is done only when the interface is set up, thus it breaks that test. Example: $ ip l a vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0 $ ip l a vxlan11 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0 $ ip -br l | grep vxlan vxlan10 DOWN f2:55:1c:6a:fb:00 <BROADCAST,MULTICAST> vxlan11 DOWN 7a:cb:b9:38:59:0d <BROADCAST,MULTICAST> Instead of checking sockets, let's loop over the vxlan iface list. Fixes: 1c51a9159dde ("vxlan: fix race caused by dropping rtnl_unlock") Reported-by: Thomas Faivre <thomas.faivre@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ip_tunnel: Move stats update to iptunnel_xmit()Pravin B Shelar2015-12-251-5/+4
|/ | | | | | | | | | By moving stats update into iptunnel_xmit(), we can simplify iptunnel_xmit() usage. With this change there is no need to call another function (iptunnel_xmit_stats()) to update stats in tunnel xmit code path. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vxlan: interpret IP headers for ECN correctlyJiri Benc2015-12-071-4/+2
| | | | | | | | | | | | When looking for outer IP header, use the actual socket address family, not the address family of the default destination which is not set for metadata based interfaces (and doesn't have to match the address family of the received packet even if it was set). Fix also the misleading comment. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vxlan: support ndo_fill_metadata_dst also for IPv6Jiri Benc2015-12-071-2/+23
| | | | | | | | | Fill the metadata correctly even when tunneling over IPv6. Also, check that the provided metadata is of an address family that is supported by the tunnel. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vxlan: move IPv6 outpute route calculation to a functionJiri Benc2015-12-071-10/+34
| | | | | | | Will be used also for ndo_fill_metadata_dst. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-10-241-0/+41
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud