summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* RDS: verify the underlying transport exists before creating a connectionSasha Levin2015-09-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was no verification that an underlying transport exists when creating a connection, this would cause dereferencing a NULL ptr. It might happen on sockets that weren't properly bound before attempting to send a message, which will cause a NULL ptr deref: [135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN [135546.051270] Modules linked in: [135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527 [135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000 [135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194) [135546.055666] RSP: 0018:ffff8800bc70fab0 EFLAGS: 00010202 [135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000 [135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038 [135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000 [135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000 [135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000 [135546.061668] FS: 00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000 [135546.062836] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0 [135546.064723] Stack: [135546.065048] ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008 [135546.066247] 0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342 [135546.067438] 1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00 [135546.068629] Call Trace: [135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134) [135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298) [135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278) [135546.071981] rds_sendmsg (net/rds/send.c:1058) [135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38) [135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298) [135546.074577] ? rds_send_drop_to (net/rds/send.c:976) [135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795) [135546.076349] ? __might_fault (mm/memory.c:3795) [135546.077179] ? rds_send_drop_to (net/rds/send.c:976) [135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620) [135546.078856] SYSC_sendto (net/socket.c:1657) [135546.079596] ? SYSC_connect (net/socket.c:1628) [135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926) [135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674) [135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749) [135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16) [135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16) [135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749) [135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1 Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: tipc: fix stall during bclink wakeup procedureKolmakov Dmitriy2015-09-081-1/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an attempt to wake up users of broadcast link is made when there is no enough place in send queue than it may hang up inside the tipc_sk_rcv() function since the loop breaks only after the wake up queue becomes empty. This can lead to complete CPU stall with the following message generated by RCU: INFO: rcu_sched self-detected stall on CPU { 0} (t=2101 jiffies g=54225 c=54224 q=11465) Task dump for CPU 0: tpch R running task 0 39949 39948 0x0000000a ffffffff818536c0 ffff88181fa037a0 ffffffff8106a4be 0000000000000000 ffffffff818536c0 ffff88181fa037c0 ffffffff8106d8a8 ffff88181fa03800 0000000000000001 ffff88181fa037f0 ffffffff81094a50 ffff88181fa15680 Call Trace: <IRQ> [<ffffffff8106a4be>] sched_show_task+0xae/0x120 [<ffffffff8106d8a8>] dump_cpu_task+0x38/0x40 [<ffffffff81094a50>] rcu_dump_cpu_stacks+0x90/0xd0 [<ffffffff81097c3b>] rcu_check_callbacks+0x3eb/0x6e0 [<ffffffff8106e53f>] ? account_system_time+0x7f/0x170 [<ffffffff81099e64>] update_process_times+0x34/0x60 [<ffffffff810a84d1>] tick_sched_handle.isra.18+0x31/0x40 [<ffffffff810a851c>] tick_sched_timer+0x3c/0x70 [<ffffffff8109a43d>] __run_hrtimer.isra.34+0x3d/0xc0 [<ffffffff8109aa95>] hrtimer_interrupt+0xc5/0x1e0 [<ffffffff81030d52>] ? native_smp_send_reschedule+0x42/0x60 [<ffffffff81032f04>] local_apic_timer_interrupt+0x34/0x60 [<ffffffff810335bc>] smp_apic_timer_interrupt+0x3c/0x60 [<ffffffff8165a3fb>] apic_timer_interrupt+0x6b/0x70 [<ffffffff81659129>] ? _raw_spin_unlock_irqrestore+0x9/0x10 [<ffffffff8107eb9f>] __wake_up_sync_key+0x4f/0x60 [<ffffffffa313ddd1>] tipc_write_space+0x31/0x40 [tipc] [<ffffffffa313dadf>] filter_rcv+0x31f/0x520 [tipc] [<ffffffffa313d699>] ? tipc_sk_lookup+0xc9/0x110 [tipc] [<ffffffff81659259>] ? _raw_spin_lock_bh+0x19/0x30 [<ffffffffa314122c>] tipc_sk_rcv+0x2dc/0x3e0 [tipc] [<ffffffffa312e7ff>] tipc_bclink_wakeup_users+0x2f/0x40 [tipc] [<ffffffffa313ce26>] tipc_node_unlock+0x186/0x190 [tipc] [<ffffffff81597c1c>] ? kfree_skb+0x2c/0x40 [<ffffffffa313475c>] tipc_rcv+0x2ac/0x8c0 [tipc] [<ffffffffa312ff58>] tipc_l2_rcv_msg+0x38/0x50 [tipc] [<ffffffff815a76d3>] __netif_receive_skb_core+0x5a3/0x950 [<ffffffff815a98d3>] __netif_receive_skb+0x13/0x60 [<ffffffff815a993e>] netif_receive_skb_internal+0x1e/0x90 [<ffffffff815aa138>] napi_gro_receive+0x78/0xa0 [<ffffffffa07f93f4>] tg3_poll_work+0xc54/0xf40 [tg3] [<ffffffff81597c8c>] ? consume_skb+0x2c/0x40 [<ffffffffa07f9721>] tg3_poll_msix+0x41/0x160 [tg3] [<ffffffff815ab0f2>] net_rx_action+0xe2/0x290 [<ffffffff8104b92a>] __do_softirq+0xda/0x1f0 [<ffffffff8104bc26>] irq_exit+0x76/0xa0 [<ffffffff81004355>] do_IRQ+0x55/0xf0 [<ffffffff8165a12b>] common_interrupt+0x6b/0x6b <EOI> The issue occurs only when tipc_sk_rcv() is used to wake up postponed senders: tipc_bclink_wakeup_users() // wakeupq - is a queue which consists of special // messages with SOCK_WAKEUP type. tipc_sk_rcv(wakeupq) ... while (skb_queue_len(inputq)) { filter_rcv(skb) // Here the type of message is checked // and if it is SOCK_WAKEUP then // it tries to wake up a sender. tipc_write_space(sk) wake_up_interruptible_sync_poll() } After the sender thread is woke up it can gather control and perform an attempt to send a message. But if there is no enough place in send queue it will call link_schedule_user() function which puts a message of type SOCK_WAKEUP to the wakeup queue and put the sender to sleep. Thus the size of the queue actually is not changed and the while() loop never exits. The approach I proposed is to wake up only senders for which there is enough place in send queue so the described issue can't occur. Moreover the same approach is already used to wake up senders on unicast links. I have got into the issue on our product code but to reproduce the issue I changed a benchmark test application (from tipcutils/demos/benchmark) to perform the following scenario: 1. Run 64 instances of test application (nodes). It can be done on the one physical machine. 2. Each application connects to all other using TIPC sockets in RDM mode. 3. When setup is done all nodes start simultaneously send broadcast messages. 4. Everything hangs up. The issue is reproducible only when a congestion on broadcast link occurs. For example, when there are only 8 nodes it works fine since congestion doesn't occur. Send queue limit is 40 in my case (I use a critical importance level) and when 64 nodes send a message at the same moment a congestion occurs every time. Signed-off-by: Dmitry S Kolmakov <kolmakov.dmitriy@huawei.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: remove unnecessary switchdev includeVivien Didelot2015-09-081-1/+0
| | | | | | | | Remove the unnecessary switchdev.h include from br_netlink.c. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: bridge: check __vlan_vid_del for errorVivien Didelot2015-09-081-4/+13
| | | | | | | | | Since __vlan_del can return an error code, change its inner function __vlan_vid_del to return an eventual error from switchdev_port_obj_del. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* openvswitch: Remove conntrack Kconfig option.Joe Stringer2015-09-063-14/+6
| | | | | | | | | | | | | There's no particular desire to have conntrack action support in Open vSwitch as an independently configurable bit, rather just to ensure there is not a hard dependency. This exposed option doesn't accurately reflect the conntrack dependency when enabled, so simplify this by removing the option. Compile the support if NF_CONNTRACK is enabled. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'mac80211-for-davem-2015-09-04' of ↵David S. Miller2015-09-067-6/+113
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== For the first round of fixes, we have this: * fix for the sizeof() pointer type issue * a fix for regulatory getting into a restore loop * a fix for rfkill global 'all' state, it needs to be stored everywhere to apply correctly to new rfkill instances * properly refuse CQM RSSI when it cannot actually be used * protect HT TDLS traffic properly in non-HT networks * don't incorrectly advertise 80 MHz support when not allowed ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * mac80211: reject software RSSI CQM with beacon filteringJohannes Berg2015-09-041-0/+4
| | | | | | | | | | | | | | | | When beacon filtering is enabled the mac80211 software implementation for RSSI CQM cannot work as beacons will not be available. Rather than accepting such a configuration without proper effect, reject it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * mac80211: avoid VHT usage with no 80MHz chans allowedArik Nemtsov2015-09-042-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently if 80MHz channels are not allowed for use, the VHT IE is not included in the probe request for an AP. This is not good enough if the AP is configured with the wrong regulatory and supports VHT even where prohibited or in TDLS scenarios. Mark the ifmgd with the DISABLE_VHT flag for the misbehaving-AP case, and unset VHT support from the peer-station entry for the TDLS case. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * cfg80211: regulatory: restore proper user alpha2Maciej S. Szmigiero2015-09-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | restore_regulatory_settings() should restore alpha2 as computed in restore_alpha2(), not raw user_alpha2 to behave as described in the comment just above that code. This fixes endless loop of calling CRDA for "00" and "97" countries after resume from suspend on my laptop. Looks like others had the same problem, too: http://ath9k-devel.ath9k.narkive.com/knY5W6St/ath9k-and-crda-messages-in-logs https://bugs.launchpad.net/ubuntu/+source/linux/+bug/899335 https://forum.porteus.org/viewtopic.php?t=4975&p=36436 https://forums.opensuse.org/showthread.php/483356-Authentication-Regulatory-Domain-issues-ath5k-12-2 Signed-off-by: Maciej Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * rfkill: Copy "all" global state to other typesJoão Paulo Rechi Vita2015-09-041-1/+9
| | | | | | | | | | | | | | | | | | | | | | When switching the state of all RFKill switches of type all we need to replicate the RFKILL_TYPE_ALL global state to all the other types global state, so it is used to initialize persistent RFKill switches on register. Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * mac80211: protect non-HT BSS when HT TDLS traffic existsAvri Altman2015-09-041-3/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HT TDLS traffic should be protected in a non-HT BSS to avoid collisions. Therefore, when TDLS peers join/leave, check if protection is (now) needed and set the ht_operation_mode of the virtual interface according to the HT capabilities of the TDLS peer(s). This works because a non-HT BSS connection never sets (or otherwise uses) the ht_operation_mode; it just means that drivers must be aware that this field applies to all HT traffic for this virtual interface, not just the traffic within the BSS. Document that. Signed-off-by: Avri Altman <avri.altman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * mac80211: Do not use sizeof() on pointer typeThierry Reding2015-09-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rate_control_cap_mask() function takes a parameter mcs_mask, which GCC will take to be u8 * even though it was declared with a fixed size. This causes the following warning: net/mac80211/rate.c: In function 'rate_control_cap_mask': net/mac80211/rate.c:719:25: warning: 'sizeof' on array function parameter 'mcs_mask' will return size of 'u8 * {aka unsigned char *}' [-Wsizeof-array-argument] for (i = 0; i < sizeof(mcs_mask); i++) ^ net/mac80211/rate.c:684:10: note: declared here u8 mcs_mask[IEEE80211_HT_MCS_MASK_LEN], ^ This can be easily fixed by using the IEEE80211_HT_MCS_MASK_LEN directly within the loop condition. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | RDS: rds_conn_lookup() should factor in the struct net for a matchSowmini Varadhan2015-09-051-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | Only return a conn if the rds_conn_net(conn) matches the struct net passed to rds_conn_lookup(). Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | switchdev: fix return value of switchdev_port_fdb_dump in case of errorJiri Pirko2015-09-051-5/+1
| | | | | | | | | | | | | | | | | | | | | | switchdev_port_fdb_dump is used as .ndo_fdb_dump. Its return value is idx, so we cannot return errval. Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Acked-by: Scott Feldman<sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2015-09-057-12/+55
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: include/net/netfilter/nf_conntrack.h The conflict was an overlap between changing the type of the zone argument to nf_ct_tmpl_alloc() whilst exporting nf_ct_tmpl_free. Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net, they are: 1) Oneliner to restore maps in nf_tables since we support addressing registers at 32 bits level. 2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n, oneliner from Bernhard Thaler. 3) Out of bound access in ipset hash:net* set types, reported by Dave Jones' KASan utility, patch from Jozsef Kadlecsik. 4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of unnamed unions, patch from Elad Raz. 5) Add a workaround to address inconsistent endianess in the res_id field of nfnetlink batch messages, reported by Florian Westphal. 6) Fix error paths of CT/synproxy since the conntrack template was moved to use kmalloc, patch from Daniel Borkmann. All of them look good to me to reach 4.2, I can route this to -stable myself too, just let me know what you prefer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error pathsDaniel Borkmann2015-09-013-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates") migrated templates to the new allocator api, but forgot to update error paths for them in CT and synproxy to use nf_ct_tmpl_free() instead of nf_conntrack_free(). Due to that, memory is being freed into the wrong kmemcache, but also we drop the per net reference count of ct objects causing an imbalance. In Brad's case, this leads to a wrap-around of net->ct.count and thus lets __nf_conntrack_alloc() refuse to create a new ct object: [ 10.340913] xt_addrtype: ipv6 does not support BROADCAST matching [ 10.810168] nf_conntrack: table full, dropping packet [ 11.917416] r8169 0000:07:00.0 eth0: link up [ 11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 12.815902] nf_conntrack: table full, dropping packet [ 15.688561] nf_conntrack: table full, dropping packet [ 15.689365] nf_conntrack: table full, dropping packet [ 15.690169] nf_conntrack: table full, dropping packet [ 15.690967] nf_conntrack: table full, dropping packet [...] With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs. nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus, to fix the problem, export and use nf_ct_tmpl_free() instead. Fixes: 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates") Reported-by: Brad Jackson <bjackson0971@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nfnetlink: work around wrong endianess in res_id fieldPablo Neira Ayuso2015-08-291-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The convention in nfnetlink is to use network byte order in every header field as well as in the attribute payload. The initial version of the batching infrastructure assumes that res_id comes in host byte order though. The only client of the batching infrastructure is nf_tables, so let's add a workaround to address this inconsistency. We currently have 11 nfnetlink subsystems according to NFNL_SUBSYS_COUNT, so we can assume that the subsystem 2560, ie. htons(10), will not be allocated anytime soon, so it can be an alias of nf_tables from the nfnetlink batching path when interpreting the res_id field. Based on original patch from Florian Westphal. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: ipset: Fixing unnamed union initElad Raz2015-08-292-4/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In continue to proposed Vinson Lee's post [1], this patch fixes compilation issues founded at gcc 4.4.7. The initialization of .cidr field of unnamed unions causes compilation error in gcc 4.4.x. References Visible links [1] https://lkml.org/lkml/2015/7/5/74 Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: ipset: Out of bound access in hash:net* types fixedJozsef Kadlecsik2015-08-281-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave Jones reported that KASan detected out of bounds access in hash:net* types: [ 23.139532] ================================================================== [ 23.146130] BUG: KASan: out of bounds access in hash_net4_add_cidr+0x1db/0x220 at addr ffff8800d4844b58 [ 23.152937] Write of size 4 by task ipset/457 [ 23.159742] ============================================================================= [ 23.166672] BUG kmalloc-512 (Not tainted): kasan: bad access detected [ 23.173641] ----------------------------------------------------------------------------- [ 23.194668] INFO: Allocated in hash_net_create+0x16a/0x470 age=7 cpu=1 pid=456 [ 23.201836] __slab_alloc.constprop.66+0x554/0x620 [ 23.208994] __kmalloc+0x2f2/0x360 [ 23.216105] hash_net_create+0x16a/0x470 [ 23.223238] ip_set_create+0x3e6/0x740 [ 23.230343] nfnetlink_rcv_msg+0x599/0x640 [ 23.237454] netlink_rcv_skb+0x14f/0x190 [ 23.244533] nfnetlink_rcv+0x3f6/0x790 [ 23.251579] netlink_unicast+0x272/0x390 [ 23.258573] netlink_sendmsg+0x5a1/0xa50 [ 23.265485] SYSC_sendto+0x1da/0x2c0 [ 23.272364] SyS_sendto+0xe/0x10 [ 23.279168] entry_SYSCALL_64_fastpath+0x12/0x6f The bug is fixed in the patch and the testsuite is extended in ipset to check cidr handling more thoroughly. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
* | | net/ipv6: Correct PIM6 mrt_lock handlingRichard Laing2015-09-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the IPv6 multicast routing code the mrt_lock was not being released correctly in the MFC iterator, as a result adding or deleting a MIF would cause a hang because the mrt_lock could not be acquired. This fix is a copy of the code for the IPv4 case and ensures that the lock is released correctly. Signed-off-by: Richard Laing <richard.laing@alliedtelesis.co.nz> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sctp: add routing output fallbackMarcelo Ricardo Leitner2015-09-031-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0ca50d12fe46 added a restriction that the address must belong to the output interface, so that sctp will use the right interface even when using secondary addresses. But it breaks IPVS setups, on which people is used to attach VIP addresses to loopback interface on real servers. It's preferred to attach to the interface actually in use, but it's a very common setup and that used to work. This patch then saves the first routing good result, even if it would be going out through an interface that doesn't have that address. If no better hit found, it's then used. This effectively restores the original behavior if no better interface could be found. Fixes: 0ca50d12fe46 ("sctp: fix src address selection if using secondary addresses") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sctp: fix dst leakMarcelo Ricardo Leitner2015-09-031-1/+3
| |/ |/| | | | | | | | | | | | | | | Commit 0ca50d12fe46 failed to release the reference to dst entries that it decided to skip. Fixes: 0ca50d12fe46 ("sctp: fix src address selection if using secondary addresses") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2015-09-03411-7873/+14465
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: "Another merge window, another set of networking changes. I've heard rumblings that the lightweight tunnels infrastructure has been voted networking change of the year. But what do I know? 1) Add conntrack support to openvswitch, from Joe Stringer. 2) Initial support for VRF (Virtual Routing and Forwarding), which allows the segmentation of routing paths without using multiple devices. There are some semantic kinks to work out still, but this is a reasonably strong foundation. From David Ahern. 3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov. 4) Ignore route nexthops with a link down state in ipv6, just like ipv4. From Andy Gospodarek. 5) Remove spinlock from fast path of act_gact and act_mirred, from Eric Dumazet. 6) Document the DSA layer, from Florian Fainelli. 7) Add netconsole support to bcmgenet, systemport, and DSA. Also from Florian Fainelli. 8) Add Mellanox Switch Driver and core infrastructure, from Jiri Pirko. 9) Add support for "light weight tunnels", which allow for encapsulation and decapsulation without bearing the overhead of a full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of others. 10) Add Identifier Locator Addressing support for ipv6, from Tom Herbert. 11) Support fragmented SKBs in iwlwifi, from Johannes Berg. 12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia. 13) Add BQL support to 3c59x driver, from Loganaden Velvindron. 14) Stop using a zero TX queue length to mean that a device shouldn't have a qdisc attached, use an explicit flag instead. From Phil Sutter. 15) Use generic geneve netdevice infrastructure in openvswitch, from Pravin B Shelar. 16) Add infrastructure to avoid re-forwarding a packet in software that was already forwarded by a hardware switch. From Scott Feldman. 17) Allow AF_PACKET fanout function to be implemented in a bpf program, from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits) netfilter: nf_conntrack: make nf_ct_zone_dflt built-in netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled net: fec: clear receive interrupts before processing a packet ipv6: fix exthdrs offload registration in out_rt path xen-netback: add support for multicast control bgmac: Update fixed_phy_register() sock, diag: fix panic in sock_diag_put_filterinfo flow_dissector: Use 'const' where possible. flow_dissector: Fix function argument ordering dependency ixgbe: Resolve "initialized field overwritten" warnings ixgbe: Remove bimodal SR-IOV disabling ixgbe: Add support for reporting 2.5G link speed ixgbe: fix bounds checking in ixgbe_setup_tc for 82598 ixgbe: support for ethtool set_rxfh ixgbe: Avoid needless PHY access on copper phys ixgbe: cleanup to use cached mask value ixgbe: Remove second instance of lan_id variable ixgbe: use kzalloc for allocating one thing flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c ixgbe: Remove unused PCI bus types ...
| * | netfilter: nf_conntrack: make nf_ct_zone_dflt built-inDaniel Borkmann2015-09-022-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fengguang reported, that some randconfig generated the following linker issue with nf_ct_zone_dflt object involved: [...] CC init/version.o LD init/built-in.o net/built-in.o: In function `ipv4_conntrack_defrag': nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt' net/built-in.o: In function `ipv6_defrag': nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt' make: *** [vmlinux] Error 1 Given that configurations exist where we have a built-in part, which is accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user() and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in area when netfilter is configured in general. Therefore, split the more generic parts into a common header under include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in section that already holds parts related to CONFIG_NF_CONNTRACK in the netfilter core. This fixes the issue on my side. Fixes: 308ac9143ee2 ("netfilter: nf_conntrack: push zone object into functions") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabledDaniel Borkmann2015-09-022-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing various Kconfig options on another issue, I found that the following one triggers as well on allmodconfig and nf_conntrack disabled: net/ipv4/netfilter/nf_dup_ipv4.c: In function ‘nf_dup_ipv4’: net/ipv4/netfilter/nf_dup_ipv4.c:72:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function) if (this_cpu_read(nf_skb_duplicated)) [...] net/ipv6/netfilter/nf_dup_ipv6.c: In function ‘nf_dup_ipv6’: net/ipv6/netfilter/nf_dup_ipv6.c:66:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function) if (this_cpu_read(nf_skb_duplicated)) Fix it by including directly the header where it is defined. Fixes: bbde9fc1824a ("netfilter: factor out packet duplication for IPv4/IPv6") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: fix exthdrs offload registration in out_rt pathDaniel Borkmann2015-09-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We previously register IPPROTO_ROUTING offload under inet6_add_offload(), but in error path, we try to unregister it with inet_del_offload(). This doesn't seem correct, it should actually be inet6_del_offload(), also ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it also uses rthdr_offload twice), but it got removed entirely later on. Fixes: 3336288a9fea ("ipv6: Switch to using new offload infrastructure.") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sock, diag: fix panic in sock_diag_put_filterinfoDaniel Borkmann2015-09-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | diag socket's sock_diag_put_filterinfo() dumps classic BPF programs upon request to user space (ss -0 -b). However, native eBPF programs attached to sockets (SO_ATTACH_BPF) cannot be dumped with this method: Their orig_prog is always NULL. However, sock_diag_put_filterinfo() unconditionally tries to access its filter length resp. wants to copy the filter insns from there. Internal cBPF to eBPF transformations attached to sockets don't have this issue, as orig_prog state is kept. It's currently only used by packet sockets. If we would want to add native eBPF support in the future, this needs to be done through a different attribute than PACKET_DIAG_FILTER to not confuse possible user space disassemblers that work on diag data. Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Use 'const' where possible.David S. Miller2015-09-011-38/+41
| | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow: Move __get_hash_from_flowi{4,6} into flow_dissector.cDavid S. Miller2015-09-012-36/+35
| | | | | | | | | | | | | | | | | | | | | | | | These cannot live in net/core/flow.c which only builds when XFRM is enabled. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Don't use bit fields.David S. Miller2015-09-011-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just have a flags member instead. In file included from include/linux/linkage.h:4:0, from include/linux/kernel.h:6, from net/core/flow_dissector.c:1: In function 'flow_keys_hash_start', inlined from 'flow_hash_from_keys' at net/core/flow_dissector.c:553:34: >> include/linux/compiler.h:447:38: error: call to '__compiletime_assert_459' declared with attribute error: BUILD_BUG_ON failed: FLOW_KEYS_HASH_OFFSET % sizeof(u32) Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Ignore flow dissector return value from ___skb_get_hashTom Herbert2015-09-011-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ___skb_get_hash ignore return value from skb_flow_dissect_flow_keys. A failure in that function likely means that there was a parse error, so we may as well use whatever fields were found before the error was hit. This is also good because it means we won't keep trying to derive the hash on subsequent calls to skb_get_hash for the same packet. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Add control/reporting of encapsulationTom Herbert2015-09-011-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an input flag to flow dissector on rather dissection should stop when encapsulation is detected (IP/IP or GRE). Also, add a key_control flag that indicates encapsulation was encountered during the dissection. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Add flag to stop parsing when an IPv6 flow label is seenTom Herbert2015-09-011-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an input flag to flow dissector on rather dissection should be stopped when a flow label is encountered. Presumably, the flow label is derived from a sufficient hash of an inner transport packet so further dissection is not needed (that is ports are not included in the flow hash). Using the flow label instead of ports has the additional benefit that packet fragments should hash to same value as non-fragments for a flow (assuming that the same flow label is used). We set this flag by default in for skb_get_hash. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Add flag to stop parsing at L3Tom Herbert2015-09-011-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an input flag to flow dissector on rather dissection should be stopped when an L3 packet is encountered. This would be useful if a caller just wanted to get IP addresses of the outermost header (e.g. to do an L3 hash). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Support IPv6 fragment headerTom Herbert2015-09-011-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | Parse NEXTHDR_FRAGMENT. When seen account for it in the fragment bits of key_control. Also, check if first fragment should be parsed. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Add control/reporting of fragmentationTom Herbert2015-09-011-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add an input flag to flow dissector on rather dissection should be attempted on a first fragment. Also add key_control flags to indicate that a packet is a fragment or first fragment. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Add flags argument to skb_flow_dissector functionsTom Herbert2015-09-015-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | The flags argument will allow control of the dissection process (for instance whether to parse beyond L3). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flow_dissector: Jump to exit code in __skb_flow_dissectTom Herbert2015-09-011-26/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of returning immediately (on a parsing failure for instance) we jump to cleanup code. This always sets protocol values in key_control (even on a failure there is still valid information in the key_tags that was set before the problem was hit). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | flowi: Abstract out functions to get flow hash based on flowiTom Herbert2015-09-011-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create __get_hash_from_flowi6 and __get_hash_from_flowi4 to get the flow keys and hash based on flowi structures. These are called by __skb_get_hash_flowi6 and __skb_get_hash_flowi4. Also, created get_hash_from_flowi6 and get_hash_from_flowi4 which can be called when just the hash value for a flowi is needed. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | skbuff: Make __skb_set_sw_hash a general functionTom Herbert2015-09-011-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move __skb_set_sw_hash to skbuff.h and add __skb_set_hash which is a common method (between __skb_set_sw_hash and skb_set_hash) to set the hash in an skbuff. Also, move skb_clear_hash to be closer to __skb_set_hash. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Make table id type u32David Ahern2015-09-013-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A number of VRF patches used 'int' for table id. It should be u32 to be consistent with the rest of the stack. Fixes: 4e3c89920cd3a ("net: Introduce VRF related flags and helpers") 15be405eb2ea9 ("net: Add inet_addr lookup by table") 30bbaa1950055 ("net: Fix up inet_addr_type checks") 021dd3b8a142d ("net: Add routes to the table associated with the device") dc028da54ed35 ("inet: Move VRF table lookup to inlined function") f6d3c19274c74 ("net: FIB tracepoints") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tun_dst: Remove opts_sizePravin B Shelar2015-08-311-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | opts_size is only written and never read. Following patch removes this unused variable. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: send only one NEWLINK when RA causes changesMarius Tomaschewski2015-08-311-3/+10
| | | | | | | | | | | | | | | Signed-off-by: Marius Tomaschewski <mt@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: send NEWLINK on RA managed/otherconf changesMarius Tomaschewski2015-08-311-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel is applying the RA managed/otherconf flags silently and forgets to send ifinfo notify to inform about their change when the router provides a zero reachable_time and retrans_timer as dnsmasq and many routers send it, which just means unspecified by this router and the host should continue using whatever value it is already using. Userspace may monitor the ifinfo notifications to activate dhcpv6. Signed-off-by: Marius Tomaschewski <mt@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: dsa: Allow DSA and CPU ports to have a phy-mode propertyAndrew Lunn2015-08-311-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It can be useful for DSA and CPU ports to have a phy-mode property, in particular to specify RGMII delays. Parse the property and set it in the fixed-link phydev. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: dsa: Allow configuration of CPU & DSA port speeds/duplexAndrew Lunn2015-08-311-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default, DSA and CPU ports are configured to the maximum speed the switch supports. However there can be use cases where the peer devices port is slower. Allow a fixed-link property to be used with the DSA and CPU port in the device tree, and use this information to configure the port. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mpls: fix mpls_net_init memory leakNikolay Aleksandrov2015-08-311-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a memory leak in the mpls netns init function in case of failure. If register_net_sysctl fails then we need to free the ctl_table. Fixes: 7720c01f3f59 ("mpls: Add a sysctl to control the size of the mpls label table") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: use dctcp if enabled on the route to the initiatorDaniel Borkmann2015-08-315-8/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the following case doesn't use DCTCP, even if it should: A responder has f.e. Cubic as system wide default, but for a specific route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The initiating host then uses DCTCP as congestion control, but since the initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok, and we have to fall back to Reno after 3WHS completes. We were thinking on how to solve this in a minimal, non-intrusive way without bloating tcp_ecn_create_request() needlessly: lets cache the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0) is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows to only do a single metric feature lookup inside tcp_ecn_create_request(). Joint work with Florian Westphal. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | fib, fib6: reject invalid feature bitsDaniel Borkmann2015-08-312-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Feature bits that are invalid should not be accepted by the kernel, only the lower 4 bits may be configured, but not the remaining ones. Even from these 4, 2 of them are unused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: fib6: reduce identation in ip6_convert_metricsDaniel Borkmann2015-08-311-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | Reduce the identation a bit, there's no need to artificically have it increased. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud