summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* xfrm: dst_entries_init() per-net dst_opsDan Streetman2015-11-033-62/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops templates; their dst_entries counters will never be used. Move the xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init and dst_entries_destroy for each net namespace. The ipv4 and ipv6 xfrms each create dst_ops template, and perform dst_entries_init on the templates. The template values are copied to each net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops pcpuc_entries field is a percpu counter and cannot be used correctly by simply copying it to another object. The result of this is a very subtle bug; changes to the dst entries counter from one net namespace may sometimes get applied to a different net namespace dst entries counter. This is because of how the percpu counter works; it has a main count field as well as a pointer to the percpu variables. Each net namespace maintains its own main count variable, but all point to one set of percpu variables. When any net namespace happens to change one of the percpu variables to outside its small batch range, its count is moved to the net namespace's main count variable. So with multiple net namespaces operating concurrently, the dst_ops entries counter can stray from the actual value that it should be; if counts are consistently moved from one net namespace to another (which my testing showed is likely), then one net namespace winds up with a negative dst_ops count while another winds up with a continually increasing count, eventually reaching its gc_thresh limit, which causes all new traffic on the net namespace to fail with -ENOBUFS. Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Signed-off-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
* net: phy: fix a bug in get_phy_c45_idsShaohui Xie2015-11-021-25/+49
| | | | | | | | | | | | | | | | | | When probing devices-in-package for a c45 phy, device zero is the last device to probe, however, if driver reads 0 from device zero, c45_ids->devices_in_package is set to '0', the loop condition of probing will be matched again, see codes below: for (i = 1;i < num_ids && c45_ids->devices_in_package == 0;i++) driver will run in a dead loop. This patch restructures the bug and confusing loop, it provides a helper function get_phy_c45_devs_in_pkg which to read devices-in-package registers of a MMD, and rewrites the loop with using the helper function. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sfc: push partner queue for skb->xmit_moreMartin Habets2015-11-024-4/+36
| | | | | | | | | | | | | | | | | When the IP stack passes SKBs the sfc driver puts them in 2 different TX queues (called partners), one for checksummed and one for not checksummed. If the SKB has xmit_more set the driver will delay pushing the work to the NIC. When later it does decide to push the buffers this patch ensures it also pushes the partner queue, if that also has any delayed work. Before this fix the work in the partner queue would be left for a long time and cause a netdev watchdog. Fixes: 70b33fb ("sfc: add support for skb->xmit_more") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sit: fix sit0 percpu double allocationsEric Dumazet2015-11-021-22/+4
| | | | | | | | | | | | | | | | | | | | sit0 device allocates its percpu storage twice : - One time in ipip6_tunnel_init() - One time in ipip6_fb_tunnel_init() Thus we leak 48 bytes per possible cpu per network namespace dismantle. ipip6_fb_tunnel_init() can be much simpler and does not return an error, and should be called after register_netdev() Note that ipip6_tunnel_clone_6rd() also needs to be called after register_netdev() (calling ipip6_tunnel_init()) Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid NULL deref in inet_ctl_sock_destroy()Eric Dumazet2015-11-021-1/+2
| | | | | | | | | | Under low memory conditions, tcp_sk_init() and icmp_sk_init() can both iterate on all possible cpus and call inet_ctl_sock_destroy(), with eventual NULL pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix crash on ICMPv6 redirects with prohibited/blackholed sourceMatthias Schiffer2015-11-021-2/+1
| | | | | | | | | | | | | There are other error values besides ip6_null_entry that can be returned by ip6_route_redirect(): fib6_rule_action() can also result in ip6_blk_hole_entry and ip6_prohibit_entry if such ip rules are installed. Only checking for ip6_null_entry in rt6_do_redirect() causes ip6_ins_rt() to be called with rt->rt6i_table == NULL in these cases, making the kernel crash. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* ppp, slip: Validate VJ compression slot parameters completelyBen Hutchings2015-11-024-15/+15
| | | | | | | | | | | | | | | | Currently slhc_init() treats out-of-range values of rslots and tslots as equivalent to 0, except that if tslots is too large it will dereference a null pointer (CVE-2015-7799). Add a range-check at the top of the function and make it return an ERR_PTR() on error instead of NULL. Change the callers accordingly. Compile-tested only. Reported-by: 郭永刚 <guoyonggang@360.cn> References: http://article.gmane.org/gmane.comp.security.oss.general/17908 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* isdn_ppp: Add checks for allocation failure in isdn_ppp_open()Ben Hutchings2015-11-021-0/+6
| | | | | | | Compile-tested only. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* qmi_wwan: fix entry for HP lt4112 LTE/HSPA+ Gobi 4G ModuleBjørn Mork2015-11-021-1/+4
| | | | | | | | | | | | | | | | | | | | The lt4112 is a HP branded Huawei me906e modem. Like other Huawei modems, it does not have a fixed interface to function mapping. Instead it uses a Huawei specific scheme: functions are mapped by subclass and protocol. However, the HP vendor ID is used for modems from many different manufacturers using different schemes, so we cannot apply a generic vendor rule like we do for the Huawei vendor ID. Replace the previous lt4112 entry pointing to an arbitrary interface number with a device specific subclass + protocol match. Reported-and-tested-by: Muri Nicanor <muri+libqmi@immerda.ch> Tested-by: Martin Hauke <mardnh@gmx.de> Fixes: bb2bdeb83fb1 ("qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() ↵Ani Sinha2015-11-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in preemptible context. Fixes the following kernel BUG : BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758 caller is __this_cpu_preempt_check+0x13/0x15 CPU: 0 PID: 2758 Comm: bash Tainted: P O 3.18.19 #2 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8 Call Trace: [<ffffffff81482b2a>] dump_stack+0x52/0x80 [<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1 [<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15 [<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c [<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e [<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810e6974>] ? pollwake+0x4d/0x51 [<ffffffff81058ac0>] ? default_wake_function+0x0/0xf [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810613d9>] ? __wake_up_common+0x45/0x77 [<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32 [<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53 [<ffffffff8139a519>] ? sock_def_readable+0x71/0x75 [<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55 [<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41 [<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86 [<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d [<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b [<ffffffff810d5738>] ? new_sync_read+0x82/0xaa [<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99 [<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32 [<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f [<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf [<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3 [<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e Signed-off-by: Ani Sinha <ani@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'sh_eth-fixes'David S. Miller2015-11-021-5/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | Sergei Shtylyov says: ==================== sh_eth: fix bugs in sh_eth_ring_init() Here's a set of 2 patches against DaveM's 'net.git' repo which fix couple of bugs in the sh_eth_ring_init() function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * sh_eth: fix WARNING in dma_common_free_remap()Sergei Shtylyov2015-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Iff the first dma_alloc_coherent() call fails in sh_eth_ring_init(), the following is printed to the kernel console: WARNING: CPU: 0 PID: 1 at drivers/base/dma-mapping.c:334 dma_common_free_remap+0x48/0x6c() trying to free invalid coherent area: (null) Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc7-dirty #969 Hardware name: Generic R8A7791 (Flattened Device Tree) Backtrace: [<c0013820>] (dump_backtrace) from [<c00139bc>] (show_stack+0x18/0x1c) r6:c0662856 r5:00000009 r4:00000000 r3:00204140 [<c00139a4>] (show_stack) from [<c0227510>] (dump_stack+0x74/0x90) [<c022749c>] (dump_stack) from [<c0026ef4>] (warn_slowpath_common+0x8c/0xb8) r4:ee84dce0 r3:c0712774 [<c0026e68>] (warn_slowpath_common) from [<c0026fc4>] (warn_slowpath_fmt+0x38/0x40) r8:ee7f8000 r7:c0734520 r6:00001000 r5:20000008 r4:00000000 [<c0026f90>] (warn_slowpath_fmt) from [<c02df404>] (dma_common_free_remap+0x48/0x6c) r3:00000000 r2:c0662871 [<c02df3bc>] (dma_common_free_remap) from [<c001b9fc>] (__arm_dma_free+0xb8/0xd4) r6:00000001 r5:00000000 r4:00001000 r3:ee8c5584 [<c001b944>] (__arm_dma_free) from [<c001ba68>] (arm_dma_free+0x24/0x2c) r10:0000016b r8:00000000 r7:ee9bc830 r6:00000000 r5:00000400 r4:ee9bc800 [<c001ba44>] (arm_dma_free) from [<c032ebf0>] (sh_eth_ring_init+0x110/0x138) [<c032eae0>] (sh_eth_ring_init) from [<c033179c>] (sh_eth_open+0x94/0x1f4) r6:00000000 r5:ee9bcd18 r4:ee9bc800 [<c0331708>] (sh_eth_open) from [<c041bf7c>] (__dev_open+0x84/0x104) r6:c0565c50 r5:00000000 r4:ee9bc800 [<c041bef8>] (__dev_open) from [<c041c208>] (__dev_change_flags+0x94/0x13c) r7:00001002 r6:00000001 r5:00001003 r4:ee9bc800 [<c041c174>] (__dev_change_flags) from [<c041c2e8>] (dev_change_flags+0x20/0x50) r7:c072c8a0 r6:00000138 r5:00001002 r4:ee9bc800 [<c041c2c8>] (dev_change_flags) from [<c06e8d4c>] (ip_auto_config+0x174/0xf7c) r8:00001002 r7:c072c8a0 r6:c0700040 r5:00000001 r4:ee9bc800 r3:00000101 [<c06e8bd8>] (ip_auto_config) from [<c000a810>] (do_one_initcall+0x100/0x1c8) r10:c06f883c r9:00000000 r8:c06e8bd8 r7:c0734000 r6:c070e918 r5:c070e918 r4:ee083640 [<c000a710>] (do_one_initcall) from [<c06c9ddc>] (kernel_init_freeable+0x11c/0x1ec) r10:c06f883c r9:00000000 r8:00000099 r7:c0734000 r6:c070372c r5:c06f8834 r4:00000007 [<c06c9cc0>] (kernel_init_freeable) from [<c0514d78>] (kernel_init+0x14/0xec) r10:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0514d64 r4:c0734000 [<c0514d64>] (kernel_init) from [<c0010458>] (ret_from_fork+0x14/0x3c) r4:00000000 r3:ee84c000 This is because the code jumps to a wrong label and so tries to free yet unallocated coherent memory. Fix the *goto* in question. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sh_eth: fix uninitialized arrays in sh_eth_ring_init()Sergei Shtylyov2015-11-021-4/+4
|/ | | | | | | | | | sh_eth_ring_free() called in the sh_eth_ring_init()'s error path expects the arrays pointed to by 'sh_eth_private::[rt]x_skbuff' to be initialized with NULLs but they are allocated with just kmalloc_array() and so are left filled with random data. Use kcalloc() instead. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'linux-can-fixes-for-4.3-20151030' of ↵David S. Miller2015-11-021-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2015-10-30 this is a pull request for the upcoming v4.3 release. Marek Vasut provides a patch to use the correct attrlen in the nla_put() in the can_fill_info() function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * can: Use correct type in sizeof() in nla_put()Marek Vasut2015-10-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | The sizeof() is invoked on an incorrect variable, likely due to some copy-paste error, and this might result in memory corruption. Fix this. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Wolfgang Grandegger <wg@grandegger.com> Cc: netdev@vger.kernel.org Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
* | stmmac: Correctly report PTP capabilities.Phil Reid2015-11-011-2/+5
| | | | | | | | | | | | | | | | | | | | priv->hwts_*_en indicate if timestamping is enabled/disabled at run time. But priv->dma_cap.time_stamp and priv->dma_cap.atime_stamp indicates HW is support for PTPv1/PTPv2. Signed-off-by: Phil Reid <preid@electromag.com.au> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'ipv4_link_down'David S. Miller2015-11-013-10/+23
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Julian Anastasov says: ==================== ipv4: fix problems from the RTNH_F_LINKDOWN introduction Fix two problems from the change that introduced RTNH_F_LINKDOWN flag. The first patch deals with the removal of local route on DOWN event. The second patch makes sure the RTNH_F_LINKDOWN flag is properly updated on UP event because the DOWN event sets it in all cases. v2->v3: - use bool for force var v1->v2: - forgot to add ifconfig dummy0 down in the test case - split to 2 patches ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: update RTNH_F_LINKDOWN flag on UP eventJulian Anastasov2015-11-011-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When nexthop is part of multipath route we should clear the LINKDOWN flag when link goes UP or when first address is added. This is needed because we always set LINKDOWN flag when DEAD flag was set but now on UP the nexthop is not dead anymore. Examples when LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered: - link goes down (LINKDOWN is set), then link goes UP and device shows carrier OK but LINKDOWN remains set - last address is deleted (LINKDOWN is set), then address is added and device shows carrier OK but LINKDOWN remains set Steps to reproduce: modprobe dummy ifconfig dummy0 192.168.168.1 up here add a multipath route where one nexthop is for dummy0: ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE ifconfig dummy0 down ifconfig dummy0 up now ip route shows nexthop that is not dead. Now set the sysctl var: echo 1 > /proc/sys/net/ipv4/conf/dummy0/ignore_routes_with_linkdown now ip route will show a dead nexthop because the forgotten RTNH_F_LINKDOWN is propagated as RTNH_F_DEAD. Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: fix to not remove local route on link downJulian Anastasov2015-11-013-10/+16
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event we should not delete the local routes if the local address is still present. The confusion comes from the fact that both fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN constant. Fix it by returning back the variable 'force'. Steps to reproduce: modprobe dummy ifconfig dummy0 192.168.168.1 up ifconfig dummy0 down ip route list table local | grep dummy | grep host local 192.168.168.1 dev dummy0 proto kernel scope host src 192.168.168.1 Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: bcmgenet: Software reset EPHY after power onFlorian Fainelli2015-11-013-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | The EPHY on GENET v1->v3 is extremely finicky, and will show occasional failures based on the timing and reset sequence, ranging from duplicate packets, to extremely high latencies. Perform an additional software reset, and re-configuration to make sure it is in a consistent and working state. Fixes: 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: smsc911x: Fix crash if loopback test failsPavel Fedin2015-11-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On certain hardware in certain situations loopback test fails and the driver gets removed. During mdiobus_unregister() instance of PHY driver gets disposed. But by this time it has already been started using phy_connect_direct(). PHY driver uses DELAYED_WORK in order to maintain its state. Attempting to dispose the driver without calling phy_disconnect() causes deallocation of DELAYED_WORK being active. This shortly causes a bad crash in timer code. The problem can be discovered by enabling CONFIG_DEBUG_OBJECTS_TIMERS and CONFIG_DEBUG_OBJECTS_FREE Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tipc: linearize arriving NAME_DISTR and LINK_PROTO buffersJon Paul Maloy2015-11-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing of the new UDP bearer has revealed that reception of NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE message buffers is not prepared for the case that those may be non-linear. We now linearize all such buffers before they are delivered up to the generic reception layer. In order for the commit to apply cleanly to 'net' and 'stable', we do the change in the function tipc_udp_recv() for now. Later, we will post a commit to 'net-next' moving the linearization to generic code, in tipc_named_rcv() and tipc_link_proto_rcv(). Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | fec: Use gpio_set_value_cansleep()Fabio Estevam2015-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | We are in a context where we can sleep, and the FEC PHY reset gpio may be on an I2C expander. Use the cansleep() variant when setting the GPIO value. Based on a patch from Russell King for pci-mvebu.c. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-linus' of ↵Linus Torvalds2015-10-311-0/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This sets the stable pages flag on the RBD block device when we have CRCs enabled. (This is necessary since the default assumption for block devices changed in 3.9)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: require stable pages if message data CRCs are enabled
| * | rbd: require stable pages if message data CRCs are enabledRonny Hegewald2015-10-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rbd requires stable pages, as it performs a crc of the page data before they are send to the OSDs. But since kernel 3.9 (patch 1d1d1a767206fbe5d4c69493b7e6d2a8d08cc0a0 "mm: only enforce stable page writes if the backing device requires it") it is not assumed anymore that block devices require stable pages. This patch sets the necessary flag to get stable pages back for rbd. In a ceph installation that provides multiple ext4 formatted rbd devices "bad crc" messages appeared regularly (ca 1 message every 1-2 minutes on every OSD that provided the data for the rbd) in the OSD-logs before this patch. After this patch this messages are pretty much gone (only ca 1-2 / month / OSD). Cc: stable@vger.kernel.org # 3.9+, needs backporting Signed-off-by: Ronny Hegewald <Ronny.Hegewald@online.de> [idryomov@gmail.com: require stable pages only in crc case, changelog] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | Merge branch 'overlayfs-linus' of ↵Linus Torvalds2015-10-313-3/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs bug fixes from Miklos Szeredi: "This contains fixes for bugs that appeared in earlier kernels (all are marked for -stable)" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: free lower_mnt array in ovl_put_super ovl: free stack of paths in ovl_fill_super ovl: fix open in stacked overlay ovl: fix dentry reference leak ovl: use O_LARGEFILE in ovl_copy_up()
| * | | ovl: free lower_mnt array in ovl_put_superKonstantin Khlebnikov2015-10-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes memory leak after umount. Kmemleak report: unreferenced object 0xffff8800ba791010 (size 8): comm "mount", pid 2394, jiffies 4294996294 (age 53.920s) hex dump (first 8 bytes): 20 1c 13 02 00 88 ff ff ....... backtrace: [<ffffffff811f8cd4>] create_object+0x124/0x2c0 [<ffffffff817a059b>] kmemleak_alloc+0x7b/0xc0 [<ffffffff811dffe6>] __kmalloc+0x106/0x340 [<ffffffffa0152bfc>] ovl_fill_super+0x55c/0x9b0 [overlay] [<ffffffff81200ac4>] mount_nodev+0x54/0xa0 [<ffffffffa0152118>] ovl_mount+0x18/0x20 [overlay] [<ffffffff81201ab3>] mount_fs+0x43/0x170 [<ffffffff81220d34>] vfs_kern_mount+0x74/0x170 [<ffffffff812233ad>] do_mount+0x22d/0xdf0 [<ffffffff812242cb>] SyS_mount+0x7b/0xc0 [<ffffffff817b6bee>] entry_SYSCALL_64_fastpath+0x12/0x76 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: dd662667e6d3 ("ovl: add mutli-layer infrastructure") Cc: <stable@vger.kernel.org> # v4.0+
| * | | ovl: free stack of paths in ovl_fill_superKonstantin Khlebnikov2015-10-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes small memory leak after mount. Kmemleak report: unreferenced object 0xffff88003683fe00 (size 16): comm "mount", pid 2029, jiffies 4294909563 (age 33.380s) hex dump (first 16 bytes): 20 27 1f bb 00 88 ff ff 40 4b 0f 36 02 88 ff ff '......@K.6.... backtrace: [<ffffffff811f8cd4>] create_object+0x124/0x2c0 [<ffffffff817a059b>] kmemleak_alloc+0x7b/0xc0 [<ffffffff811dffe6>] __kmalloc+0x106/0x340 [<ffffffffa01b7a29>] ovl_fill_super+0x389/0x9a0 [overlay] [<ffffffff81200ac4>] mount_nodev+0x54/0xa0 [<ffffffffa01b7118>] ovl_mount+0x18/0x20 [overlay] [<ffffffff81201ab3>] mount_fs+0x43/0x170 [<ffffffff81220d34>] vfs_kern_mount+0x74/0x170 [<ffffffff812233ad>] do_mount+0x22d/0xdf0 [<ffffffff812242cb>] SyS_mount+0x7b/0xc0 [<ffffffff817b6bee>] entry_SYSCALL_64_fastpath+0x12/0x76 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: a78d9f0d5d5c ("ovl: support multiple lower layers") Cc: <stable@vger.kernel.org> # v4.0+
| * | | ovl: fix open in stacked overlayMiklos Szeredi2015-10-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If two overlayfs filesystems are stacked on top of each other, then we need recursion in ovl_d_select_inode(). I guess d_backing_inode() is supposed to do that. But currently it doesn't and that functionality is open coded in vfs_open(). This is now copied into ovl_d_select_inode() to fix this regression. Reported-by: Alban Crequy <alban.crequy@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay...") Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> # v4.2+
| * | | ovl: fix dentry reference leakDavid Howells2015-10-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ovl_copy_up_locked(), newdentry is leaked if the function exits through out_cleanup as this just to out after calling ovl_cleanup() - which doesn't actually release the ref on newdentry. The out_cleanup segment should instead exit through out2 as certainly newdentry leaks - and possibly upper does also, though this isn't caught given the catch of newdentry. Without this fix, something like the following is seen: BUG: Dentry ffff880023e9eb20{i=f861,n=#ffff880023e82d90} still in use (1) [unmount of tmpfs tmpfs] BUG: Dentry ffff880023ece640{i=0,n=bigfile} still in use (1) [unmount of tmpfs tmpfs] when unmounting the upper layer after an error occurred in copyup. An error can be induced by creating a big file in a lower layer with something like: dd if=/dev/zero of=/lower/a/bigfile bs=65536 count=1 seek=$((0xf000)) to create a large file (4.1G). Overlay an upper layer that is too small (on tmpfs might do) and then induce a copy up by opening it writably. Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> # v3.18+
| * | | ovl: use O_LARGEFILE in ovl_copy_up()David Howells2015-10-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Open the lower file with O_LARGEFILE in ovl_copy_up(). Pass O_LARGEFILE unconditionally in ovl_copy_up_data() as it's purely for catching 32-bit userspace dealing with a file large enough that it'll be mishandled if the application isn't aware that there might be an integer overflow. Inside the kernel, there shouldn't be any problems. Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> # v3.18+
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2015-10-3189-477/+894
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Fix two regressions in ipv6 route lookups, particularly wrt output interface specifications in the lookup key. From David Ahern. 2) Fix checks in ipv6 IPSEC tunnel pre-encap fragmentation, from Herbert Xu. 3) Fix mis-advertisement of 1000BASE-T on bcm63xx_enet, from Simon Arlott. 4) Some smsc phys misbehave with energy detect mode enabled, so add a DT property and disable it on such switches. From Heiko Schocher. 5) Fix TSO corruption on TX in mv643xx_eth, from Philipp Kirchhofer. 6) Fix regression added by removal of openvswitch vport stats, from James Morse. 7) Vendor Kconfig options should be bool, not tristate, from Andreas Schwab. 8) Use non-_BH() net stats bump in tcp_xmit_probe_skb(), otherwise we barf during TCP REPAIR operations. 9) Fix various bugs in openvswitch conntrack support, from Joe Stringer. 10) Fix NETLINK_LIST_MEMBERSHIPS locking, from David Herrmann. 11) Don't have VSOCK do sock_put() in interrupt context, from Jorgen Hansen. 12) Fix skb_realloc_headroom() failures properly in ISDN, from Karsten Keil. 13) Add some device IDs to qmi_wwan, from Bjorn Mork. 14) Fix ovs egress tunnel information when using lwtunnel devices, from Pravin B Shelar. 15) Add missing NETIF_F_FRAGLIST to macvtab feature list, from Jason Wang. 16) Fix incorrect handling of throw routes when the result of the throw cannot find a match, from Xin Long. 17) Protect ipv6 MTU calculations from wrap-around, from Hannes Frederic Sowa. 18) Fix failed autonegotiation on KSZ9031 micrel PHYs, from Nathan Sullivan. 19) Add missing memory barries in descriptor accesses or xgbe driver, from Thomas Lendacky. 20) Fix release conditon test in pppoe_release(), from Guillaume Nault. 21) Fix gianfar bugs wrt filter configuration, from Claudiu Manoil. 22) Fix violations of RX buffer alignment in sh_eth driver, from Sergei Shtylyov. 23) Fixing missing of_node_put() calls in various places around the networking, from Julia Lawall. 24) Fix incorrect leaf now walking in ipv4 routing tree, from Alexander Duyck. 25) RDS doesn't check pskb_pull()/pskb_trim() return values, from Sowmini Varadhan. 26) Fix VLAN configuration in mlx4 driver, from Jack Morgenstein. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits) ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues Revert "Merge branch 'ipv6-overflow-arith'" net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present vhost: fix performance on LE hosts bpf: sample: define aarch64 specific registers amd-xgbe: Fix race between access of desc and desc index RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv forcedeth: fix unilateral interrupt disabling in netpoll path openvswitch: Fix skb leak using IPv6 defrag ipv6: Export nf_ct_frag6_consume_orig() openvswitch: Fix double-free on ip_defrag() errors fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key net: mv643xx_eth: add missing of_node_put ath6kl: add missing of_node_put net: phy: mdio: add missing of_node_put netdev/phy: add missing of_node_put net: netcp: add missing of_node_put net: thunderx: add missing of_node_put ipv6: gre: support SIT encapsulation ...
| * | | ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa2015-10-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. This is the second version of the patch which doesn't use the overflow_usub function, which got reverted for now. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Revert "Merge branch 'ipv6-overflow-arith'"Hannes Frederic Sowa2015-10-293-27/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linus dislikes these changes. To not hold up the net-merge let's revert it for now and fix the bug like Linus suggested. This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'mlx4-fixes'David S. Miller2015-10-273-2/+4
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Or Gerlitz says: ==================== Mellanox mlx4 driver fixes for 4.3-rc7 Jack's fix is for a regression introduced in 4.3-rc1 Carol's fix addresses an issue which exists for while and turns to beat us hard on PPC, please queue for -stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net/mlx4: Copy/set only sizeof struct mlx4_eqe bytesCarol L Soto2015-10-272-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing memcpy/memset of EQEs, we should use sizeof struct mlx4_eqe as the base size and not caps.eqe_size which could be bigger. If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt data in the master context. When using a 64 byte stride, the memcpy copied over 63 bytes to the slave_eq structure. This resulted in copying over the entire eqe of interest, including its ownership bit -- and also 31 bytes of garbage into the next WQE in the slave EQ -- which did NOT include the ownership bit (and therefore had no impact). However, once the stride is increased to 128, we are overwriting the ownership bits of *three* eqes in the slave_eq struct. This results in an incorrect ownership bit for those eqes, which causes the eq to seem to be full. The issue therefore surfaced only once 128-byte EQEs started being used in SRIOV and (overarchitectures that have 128/256 byte cache-lines such as PPC) - e.g after commit 77507aa249ae "net/mlx4_core: Enable CQE/EQE stride support". Fixes: 08ff32352d6f ('mlx4: 64-byte CQE/EQE support') Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is ↵Jack Morgenstein2015-10-271-0/+2
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | present We do not set the ins_vlan field to zero when no vlan id is present in the packet. Since WQEs in the TX ring are not zeroed out between uses, this oversight could result in having vlan flags present in the WQE ctrl segment when no vlan is preset. Fixes: e38af4faf01d ('net/mlx4_en: Add support for hardware accelerated 802.1ad vlan') Reported-by: Gideon Naim <gideonn@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | vhost: fix performance on LE hostsMichael S. Tsirkin2015-10-271-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian support for legacy devices") introduced a minor regression: even with cross-endian disabled, and even on LE host, vhost_is_little_endian is checking is_le flag so there's always a branch. To fix, simply check virtio_legacy_is_little_endian first. Cc: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bpf: sample: define aarch64 specific registersYang Shi2015-10-271-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define aarch64 specific registers for building bpf samples correctly. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | amd-xgbe: Fix race between access of desc and desc indexLendacky, Thomas2015-10-272-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During Tx cleanup it's still possible for the descriptor data to be read ahead of the descriptor index. A memory barrier is required between the read of the descriptor index and the start of the Tx cleanup loop. This allows a change to a lighter-weight barrier in the Tx transmit routine just before updating the current descriptor index. Since the memory barrier does result in extra overhead on arm64, keep the previous change to not chase the current descriptor value. This prevents the execution of the barrier for each loop performed. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in ↵Sowmini Varadhan2015-10-271-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rds_tcp_data_recv Either of pskb_pull() or pskb_trim() may fail under low memory conditions. If rds_tcp_data_recv() ignores such failures, the application will receive corrupted data because the skb has not been correctly carved to the RDS datagram size. Avoid this by handling pskb_pull/pskb_trim failure in the same manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and retry via the deferred call to rds_send_worker() that gets set up on ENOMEM from rds_tcp_read_sock() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | forcedeth: fix unilateral interrupt disabling in netpoll pathNeil Horman2015-10-271-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Forcedeth currently uses disable_irq_lockdep and enable_irq_lockdep, which in some configurations simply calls local_irq_disable. This causes errant warnings in the netpoll path as in netpoll_send_skb_on_dev, where we disable irqs using local_irq_save, leading to the following warning: WARNING: at net/core/netpoll.c:352 netpoll_send_skb_on_dev+0x243/0x250() (Not tainted) Hardware name: netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll (nv_start_xmit_optimized+0x0/0x860 [forcedeth]) Modules linked in: netconsole(+) configfs ipv6 iptable_filter ip_tables ppdev parport_pc parport sg microcode serio_raw edac_core edac_mce_amd k8temp snd_hda_codec_realtek snd_hda_codec_generic forcedeth snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_nforce2 i2c_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 1940, comm: modprobe Not tainted 2.6.32-573.7.1.el6.x86_64.debug #1 Call Trace: [<ffffffff8107bbc1>] ? warn_slowpath_common+0x91/0xe0 [<ffffffff8107bcc6>] ? warn_slowpath_fmt+0x46/0x60 [<ffffffffa00fe5b0>] ? nv_start_xmit_optimized+0x0/0x860 [forcedeth] [<ffffffff814b3593>] ? netpoll_send_skb_on_dev+0x243/0x250 [<ffffffff814b37c9>] ? netpoll_send_udp+0x229/0x270 [<ffffffffa02e3299>] ? write_msg+0x39/0x110 [netconsole] [<ffffffffa02e331b>] ? write_msg+0xbb/0x110 [netconsole] [<ffffffff8107bd55>] ? __call_console_drivers+0x75/0x90 [<ffffffff8107bdba>] ? _call_console_drivers+0x4a/0x80 [<ffffffff8107c445>] ? release_console_sem+0xe5/0x250 [<ffffffff8107d200>] ? register_console+0x190/0x3e0 [<ffffffffa02e71a6>] ? init_netconsole+0x1a6/0x216 [netconsole] [<ffffffffa02e7000>] ? init_netconsole+0x0/0x216 [netconsole] [<ffffffff810020d0>] ? do_one_initcall+0xc0/0x280 [<ffffffff810d4933>] ? sys_init_module+0xe3/0x260 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b ---[ end trace f349c7af88e6a6d5 ]--- console [netcon0] enabled netconsole: network logging started Fix it by modifying the forcedeth code to use disable_irq_nosync_lockdep_irqsavedisable_irq_nosync_lockdep_irqsave instead, which saves and restores irq state properly. This also saves us a little code in the process Tested by the reporter, with successful restuls Patch applies to the head of the net tree Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Reported-by: Vasily Averin <vvs@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | openvswitch: Fix skb leak using IPv6 defragJoe Stringer2015-10-271-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nf_ct_frag6_gather() makes a clone of each skb passed to it, and if the reassembly is successful, expects the caller to free all of the original skbs using nf_ct_frag6_consume_orig(). This call was previously missing, meaning that the original fragments were never freed (with the exception of the last fragment to arrive). Fix this by ensuring that all original fragments except for the last fragment are freed via nf_ct_frag6_consume_orig(). The last fragment will be morphed into the head, so it must not be freed yet. Furthermore, retain the ->next pointer for the head after skb_morph(). Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipv6: Export nf_ct_frag6_consume_orig()Joe Stringer2015-10-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is needed in openvswitch to fix an skb leak in the next patch. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | openvswitch: Fix double-free on ip_defrag() errorsJoe Stringer2015-10-273-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If ip_defrag() returns an error other than -EINPROGRESS, then the skb is freed. When handle_fragments() passes this back up to do_execute_actions(), it will be freed again. Prevent this double free by never freeing the skb in do_execute_actions() for errors returned by ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | fib_trie: leaf_walk_rcu should not compute key if key is less than pn->keyAlexander Duyck2015-10-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were computing the child index in cases where the key value we were looking for was actually less than the base key of the tnode. As a result we were getting incorrect index values that would cause us to skip over some children. To fix this I have added a test that will force us to use child index 0 if the key we are looking for is less than the key of the current tnode. Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") Reported-by: Brian Rak <brak@gameservers.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'net_of_node_put'David S. Miller2015-10-266-4/+16
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Julia Lawall says: ==================== add missing of_node_put The various for_each device_node iterators performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. The complete semantic patch that fixes this problem is (http://coccinelle.lip6.fr): // <smpl> @r@ local idexpression n; expression e1,e2; iterator name for_each_node_by_name, for_each_node_by_type, for_each_compatible_node, for_each_matching_node, for_each_matching_node_and_match, for_each_child_of_node, for_each_available_child_of_node, for_each_node_with_property; iterator i; statement S; expression list [n1] es; @@ ( ( for_each_node_by_name(n,e1) S | for_each_node_by_type(n,e1) S | for_each_compatible_node(n,e1,e2) S | for_each_matching_node(n,e1) S | for_each_matching_node_and_match(n,e1,e2) S | for_each_child_of_node(e1,n) S | for_each_available_child_of_node(e1,n) S | for_each_node_with_property(n,e1) S ) & i(es,n,...) S ) @@ local idexpression r.n; iterator r.i; expression e; expression list [r.n1] es; @@ i(es,n,...) { ... ( of_node_put(n); | e = n | return n; | + of_node_put(n); ? return ...; ) ... } @@ local idexpression r.n; iterator r.i; expression e; expression list [r.n1] es; @@ i(es,n,...) { ... ( of_node_put(n); | e = n | + of_node_put(n); ? break; ) ... } ... when != n @@ local idexpression r.n; iterator r.i; expression e; identifier l; expression list [r.n1] es; @@ i(es,n,...) { ... ( of_node_put(n); | e = n | + of_node_put(n); ? goto l; ) ... } ... l: ... when != n// </smpl> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net: mv643xx_eth: add missing of_node_putJulia Lawall2015-10-261-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for_each_available_child_of_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // <smpl> @@ expression root,e; local idexpression child; @@ for_each_available_child_of_node(root, child) { ... when != of_node_put(child) when != e = child ( return child; | + of_node_put(child); ? return ...; ) ... } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | ath6kl: add missing of_node_putJulia Lawall2015-10-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for_each_compatible_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // <smpl> @@ expression e; local idexpression n; @@ for_each_compatible_node(n,...) { ... when != of_node_put(n) when != e = n ( return n; | + of_node_put(n); ? return ...; ) ... } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | net: phy: mdio: add missing of_node_putJulia Lawall2015-10-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for_each_available_child_of_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. A simplified version of the semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // <smpl> @@ expression root,e; local idexpression child; @@ for_each_available_child_of_node(root, child) { ... when != of_node_put(child) when != e = child ( return child; | + of_node_put(child); ? return ...; ) ... } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud