op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2013-09-05	143	-684/+1483
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c net/bridge/br_multicast.c net/ipv6/sit.c The conflicts were minor: 1) sit.c changes overlap with change to ip_tunnel_xmit() signature. 2) br_multicast.c had an overlap between computing max_delay using msecs_to_jiffies and turning MLDV2_MRC() into an inline function with a name using lowercase instead of uppercase letters. 3) stmmac had two overlapping changes, one which conditionally allocated and hooked up a dma_cfg based upon the presence of the pbl OF property, and another one handling store-and-forward DMA made. The latter of which should not go into the new of_find_property() basic block. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tcp: Add missing braces to do_tcp_setsockopt	Dave Jones	2013-09-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Dave Jones <davej@fedoraproject.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	caif: Add missing braces to multiline if in cfctrl_linkup_request	Dave Jones	2013-09-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The indentation here implies this was meant to be a multi-line if. Introduced several years back in commit c85c2951d4da1236e32f1858db418221e624aba5 ("caif: Handle dev_queue_xmit errors.") Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	bnx2x: Add missing braces in bnx2x:bnx2x_link_initialize	Dave Jones	2013-09-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The indentation here implies that the intent was for this to be a multiline if. Introduced a few years ago in commit ec146a6f019923819f5ca381980248b6d154ca1a ("bnx2x: Modify XGXS functions") Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	vxlan: Fix kernel panic on device delete.	Pravin B Shelar	2013-09-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On vxlan device create if socket create fails vxlan device is not added to hash table. Therefore we need to check if device is in hashtable before we delete it from hlist. Following patch avoid the crash. net-next already has this fix. ---8<--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa05f9ca7>] vxlan_dellink+0x77/0xf0 [vxlan] PGD 42b2d9067 PUD 42e04c067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: vxlan(-) Hardware name: Dell Inc. PowerEdge R620/0KCKR5, BIOS 1.4.8 10/25/2012 task: ffff88042ecf8760 ti: ffff88042f106000 task.ti: ffff88042f106000 RIP: 0010:[<ffffffffa05f9ca7>] [<ffffffffa05f9ca7>] vxlan_dellink+0x77/0xf0 [vxlan] RSP: 0018:ffff88042f107e28 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88082af08000 RCX: ffff88083fd80000 RDX: 0000000000000000 RSI: ffff88042f107e58 RDI: ffff88042e12f810 RBP: ffff88042f107e48 R08: ffffffff8166eca0 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88082af087c0 R13: ffff88042e12f000 R14: ffff88042f107e58 R15: ffff88042f107e58 FS: 00007f4ed2de7700(0000) GS:ffff88043fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000042e076000 CR4: 00000000000407e0 Stack: ffff88082af08000 ffffffff81654848 ffffffffa05fb4e0 ffffffff81654780 ffff88042f107e98 ffffffff813b9c7a ffff88042f107e58 ffff88042f107e58 ffff88042f107e88 ffffffffa05fb4e0 ffffffffa05fb780 ffff88042f107f18 Call Trace: [<ffffffff813b9c7a>] __rtnl_link_unregister+0xca/0xd0 [<ffffffff813bb0e9>] rtnl_link_unregister+0x19/0x30 [<ffffffffa05faa4c>] vxlan_cleanup_module+0x10/0x2f [vxlan] [<ffffffff81099fef>] SyS_delete_module+0x1cf/0x2c0 [<ffffffff8146c069>] ? do_page_fault+0x9/0x10 [<ffffffff8146f012>] system_call_fastpath+0x16/0x1b Code: 4d 85 ed 0f 84 95 00 00 00 4c 8d a7 c0 07 00 00 49 8d bd 10 08 00 00 e8 28 e8 e6 e0 48 8b 83 c0 07 00 00 49 8b 54 24 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 49 b8 00 02 20 00 00 00 ad de 4d 89 RIP [<ffffffffa05f9ca7>] vxlan_dellink+0x77/0xf0 [vxlan] RSP <ffff88042f107e28> CR2: 0000000000000000 Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: mvneta: implement ->ndo_do_ioctl() to support PHY ioctls	Thomas Petazzoni	2013-09-05	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit implements the ->ndo_do_ioctl() operation so that the PHY-related ioctl() calls can work from userspace, which allows applications like mii-tool or mii-diag to do their job. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: mvneta: properly disable HW PHY polling and ensure adjust_link() works	Thomas Petazzoni	2013-09-05	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit fixes a long-standing bug that has been reported by many users: on some Armada 370 platforms, only the network interface that has been used in U-Boot to tftp the kernel works properly in Linux. The other network interfaces can see a 'link up', but are unable to transmit data. The reports were generally made on the Armada 370-based Mirabox, but have also been given on the Armada 370-RD board. The network MAC in the Armada 370/XP (supported by the mvneta driver in Linux) has a functionality that allows it to continuously poll the PHY and directly update the MAC configuration accordingly (speed, duplex, etc.). The very first versions of the driver submitted for review were using this hardware mechanism, but due to this, the driver was not integrated with the kernel phylib. Following reviews, the driver was changed to use the phylib, and therefore a software based polling. In software based polling, Linux regularly talks to the PHY over the MDIO bus, and sees if the link status has changed. If it's the case then the adjust_link() callback of the driver is called to update the MAC configuration accordingly. However, it turns out that the adjust_link() callback was not configuring the hardware in a completely correct way: while it was setting the speed and duplex bits correctly, it wasn't telling the hardware to actually take into account those bits rather than what the hardware-based PHY polling mechanism has concluded. So, in fact the adjust_link() callback was basically a no-op. However, the network happened to be working because on the network interfaces used by U-Boot for tftp on Armada 370 platforms because the hardware PHY polling was enabled by the bootloader, and left enabled by Linux. However, the second network interface not used for tftp (or both network interfaces if the kernel is loaded from USB, NAND or SD card) didn't had the hardware PHY polling enabled. This patch fixes this situation by: (1) Making sure that the hardware PHY polling is disabled by clearing the MVNETA_PHY_POLLING_ENABLE bit in the MVNETA_UNIT_CONTROL register in the driver ->probe() function. (2) Making sure that the duplex and speed selections made by the adjust_link() callback are taken into account by clearing the MVNETA_GMAC_AN_SPEED_EN and MVNETA_GMAC_AN_DUPLEX_EN bits in the MVNETA_GMAC_AUTONEG_CONFIG register. This patch has been tested on Armada 370 Mirabox, and now both network interfaces are usable after boot. [ Problem introduced by commit c5aff18 ("net: mvneta: driver for Marvell Armada 370/XP network unit") ] Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Jochen De Smet <jochen.armkernel@leahnim.org> Cc: Peter Sanford <psanford@nearbuy.io> Cc: Ethan Tuttle <ethan@ethantuttle.com> Cc: Chény Yves-Gael <yves@cheny.fr> Cc: Ryan Press <ryan@presslab.us> Cc: Simon Guinot <simon.guinot@sequanux.org> Cc: vdonnefort@lacie.com Cc: stable@vger.kernel.org Acked-by: Jason Cooper <jason@lakedaemon.net> Tested-by: Vincent Donnefort <vdonnefort@gmail.com> Tested-by: Yves-Gael Cheny <yves@cheny.fr> Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sh_eth: fix napi_{en\|dis}able() calls racing against interrupts	Sergei Shtylyov	2013-09-04	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While implementing NAPI for the driver, I overlooked the race conditions where interrupt handler might have called napi_schedule_prep() before napi_enable() was called or after napi_disable() was called. If RX interrupt happens, this would cause the endless interrupts and messages like: sh-eth eth0: ignoring interrupt, status 0x00040000, mask 0x01ff009f. The interrupt wouldn't even be masked by the kernel eventually since the handler would return IRQ_HANDLED all the time. As a fix, move napi_enable() call before request_irq() call and napi_disable() call after free_irq() call. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcv	Daniel Borkmann	2013-09-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In tcp_v6_do_rcv() code, when processing pkt options, we soley work on our skb clone opt_skb that we've created earlier before entering tcp_rcv_established() on our way. However, only in condition ... if (np->rxopt.bits.rxtclass) np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb)); ... we work on skb itself. As we extract every other information out of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can already be released by tcp_rcv_established() earlier on. When we try to access it in ipv6_hdr(), we will dereference freed skb. [ Bug added by commit 4c507d2897bd9b ("net: implement IP_RECVTOS for IP_PKTOPTIONS") ] Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: Don't depend on per socket memory for neighbour discovery messages	Thomas Graf	2013-09-04	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allocating skbs when sending out neighbour discovery messages currently uses sock_alloc_send_skb() based on a per net namespace socket and thus share a socket wmem buffer space. If a netdevice is temporarily unable to transmit due to carrier loss or for other reasons, the queued up ndisc messages will cosnume all of the wmem space and will thus prevent from any more skbs to be allocated even for netdevices that are able to transmit packets. The number of neighbour discovery messages sent is very limited, use of alloc_skb() bypasses the socket wmem buffer size enforcement while the manual call to skb_set_owner_w() maintains the socket reference needed for the IPv6 output path. This patch has orginally been posted by Eric Dumazet in a modified form. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Tested-by: Stephen Warren <swarren@nvidia.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: fec: fix the error to get the previous BD entry	Duan Fugang-B38611	2013-09-04	2	-46/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bug: error to get the previous BD entry. When the current BD is the first BD, the previous BD entry must be the last BD, not "bdp - 1" in current logic. V4: * Optimize fec_enet_get_nextdesc() for code clean. Replace "ex_new_bd - ring_size" with "ex_base". Replace "new_bd - ring_size" with "base". V3: * Restore the API name because David suggest to use fec_enet_ prefix for all function in fec driver. So, change next_bd() -> fec_enet_get_nextdesc() change pre_bd() -> fec_enet_get_prevdesc() * Reduce the two APIs parameters for easy to call. V2: * Add tx_ring_size and rx_ring_size to struct fec_enet_private. * Replace api fec_enet_get_nextdesc() with next_bd(). Replace api fec_enet_get_prevdesc() with pre_bd(). * Move all ring size check logic to next_bd() and pre_bd(), which simplifies the code redundancy. V1: * Add BD ring size check to get the previous BD entry in correctly. Reviewed-by: Li Frank <B20596@freescale.com> Signed-off-by: Fugang Duan <B38611@freescale.com> Acked-by: Frank Li <frank.li@freescale.net> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: fix null pointer dereference in __ip6addrlbl_add	Hannes Frederic Sowa	2013-09-04	1	-25/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a ("hlist: drop the node parameter from iterators") changed the behavior of hlist_for_each_entry_safe to leave the p argument NULL. Fix this up by tracking the last argument. Reported-by: Michele Baldessari <michele@acksyn.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Michele Baldessari <michele@acksyn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sh_eth: NAPI requires netif_receive_skb()	Sergei Shtylyov	2013-09-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Driver supporting NAPI should use NAPI-specific function for receiving packets, so netif_rx() should be changed to netif_receive_skb(). Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	drivers: net: ethernet: 8390: Kconfig: add H8300H_AKI3068NET and ↵	Chen Gang	2013-09-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	H8300H_H8MAX dependancy for NE_H8300 Currently only H8300H_AKI3068NET and H8300H_H8MAX define default I/O base and IRQ values for the NE_H8300 driver. Hence builds for other H8300H platforms will fail as per below. Since H8300H does not support multi platform builds, we simply limit building the driver to those two platforms specifically. The release error: drivers/net/ethernet/8390/ne-h8300.c: In function 'init_dev': drivers/net/ethernet/8390/ne-h8300.c:117:23: error: 'h8300_ne_base' undeclared (first use in this function) drivers/net/ethernet/8390/ne-h8300.c:117:23: note: each undeclared identifier is reported only once for each function it appears in drivers/net/ethernet/8390/ne-h8300.c:117:23: error: bit-field '<anonymous>' width not an integer constant drivers/net/ethernet/8390/ne-h8300.c:119:20: error: 'h8300_ne_irq' undeclared (first use in this function) drivers/net/ethernet/8390/ne-h8300.c: In function 'init_module': drivers/net/ethernet/8390/ne-h8300.c:647:21: error: 'h8300_ne_base' undeclared (first use in this function) drivers/net/ethernet/8390/ne-h8300.c:648:15: error: 'h8300_ne_irq' undeclared (first use in this function) drivers/net/ethernet/8390/ne-h8300.c:661:4: warning: format '%x' expects argument of type 'unsigned int', but argument 2 has type 'long unsigned int' [-Wformat] Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	tg3: Don't turn off led on 5719 serdes port 0	Nithin Sujir	2013-09-03	1	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Turning off led on port 0 of the 5719 serdes causes all other ports to lose power and stop functioning. Add tg3_phy_led_bug() function to check for this condition. We use a switch() in tg3_phy_led_bug() for consistency with the tg3_phy_power_bug() function. Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge branch 'calxedaxgmac'	David S. Miller	2013-09-03	1	-79/+116
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A set of calxedaxgmac bug fixes from Rob Herring. Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: fix xgmac_xmit DMA mapping error handling	Rob Herring	2013-09-03	1	-5/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On a DMA mapping error in xgmac_xmit, we should simply free the skb and return NETDEV_TX_OK rather than -EIO. In the case of errors in mapping frags, we need to undo everything that has been setup. Reported-by: Andreas Herrmann <andreas.herrmann@calxeda.com> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: fix rx DMA mapping API size mismatches	Rob Herring	2013-09-03	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the mismatch in the DMA mapping and unmapping sizes for receive. The unmap size must be equal to the map size and should not be the actual received frame length. The map size should also be adjusted by the NET_IP_ALIGN size since the h/w buffer size (dma_buf_sz) includes this offset. Also, add a missing dma_mapping_error check in xgmac_rx_refill. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: remove some unused statistic counters	Rob Herring	2013-09-03	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rx_sa_filter_fail and tx_undeflow events are unused and impossible to occur based on how the h/w is used. We never filter on source MAC address and TX store and forward mode prevents underflow events. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: fix various errors in xgmac_set_rx_mode	Rob Herring	2013-09-03	1	-4/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix xgmac_set_rx_mode to handle several conditions that were not handled correctly as Lennert Buytenhek describes: If we have, say, 100 unicast addresses, and 5 multicast addresses, the unicast address count check will evaluate to true, and set use_hash to true. The multicast address check will however evaluate to false, and use_hash won't be set to true again, and XGMAC_FRAME_FILTER_HMC won't be OR'd into XGMAC_FRAME_FILTER, but since use_hash was still true from the unicast check, netdev_for_each_mc_addr() will program the multicast addresses into the hash table instead of using the MAC address registers, but since the HMC bit wasn't set, the hash table won't be checked for multicast addresses on receive, and we'll stop receiving multicast packets entirely. Also, there is no code that zeroes out MAC address registers reg..31 at the end of this function, meaning that under the right conditions, unicast/multicast addresses that were previously in the filter but were then deleted won't be cleared out. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: enable interrupts after napi_enable	Rob Herring	2013-09-03	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a race condition where the interrupt handler may have called napi_schedule before napi_enable is called. This would disable interrupts and never actually schedule napi to run. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: fix race with tx queue stop/wake	Rob Herring	2013-09-03	1	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the xgmac transmit start and completion work locklessly, it is possible for xgmac_xmit to stop the tx queue after the xgmac_tx_complete has run resulting in the tx queue never being woken up. Fix this by ensuring that ring buffer index updates are visible and recheck the ring space after stopping the queue. Also fix an off-by-one bug where we need to stop the queue when the ring buffer space is equal to MAX_SKB_FRAGS. The implementation used here was copied from drivers/net/ethernet/broadcom/tg3.c. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: update ring buffer tx_head after barriers	Rob Herring	2013-09-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that the descriptor writes are visible before the ring buffer head is updated. Since writel is a barrier, we can simply update the head after the writel. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: fix possible skb free before tx complete	Rob Herring	2013-09-03	1	-31/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The TX completion code may have freed an skb before the entire sg list was transmitted. The DMA unmap calls for the fragments could also get skipped. Now set the skb pointer on every entry in the ring, not just the head of the sg list. We then use the FS (first segment) bit in the descriptors to determine skb head vs. fragment. This also fixes similar bug in xgmac_free_tx_skbufs where clean-up of a sg list that wraps at the end of the ring buffer would not get unmapped. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: fix race between xgmac_tx_complete and xgmac_tx_err	Rob Herring	2013-09-03	1	-20/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible for the xgmac_tx_complete to run concurrently with xgmac_tx_err since there are no locks. Fix this by moving the tx error handling to a workqueue so we can disable napi while we reset the transmitter. Reported-by: Andreas Herrmann <andreas.herrmann@calxeda.com> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: read correct field in xgmac_desc_get_buf_len	Rob Herring	2013-09-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xgmac_desc_get_buf_len appears to have a copy/paste error. flags is the wrong field to read. We should be reading buf_size field. cpu_to_le32 should also be le32_to_cpu. This never really mattered as this function is only used for DMA mapping calls which happen to be nops with coherent DMA. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: calxedaxgmac: remove NETIF_F_FRAGLIST setting	Rob Herring	2013-09-03	1	-1/+1
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The xgmac does not actually handle frag lists, so this option should not be set. It does not appear to have had any impact though. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: ipv6_create_tempaddr cleanup	Petr Holasek	2013-09-03	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This two-liner removes max_addresses variable which is now unecessary related to patch [ipv6: remove max_addresses check from ipv6_create_tempaddr]. Signed-off-by: Petr Holasek <pholasek@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO	Jiri Bohac	2013-09-03	2	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination unreachable) messages: 5 - Source address failed ingress/egress policy 6 - Reject route to destination Now they are treated as protocol error and icmpv6_err_convert() converts them to EPROTO. RFC 4443 says: "Codes 5 and 6 are more informative subsets of code 1." Treat codes 5 and 6 as code 1 (EACCES) Btw, connect() returning -EPROTO confuses firefox, so that fallback to other/IPv4 addresses does not work: https://bugzilla.mozilla.org/show_bug.cgi?id=910773 Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds	2013-08-30	59	-260/+590
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull networking fixes from David Miller: 1) There was a simplification in the ipv6 ndisc packet sending attempted here, which avoided using memory accounting on the per-netns ndisc socket for sending NDISC packets. It did fix some important issues, but it causes regressions so it gets reverted here too. Specifically, the problem with this change is that the IPV6 output path really depends upon there being a valid skb->sk attached. The reason we want to do this change in some form when we figure out how to do it right, is that if a device goes down the ndisc_sk socket send queue will fill up and block NDISC packets that we want to send to other devices too. That's really bad behavior. Hopefully Thomas can come up with a better version of this change. 2) Fix a severe TCP performance regression by reverting a change made to dev_pick_tx() quite some time ago. From Eric Dumazet. 3) TIPC returns wrongly signed error codes, fix from Erik Hugne. 4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the skb->sk too early. Fix from Li Hongjun. 5) RAW ipv4 sockets can use the wrong routing key during lookup, from Chris Clark. 6) Similar to #1 revert an older change that tried to use plain alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner mark which needs to see the skb->sk for such frames. From Phil Oester. 7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz, specifically in the handling of virtual functions. 8) IPSEC path error propagations to sockets is not done properly when we have v4 in v6, and v6 in v4 type rules. Fix from Hannes Frederic Sowa. 9) Fix missing channel context release in mac80211, from Johannes Berg. 10) Fix network namespace handing wrt. SCM_RIGHTS, from Andy Lutomirski. 11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic drivers. From Michal Schmidt. 12) Hopefully a complete and correct fix for the genetlink dump locking and module reference counting. From Pravin B Shelar. 13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir. 14) Fix handling of timestamp offset when restoring a snapshotted TCP socket. From Andrew Vagin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) net: fec: fix time stamping logic after napi conversion net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay mISDN: return -EINVAL on error in dsp_control_req() net: revert 8728c544a9c ("net: dev_pick_tx() fix") Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages" ipv4 tunnels: fix an oops when using ipip/sit with IPsec tipc: set sk_err correctly when connection fails tcp: tcp_make_synack() should use sock_wmalloc bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones ipv6: Don't depend on per socket memory for neighbour discovery messages ipv4: sendto/hdrincl: don't use destination address found in header tcp: don't apply tsoffset if rcv_tsecr is zero tcp: initialize rcv_tstamp for restored sockets net: xilinx: fix memleak net: usb: Add HP hs2434 device to ZLP exception table net: add cpu_relax to busy poll loop net: stmmac: fixed the pbl setting with DT genl: Hold reference on correct module while netlink-dump. genl: Fix genl dumpit() locking. xfrm: Fix potential null pointer dereference in xdst_queue_output ...
\| \| *	net: fec: fix time stamping logic after napi conversion	Richard Cochran	2013-08-30	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit dc975382 "net: fec: add napi support to improve proformance" converted the fec driver to the napi model. However, that commit forgot to remove the call to skb_defer_rx_timestamp which is only needed in non-napi drivers. (The function napi_gro_receive eventually calls netif_receive_skb, which in turn calls skb_defer_rx_timestamp.) This patch should also be applied to the 3.9 and 3.10 kernels. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay	Daniel Borkmann	2013-08-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While looking into MLDv1/v2 code, I noticed that bridging code does not convert it's max delay into jiffies for MLDv2 messages as we do in core IPv6' multicast code. RFC3810, 5.1.3. Maximum Response Code says: The Maximum Response Code field specifies the maximum time allowed before sending a responding Report. The actual time allowed, called the Maximum Response Delay, is represented in units of milliseconds, and is derived from the Maximum Response Code as follows: [...] As we update timers that work with jiffies, we need to convert it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Linus Lüssing <linus.luessing@web.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	mISDN: return -EINVAL on error in dsp_control_req()	Dan Carpenter	2013-08-30	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If skb->len is too short then we should return an error. Otherwise we read beyond the end of skb->data for several bytes. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	net: revert 8728c544a9c ("net: dev_pick_tx() fix")	Eric Dumazet	2013-08-30	1	-8/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 8728c544a9cbdc ("net: dev_pick_tx() fix") and commit b6fe83e9525a ("bonding: refine IFF_XMIT_DST_RELEASE capability") are quite incompatible : Queue selection is disabled because skb dst was dropped before entering bonding device. This causes major performance regression, mainly because TCP packets for a given flow can be sent to multiple queues. This is particularly visible when using the new FQ packet scheduler with MQ + FQ setup on the slaves. We can safely revert the first commit now that 416186fbf8c5b ("net: Split core bits of netdev_pick_tx into __netdev_pick_tx") properly caps the queue_index. Reported-by: Xi Wang <xii@google.com> Diagnosed-by: Xi Wang <xii@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Denys Fedorysychenko <nuclearcat@nuclearcat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	Revert "ipv6: Don't depend on per socket memory for neighbour discovery ↵	David S. Miller	2013-08-30	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	messages" This reverts commit 1f324e38870cc09659cf23bc626f1b8869e201f2. It seems to cause regressions, and in particular the output path really depends upon there being a socket attached to skb->sk for checks such as sk_mc_loop(skb->sk) for example. See ip6_output_finish2(). Reported-by: Stephen Warren <swarren@wwwdotorg.org> Reported-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	ipv4 tunnels: fix an oops when using ipip/sit with IPsec	Li Hongjun	2013-08-30	2	-7/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit 3d7b46cd20e3 (ip_tunnel: push generic protocol handling to ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on an IPv4 over IPv4 tunnel. xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb). Signed-off-by: Li Hongjun <hongjun.li@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	tipc: set sk_err correctly when connection fails	Erik Hugne	2013-08-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Should a connect fail, if the publication/server is unavailable or due to some other error, a positive value will be returned and errno is never set. If the application code checks for an explicit zero return from connect (success) or a negative return (failure), it will not catch the error and subsequent send() calls will fail as shown from the strace snippet below. socket(0x1e /* PF_??? /, SOCK_SEQPACKET, 0) = 3 connect(3, {sa_family=0x1e / AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111 sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe) The reason for this behaviour is that TIPC wrongly inverts error codes set in sk_err. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	tcp: tcp_make_synack() should use sock_wmalloc	Phil Oester	2013-08-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so, the netfilter owner match lost its ability to block the SYNACK packet on outbound listening sockets. Revert the change, restoring the owner match functionality. This closes netfilter bugzilla #847. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones	Linus Lüssing	2013-08-30	5	-93/+240
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we would still potentially suffer multicast packet loss if there is just either an IGMP or an MLD querier: For the former case, we would possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is because we are currently assuming that if either an IGMP or MLD querier is present that the other one is present, too. This patch makes the behaviour and fix added in "bridge: disable snooping if there is no querier" (b00589af3b04) to also work if there is either just an IGMP or an MLD querier on the link: It refines the deactivation of the snooping to be protocol specific by using separate timers for the snooped IGMP and MLD queries as well as separate timers for our internal IGMP and MLD queriers. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| *	Merge branch 'master' of ↵	David S. Miller	2013-08-29	15	-32/+86
\| \| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== This pull request fixes some issues that arise when 6in4 or 4in6 tunnels are used in combination with IPsec, all from Hannes Frederic Sowa and a null pointer dereference when queueing packets to the policy hold queue. 1) We might access the local error handler of the wrong address family if 6in4 or 4in6 tunnel is protected by ipsec. Fix this by addind a pointer to the correct local_error to xfrm_state_afinet. 2) Add a helper function to always refer to the correct interpretation of skb->sk. 3) Call skb_reset_inner_headers to record the position of the inner headers when adding a new one in various ipv6 tunnels. This is needed to identify the addresses where to send back errors in the xfrm layer. 4) Dereference inner ipv6 header if encapsulated to always call the right error handler. 5) Choose protocol family by skb protocol to not call the wrong xfrm{4,6}_local_error handler in case an ipv6 sockets is used in ipv4 mode. 6) Partly revert "xfrm: introduce helper for safe determination of mtu" because this introduced pmtu discovery problems. 7) Set skb->protocol on tcp, raw and ip6_append_data genereated skbs. We need this to get the correct mtu informations in xfrm. 8) Fix null pointer dereference in xdst_queue_output. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| \| *	xfrm: Fix potential null pointer dereference in xdst_queue_output	Steffen Klassert	2013-08-28	1	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The net_device might be not set on the skb when we try refcounting. This leads to a null pointer dereference in xdst_queue_output(). It turned out that the refcount to the net_device is not needed after all. The dst_entry has a refcount to the net_device before we queue the skb, so it can't go away. Therefore we can remove the refcount on queueing to fix the null pointer dereference. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
\| \| \| *	ipv6: set skb->protocol on tcp, raw and ip6_append_data genereated skbs	Hannes Frederic Sowa	2013-08-26	2	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we don't initialize skb->protocol when transmitting data via tcp, raw(with and without inclhdr) or udp+ufo or appending data directly to the socket transmit queue (via ip6_append_data). This needs to be done so that we can get the correct mtu in the xfrm layer. Setting of skb->protocol happens only in functions where we also have a transmitting socket and a new skb, so we don't overwrite old values. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
\| \| \| *	xfrm: revert ipv4 mtu determination to dst_mtu	Hannes Frederic Sowa	2013-08-26	3	-16/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 0ea9d5e3e0e03a63b11392f5613378977dae7eca ("xfrm: introduce helper for safe determination of mtu") I switched the determination of ipv4 mtus from dst_mtu to ip_skb_dst_mtu. This was an error because in case of IP_PMTUDISC_PROBE we fall back to the interface mtu, which is never correct for ipv4 ipsec. This patch partly reverts 0ea9d5e3e0e03a63b11392f5613378977dae7eca ("xfrm: introduce helper for safe determination of mtu"). Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
\| \| \| *	xfrm: choose protocol family by skb protocol	Hannes Frederic Sowa	2013-08-19	2	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to choose the protocol family by skb->protocol. Otherwise we call the wrong xfrm{4,6}_local_error handler in case an ipv6 sockets is used in ipv4 mode, in which case we should call down to xfrm4_local_error (ip6 sockets are a superset of ip4 ones). We are called before before ip_output functions, so skb->protocol is not reset. Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
\| \| \| *	ipv6: xfrm: dereference inner ipv6 header if encapsulated	Hannes Frederic Sowa	2013-08-19	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In xfrm6_local_error use inner_header if the packet was encapsulated. Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
\| \| \| *	ipv6: wire up skb->encapsulation	Hannes Frederic Sowa	2013-08-19	3	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When pushing a new header before current one call skb_reset_inner_headers to record the position of the inner headers in the various ipv6 tunnel protocols. We later need this to correctly identify the addresses needed to send back an error in the xfrm layer. This change is safe, because skb->protocol is always checked before dereferencing data from the inner protocol. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
\| \| \| *	xfrm: introduce helper for safe determination of mtu	Hannes Frederic Sowa	2013-08-14	5	-12/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	skb->sk socket can be of AF_INET or AF_INET6 address family. Thus we always have to make sure we a referring to the correct interpretation of skb->sk. We only depend on header defines to query the mtu, so we don't introduce a new dependency to ipv6 by this change. Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
\| \| \| *	xfrm: make local error reporting more robust	Hannes Frederic Sowa	2013-08-14	7	-11/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In xfrm4 and xfrm6 we need to take care about sockets of the other address family. This could happen because a 6in4 or 4in6 tunnel could get protected by ipsec. Because we don't want to have a run-time dependency on ipv6 when only using ipv4 xfrm we have to embed a pointer to the correct local_error function in xfrm_state_afinet and look it up when returning an error depending on the socket address family. Thanks to vi0ss for the great bug report: <https://bugzilla.kernel.org/show_bug.cgi?id=58691> v2: a) fix two more unsafe interpretations of skb->sk as ipv6 socket (xfrm6_local_dontfrag and __xfrm6_output) v3: a) add an EXPORT_SYMBOL_GPL(xfrm_local_error) to fix a link error when building ipv6 as a module (thanks to Steffen Klassert) Reported-by: <vi0oss@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
\| \| * \|	ipv6: Don't depend on per socket memory for neighbour discovery messages	Thomas Graf	2013-08-29	1	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allocating skbs when sending out neighbour discovery messages currently uses sock_alloc_send_skb() based on a per net namespace socket and thus share a socket wmem buffer space. If a netdevice is temporarily unable to transmit due to carrier loss or for other reasons, the queued up ndisc messages will cosnume all of the wmem space and will thus prevent from any more skbs to be allocated even for netdevices that are able to transmit packets. The number of neighbour discovery messages sent is very limited, simply use alloc_skb() and don't depend on any socket wmem space any longer. This patch has orginally been posted by Eric Dumazet in a modified form. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \|	ipv4: sendto/hdrincl: don't use destination address found in header	Chris Clark	2013-08-29	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ipv4: raw_sendmsg: don't use header's destination address A sendto() regression was bisected and found to start with commit f8126f1d5136be1 (ipv4: Adjust semantics of rt->rt_gateway.) The problem is that it tries to ARP-lookup the constructed packet's destination address rather than the explicitly provided address. Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used. cf. commit 2ad5b9e4bd314fc685086b99e90e5de3bc59e26b Reported-by: Chris Clark <chris.clark@alcatel-lucent.com> Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com> Tested-by: Chris Clark <chris.clark@alcatel-lucent.com> Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com> Signed-off-by: David S. Miller <davem@davemloft.net>