op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'eth_get_headlen'	David S. Miller	2014-09-05	8	-239/+68
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Alexander Duyck says: ==================== net: Drop get_headlen functions in favor of generic function This series replaces the igb_get_headlen and ixgbe_get_headlen functions with a generic function named eth_get_headlen. I have done some performance testing on ixgbe with 258 byte frames since the calls are only used on frames larger than 256 bytes and have seen no significant difference in CPU utilization. v2: renamed __skb_get_poff to skb_get_poff renamed ___skb_get_poff to __skb_get_poff ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ixgbe: use new eth_get_headlen interface	Alexander Duyck	2014-09-05	1	-115/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update ixgbe to drop the ixgbe_get_headlen function in favor of eth_get_headlen. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	igb: use new eth_get_headlen interface	Alexander Duyck	2014-09-05	1	-108/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update igb to drop the igb_get_headlen function in favor of eth_get_headlen. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: Add function for parsing the header length out of linear ethernet frames	Alexander Duyck	2014-09-05	6	-16/+66
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch updates some of the flow_dissector api so that it can be used to parse the length of ethernet buffers stored in fragments. Most of the changes needed were to __skb_get_poff as it needed to be updated to support sending a linear buffer instead of a skb. I have split __skb_get_poff into two functions, the first is skb_get_poff and it retains the functionality of the original __skb_get_poff. The other function is __skb_get_poff which now works much like __skb_flow_dissect in relation to skb_flow_dissect in that it provides the same functionality but works with just a data buffer and hlen instead of needing an skb. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'timestamping'	David S. Miller	2014-09-05	7	-67/+80
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Alexander Duyck says: ==================== This change makes it so that the core path for the phy timestamping logic is shared between skb_tx_tstamp and skb_complete_tx_timestamp. In addition it provides a means of using the same skb clone type path in non phy timestamping drivers. The main motivation for this is to enable non-phy drivers to be able to manipulate tx timestamp skbs for such things as putting them in lists or setting aside buffer in the context block. v2: Incorporated suggested changes from Willem de Bruijn and Eric Dumazet dropped uneeded comment restored order of hwtstamp vs swtstamp added destructor for skb Dropped usage of skb_complete_tx_timestamp as a kfree_skb w/ destructor v3: Updated destructor handling and dealt with socket reference counting issues v4: Split out combining destructors into a separate patch ====================
\| *	net: merge cases where sock_efree and sock_edemux are the same function	Alexander Duyck	2014-09-05	3	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since sock_efree and sock_demux are essentially the same code for non-TCP sockets and the case where CONFIG_INET is not defined we can combine the code or replace the call to sock_edemux in several spots. As a result we can avoid a bit of unnecessary code or code duplication. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net-timestamp: Make the clone operation stand-alone from phy timestamping	Alexander Duyck	2014-09-05	6	-21/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The phy timestamping takes a different path than the regular timestamping does in that it will create a clone first so that the packets needing to be timestamped can be placed in a queue, or the context block could be used. In order to support these use cases I am pulling the core of the code out so it can be used in other drivers beyond just phy devices. In addition I have added a destructor named sock_efree which is meant to provide a simple way for dropping the reference to skb exceptions that aren't part of either the receive or send windows for the socket, and I have removed some duplication in spots where this destructor could be used in place of sock_edemux. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net-timestamp: Merge shared code between phy and regular timestamping	Alexander Duyck	2014-09-05	2	-52/+42
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \|	This change merges the shared bits that exist between skb_tx_tstamp and skb_complete_tx_timestamp. By doing this we can avoid the two diverging as there were already changes pushed into skb_tx_tstamp that hadn't made it into the other function. In addition this resolves issues with the fact that skb_complete_tx_timestamp was included in linux/skbuff.h even though it was only compiled in if phy timestamping was enabled. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: harden fnhe_hashfun()	Eric Dumazet	2014-09-05	2	-5/+6
\| \| \| \| \| \| \| \|	Lets make this hash function a bit secure, as ICMP attacks are still in the wild. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net-timestamp: fix allocation error in test	Willem de Bruijn	2014-09-05	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	A buffer is incorrectly zeroed to the length of the pointer. If cfg_payload_len < sizeof(void *) this can overwrites unrelated memory. The buffer contents are never read, so no need to zero. Fixes: 8fe2f761cae9 ("net-timestamp: expand documentation") Reported-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	hyperv: NULL dereference on error	Dan Carpenter	2014-09-05	1	-4/+2
\| \| \| \| \| \| \| \| \|	We try to call free_netvsc_device(net_device) when "net_device" is NULL. It leads to an Oops. Fixes: f90251c8a6d0 ('hyperv: Increase the buffer length for netvsc_channel_cb()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2014-09-05	12	-127/+156
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2014-09-04 This series contains updates to i40e, i40evf, ixgbe and ixgbevf. Catherine adds dual speed module support to i40e. Updates i40e to allow the user to change link settings when the link is down. Serey renames i40e_ndo_set_vf_spoofck() to i40e_ndo_set_vf_spookchk() to be more consistent with what is defined in netdev and removes a unnecessary variable assignment. Jesse makes a malicious driver detection warning only print if extended driver string is enabled for i40e. Fixes a panic under traffic load when resetting or if/whenever there was a Tx-timeout because we were enabling the Tx queue to early. Anjali fixes an issue when PF reset fails, where we were trying to restart the admin queue which has not been setup at that point. This resolves an occasional kernel panic when PF reset fails for some reason. Ethan Zhao replaces the use of a local i40e_vfs_are_assigned() with the global kernel pci_vfs_assigned() for i40e. Alex cleans up the FDB handling for ixgbe. This change makes it so that the behavior for FDB handling is consistent between both the SR-IOV and non-SR-IOV cases. The main change is that we perform bounds checking on the number of SR-IOV addresses regardless of if SR-IOV is enabled or not as we can only support a certain number of addresses in the hardware. Emil extends the pending Tx work check to the VF interfaces, where the driver initiates a reset of the interface on link loss with pending Tx work in order to clear the rings. Introduces a delay for 82599 VFs of at least 500 usecs to make sure the VFLINKS value is correct, since this bit tends to flap when a DA or SFP+ cable is disconnected. Jacob adds code comments in ixgbe to make it more obvious that we are resetting features based on the fact that we do not have MSI-X enabled, and cannot use the previous settings. Also resolves a kernel NULL pointer dereference by limiting the combined total of MACVLAN and SR-IOV VFs, since the hardware has a limited number of pools available (64). Previously, no checks were in place to limit the number of accelerated MACVLAN devices based on the number of pools, which would be ok since there was already a limit for these well below the number of available pools. However, SR-IOV uses the very same pools, therefore we need to ensure that the total number of pools does not exceed the number of pools available in the hardware. v2: - clean up code comment in patch 5 by replacing "an" with "auto negotiation" based on feedback from Sergei Shtylyov - removed un-necessary parenthesis around function call in patch 8 based on feedback from Sergei Shtylyov ====================
\| *	ixgbe: limit combined total of macvlan and SR-IOV VFs	Jacob Keller	2014-09-04	2	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hardware has a limited number of pools available (64). Previously, no checks were in place to limit the number of accelerated macvlan devices based on the number of pools. Normally this would be ok, because there was already a limit for these well below the number of available pools. However, SR-IOV uses the very same pools. Therefor, we need to ensure that the total number of pools (number of VFs plus the number of non-VF pools in use for accelerated macvlans) does not exceed the number of pools available in hardware. This patch resolves a kernel NULL pointer dereference caused by the following commands: $modprobe ixgbe max_vfs=63 $ethtool -K eth2 l2-fwd-offload on $ip link add link eth2 macvlan0 type macvlan $ip link set dev macvlan0 up [ 992.950080] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056 [ 992.951109] IP: [<ffffffffa003b71e>] ixgbe_disable_fwd_ring+0x1e/0xf0 [ixgbe] [ 992.951684] PGD 22a80e067 PUD 232e9b067 PMD 0 [ 992.952389] Oops: 0000 [#1] SMP [ 992.953014] Modules linked in: nfsd lockd nfs_acl exportfs auth_rpcgss oid_registry sunrpc bridge stp llc vhost_net macvtap macvlan vhost tun kvm_intel kvm ioatdma ixgbe mdio igb dca [ 992.956042] CPU: 2 PID: 11928 Comm: ifconfig Not tainted 3.16.0-rc6-net-next-07-29-2014-FCoE+ #1 [ 992.956915] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014 [ 992.957791] task: ffff8804341c0000 ti: ffff8801d7dc8000 task.ti: ffff8801d7dc8000 [ 992.958660] RIP: 0010:[<ffffffffa003b71e>] [<ffffffffa003b71e>] ixgbe_disable_fwd_ring+0x1e/0xf0 [ixgbe] [ 992.959613] RSP: 0018:ffff8801d7dcbbb8 EFLAGS: 00010286 [ 992.960093] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001 [ 992.960575] RDX: ffff880232eb7000 RSI: 0000000000000000 RDI: ffff88022dc05800 [ 992.961059] RBP: ffff8801d7dcbbd8 R08: 0000000000000000 R09: 0000000000000000 [ 992.961541] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88022ec20980 [ 992.962023] R13: ffff880232eb7000 R14: 0000000000000001 R15: 0000000000000001 [ 992.962508] FS: 00007fab264887a0(0000) GS:ffff880237640000(0000) knlGS:0000000000000000 [ 992.963378] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 992.963858] CR2: 0000000000000056 CR3: 000000022a939000 CR4: 00000000001427e0 [ 992.964340] Stack: [ 992.964806] ffff88022ec28840 ffff88022ec20980 ffff88022dc05800 ffff880232eb7000 [ 992.965976] ffff8801d7dcbc28 ffffffffa003bae8 ffff8801d7dcbbe8 0000000000000400 [ 992.967147] 000000000000000d ffff88022ec20980 ffff88022ec20000 ffff88022dc05800 [ 992.968319] Call Trace: [ 992.968795] [<ffffffffa003bae8>] ixgbe_fwd_ring_up+0x88/0x280 [ixgbe] [ 992.969284] [<ffffffffa0041d83>] ixgbe_fwd_add+0x173/0x220 [ixgbe] [ 992.969767] [<ffffffffa015056c>] macvlan_open+0x1bc/0x230 [macvlan] [ 992.970256] [<ffffffff816b8de7>] __dev_open+0xd7/0x150 [ 992.970735] [<ffffffff816b8bd7>] __dev_change_flags+0xa7/0x170 [ 992.971220] [<ffffffff816b8ccb>] dev_change_flags+0x2b/0x70 [ 992.971703] [<ffffffff817471b2>] devinet_ioctl+0x602/0x6d0 [ 992.972184] [<ffffffff81748168>] inet_ioctl+0x78/0x90 [ 992.972666] [<ffffffff816a143b>] sock_do_ioctl+0x2b/0x70 [ 992.973146] [<ffffffff816a14ed>] sock_ioctl+0x6d/0x260 [ 992.973627] [<ffffffff811ad3b4>] do_vfs_ioctl+0x84/0x540 [ 992.974109] [<ffffffff811a4c81>] ? final_putname+0x21/0x50 [ 992.974593] [<ffffffff818725d5>] ? sysret_check+0x22/0x5d [ 992.975073] [<ffffffff811ad901>] SyS_ioctl+0x91/0xa0 [ 992.975550] [<ffffffff818725a9>] system_call_fastpath+0x16/0x1b [ 992.976026] Code: ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89 f3 4c 89 6d f8 4c 8b a7 08 02 00 00 <44> 0f b6 6e 56 44 03 af 14 02 00 00 4c 89 e7 e8 5e f2 ff ff be [ 992.982261] RIP [<ffffffffa003b71e>] ixgbe_disable_fwd_ring+0x1e/0xf0 [ixgbe] [ 992.983212] RSP <ffff8801d7dcbbb8> [ 992.983681] CR2: 0000000000000056 [ 992.984248] ---[ end trace 9f54802b5cc3638b ]--- Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	ixgbe: add comment noting recalculation of queues	Jacob Keller	2014-09-04	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we previously called ixgbe_set_num_queues just prior to attempting to set our interrupt scheme, it may be non obvious why we have to call it again inside the function. Add a comment which helps make it more obvious that we are resetting features based on the fact that we do not have MSI-X enabled, and cannot use the previous settings. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	ixgbevf: introduce delay for checking VFLINKS on 82599	Emil Tantilov	2014-09-04	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VFLINKS.LINKUP bit tends to flap when a DA or SFP+ cable is disconnected. It can take up to 500 usecs for the LINKUP bit to be correct. This patch resolves the issue by introducing a delay for 82599 VFs of at least 500 usecs to make sure the VFLINKS value is correct. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	ixgbe: reset interface on link loss with pending Tx work from the VF	Emil Tantilov	2014-09-04	2	-12/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ixgbe initiates a reset of the interface on link loss with pending Tx work in order to clear the rings. This patch extends the pending Tx work check to the VF interfaces with the same purpose. Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	ixgbe: Cleanup FDB handling code	Alexander Duyck	2014-09-04	1	-30/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change makes it so that the behavior for FDB handling is consistent between both the SR-IOV and non-SR-IOV cases. The main change here is that we perform bounds checking on the number of SR-IOV addresses regardless of if SR-IOV is enabled or not as we can only support a certain number of addresses in the hardware. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: use global pci_vfs_assigned() to replace local i40e_vfs_are_assigned()	Ethan Zhao	2014-09-04	1	-31/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is global funcion pci_vfs_assigned(), so use it instead of composing local one. Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e/i40evf: Bump i40e/i40evf versions	Catherine Sullivan	2014-09-04	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bump i40e version to 1.0.11 and i40evf version to 1.0.5. Change-ID: I63a60fa2efe82aae87a8a3095f43218db57d46ce Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com>
\| *	i40e: fix panic due to too-early Tx queue enable	Jesse Brandeburg	2014-09-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes the panic under traffic load when resetting. This issue could also show up if/whenever there is a Tx-timeout. Change-ID: Ie393a1f17fd5d962e56fc3bfe784899ef25402f5 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Fix an issue when PF reset fails	Anjali Singhai Jain	2014-09-04	2	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We shouldn't restart Admin queue subtask if PF reset fails since we do not have the AQ setup at that point. This patch makes sure we disable AQ clean subtask when PF reset fails. This will resolve an occasional kernel panic when PF reset fails for some reason. Change-ID: I11a747773362a8c5c0ad7a10cd34be0bda8eb9e8 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: make warning less verbose	Jesse Brandeburg	2014-09-04	1	-14/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The driver is un-necessarily printing a warning that is only marginally useful to the user. Make the warning only print if extended driver string printing is enabled, other messages related to a reset event will still continue to print. Change-ID: I5e8beca6516a2f176cd2e72b0ac2b3b909e6c953 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Tell OS link is going down when calling set_phy_config	Catherine Sullivan	2014-09-04	1	-3/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we don't seem to be getting an LSE telling us link is going down during set_phy_config (but we do get an LSE telling us we are coming back up), fake one for the OS and tell them link is going down. Also do an atomic restart no matter what because there are times the user may want to end with link up even if they started with link down (like if they accidentally set it to a speed that can't link and are trying to fix it). Change-ID: I0a642af9c1d0feb67bce741aba1a9c33bd349ed6 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Remove unnecessary assignment	Serey Kong	2014-09-04	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove unnecessary setting of "ret" variable as it's already set at the top of the function. Change-ID: Icaccfc67f335817a23579b7c43625d59ad6c9925 Signed-off-by: Serey Kong <serey.kong@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Change wording to be more consistent	Serey Kong	2014-09-04	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change "spoofck" to "spoofchk" to be consistent with as defined in netdev. Change-ID: I9866d6284cb5f92c8d71dc0776c6d1e71dfb62a5 Signed-off-by: Serey Kong <serey.kong@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Allow user to change link settings if link is down	Catherine Sullivan	2014-09-04	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow the user to change auto-negotiation and speed settings if link is down. Change-ID: I372967c627682b5e1835f623a7cbf41b21b51043 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Add dual speed module support	Catherine Sullivan	2014-09-04	2	-20/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that fw has implemented dual speed module support, we can add ours. Also, add the phy type for 1G LR/SR and set its media type to fiber. Lastly, instead of a WARN_ON if the phy type is not recognized just print a warning. Change-ID: I2e5227d4a8c2907b0ed423038e5dbce774e466b0 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* \|	net: ethernet: cpsw: improve interrupt lookup logic in cpsw_probe()	Daniel Mack	2014-09-05	1	-8/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify the interrupt resource lookup code in cpsw_probe() by the following: * Only look at the first member of the resource. As the driver only works for DT-enabled platforms anyway, a resource of type IORESOURCE_IRQ will only contain one single entry (res->start == res->end), so there is no need for the iteration. * Add a bounds check to avoid overflows if we are passed more than ARRAY_SIZE(priv->irqs_table) resources. * Assign 'ret' with the return value of devm_request_irq() so that cpsw_probe() returns the appropriate error code. * If devm_request_irq() fails, report the error code in the log message. Signed-off-by: Daniel Mack <zonque@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: fix a race in update_or_create_fnhe()	Eric Dumazet	2014-09-05	3	-7/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nh_exceptions is effectively used under rcu, but lacks proper barriers. Between kzalloc() and setting of nh->nh_exceptions(), we need a proper memory barrier. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 4895c771c7f00 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	l2tp: fix missing line continuation	Andy Zhou	2014-09-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This syntax error was covered by L2TP_REFCNT_DEBUG not being set by default. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'amd-xgbe-next'	David S. Miller	2014-09-05	12	-95/+102
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver updates 2014-09-03 The following series of patches includes fixes/updates to the driver. - Query the device for the actual speed mode (KR/KX) rather than trying to track it - Update parallel detection logic to support KR mode - Fix new warnings from checkpatch in the amd-xgbe and amd-xgbe-phy driver This patch series is based on net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	amd-xgbe-phy: Checkpatch driver fixes	Lendacky, Thomas	2014-09-05	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch contains fixes identified by checkpatch when run with the strict option. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	amd-xgbe: Checkpatch driver fixes	Lendacky, Thomas	2014-09-05	11	-26/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch contains fixes identified by checkpatch when run with the strict option. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	amd-xgbe-phy: Enhance parallel detection to support KR speed	Lendacky, Thomas	2014-09-05	1	-2/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support to allow parallel detection to work in KR speed. With both speed modes of KX and KR supported, KX must be checked first. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	amd-xgbe-phy: Check device for current speed mode (KR/KX)	Lendacky, Thomas	2014-09-05	1	-63/+72
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since device resets can change the current mode it's possible to think the device is in a different mode than it actually is. Rather than trying to determine every place that is needed to set/save the current mode, be safe and check the devices actual mode when needed rather than trying to track it. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'r8152-next'	David S. Miller	2014-09-05	1	-28/+31
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hayes Wang says: ==================== r8152: random MAC address If the interface has invalid MAC address, it couldn't be used. In order to let it work normally, give a random one. v3: Remove ether_addr_copy(dev->perm_addr, dev->dev_addr); v2: Use "%pM" format specifier for printing a MAC address. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	r8152: use eth_hw_addr_random	hayeswang	2014-09-05	1	-16/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the hw doesn't have a valid MAC address, give a random one and set it to the hw. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	r8152: change the location of rtl8152_set_mac_address	hayeswang	2014-09-05	1	-17/+17
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Exchange the location of rtl8152_set_mac_address() and set_ethernet_addr(). Then, the set_ethernet_addr() could set the MAC address by calling rtl8152_set_mac_address() later. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'rx_copybreak'	David S. Miller	2014-09-05	6	-3/+200
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Govindarajulu Varadarajan says: ==================== enic: Add support for rx_copybreak The following series implements rx_copybreak. dma_map_single()/dma_unmap_single() is more expensive than alloc_skb & memcpy for smaller packets. By doing this we can reuse the dma buff which is already mapped. This is very useful when iommu is on. The default skb copybreak value is 256. When iommu is on, we can go much higher than 256. All the drivers that supports rx_copybreak provides module parameter to change this value. Since module parameter is the least preferred way for changing driver values, this series adds ethtool support for setting rx_copybreak. v4: Validate tunable length in ethtool_get_tunable, not in driver implemented function. Loose tunable_ops array for each tunable type. Define one function and let the driver use switch case for each type. Use double underscore for data type in UAPI headers. Use const qualifier where possible. v3: Add tunable namespace to ethtool. Use new ethtool cmd ETHTOOL_S/GTUNABLE to set/get rx_copybreak from userspace. v2: Add new ethtool_cmd for DMA buffer parameters, instead of adding new members to existing ethtool_ringparam. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	enic: Add tunable_ops support for rx_copybreak	Govindarajulu Varadarajan	2014-09-05	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for setting/getting rx_copybreak using generic ethtool tunable. Defines enic_get_tunable() & enic_set_tunable() to get/set rx_copybreak. As of now, these two function supports only rx_copybreak. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ethtool: Add generic options for tunables	Govindarajulu Varadarajan	2014-09-05	3	-0/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds new ethtool cmd, ETHTOOL_GTUNABLE & ETHTOOL_STUNABLE for getting tunable values from driver. Add get_tunable and set_tunable to ethtool_ops. Driver implements these functions for getting/setting tunable value. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	enic: implement rx_copybreak	Govindarajulu Varadarajan	2014-09-05	2	-3/+48
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Calling dma_map_single()/dma_unmap_single() is quite expensive compared to copying a small packet. So let's copy short frames and keep the buffers mapped. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	dev_ioctl: remove dev_load() CAP_SYS_MODULE message	Daniel Borkmann	2014-09-05	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Marcel reported to see the following message when autoloading is being triggered when adding nlmon device: Loading kernel module for a network device with CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias netdev-nlmon instead. This false-positive happens despite with having correct capabilities set, e.g. through issuing `ip link del dev nlmon` more than once on a valid device with name nlmon, but Marcel has also seen it on creation time when no nlmon module is previously compiled-in or loaded as module and the device name equals a link type name (e.g. nlmon, vxlan, team). Stephen says: The netdev module alias is a hold over from the past. For normal devices, people used to create a alias eth0 to and point it to the type of network device used, that was back in the bad old ISA days before real discovery. Also, the tunnels create module alias for the control device and ip used to use this to autoload the tunnel device. The message is bogus and should just be removed, I also see it in a couple of other cases where tap devices are renamed for other usese. As mentioned in 8909c9ad8ff0 ("net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules"), we nevertheless still might want to leave the old autoloading behaviour in place as it could break old scripts, so for now, lets just remove the log message as Stephen suggests. Reference: http://thread.gmane.org/gmane.linux.kernel/1105168 Reported-by: Marcel Holtmann <marcel@holtmann.org> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: bpf: make eBPF interpreter images read-only	Daniel Borkmann	2014-09-05	11	-32/+144
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With eBPF getting more extended and exposure to user space is on it's way, hardening the memory range the interpreter uses to steer its command flow seems appropriate. This patch moves the to be interpreted bytecode to read-only pages. In case we execute a corrupted BPF interpreter image for some reason e.g. caused by an attacker which got past a verifier stage, it would not only provide arbitrary read/write memory access but arbitrary function calls as well. After setting up the BPF interpreter image, its contents do not change until destruction time, thus we can setup the image on immutable made pages in order to mitigate modifications to that code. The idea is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit against spraying attacks"). This is possible because bpf_prog is not part of sk_filter anymore. After setup bpf_prog cannot be altered during its life-time. This prevents any modifications to the entire bpf_prog structure (incl. function/JIT image pointer). Every eBPF program (including classic BPF that are migrated) have to call bpf_prog_select_runtime() to select either interpreter or a JIT image as a last setup step, and they all are being freed via bpf_prog_free(), including non-JIT. Therefore, we can easily integrate this into the eBPF life-time, plus since we directly allocate a bpf_prog, we have no performance penalty. Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual inspection of kernel_page_tables. Brad Spengler proposed the same idea via Twitter during development of this patch. Joint work with Hannes Frederic Sowa. Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: systemport: update UMAC_CMD only when link is detected	Florian Fainelli	2014-09-04	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we bring the interface down, phy_stop() will schedule the PHY state machine to call our link adjustment callback. By the time we do so, we may have clock gated off the SYSTEMPORT hardware block, and this will cause bus errors to happen in bcm_sysport_adj_link(): Make sure that we only touch the UMAC_CMD register when there is an actual link. This is safe to do for two reasons: - updating the Ethernet MAC registers only make sense when a physical link is present - the PHY library state machine first set phydev->link = 0 before invoking phydev->adjust_link in the PHY_HALTED case This is a similar fix to the GENET one: c677ba8b3c47650358572091ed8a6af50bfca877 ("net: bcmgenet: update UMAC_CMD only when link is detected"). Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv4: implement igmp_qrv sysctl to tune igmp robustness variable	Hannes Frederic Sowa	2014-09-04	4	-16/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As in IPv6 people might increase the igmp query robustness variable to make sure unsolicited state change reports aren't lost on the network. Add and document this new knob to igmp code. RFCs allow tuning this parameter back to first IGMP RFC, so we also use this setting for all counters, including source specific multicast. Also take over sysctl value when upping the interface and don't reuse the last one seen on the interface. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv6: add sysctl_mld_qrv to configure query robustness variable	Hannes Frederic Sowa	2014-09-04	4	-10/+31
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query robustness variable. It specifies how many retransmit of unsolicited mld retransmit should happen. Admins might want to tune this on lossy links. Also reset mld state on interface down/up, so we pick up new sysctl settings during interface up event. IPv6 certification requests this knob to be available. I didn't make this knob netns specific, as it is mostly a setting in a physical environment and should be per host. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mlx4_en: Convert the normal skb free path to dev_consume_skb_any()	Rick Jones	2014-09-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	It would appear the mlx4_en driver was still making a call to dev_kfree_skb_any() where dev_consume_skb_any() would be more appropriate. This should make dropped packet profiling/tracking easier/better over a NIC driven by mlx4_en. Signed-off-by: Rick Jones <rick.jones2@hp.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	lib/rhashtable: allow user to set the minimum shifts of shrinking	Ying Xue	2014-09-03	2	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although rhashtable library allows user to specify a quiet big size for user's created hash table, the table may be shrunk to a very small size - HASH_MIN_SIZE(4) after object is removed from the table at the first time. Subsequently, even if the total amount of objects saved in the table is quite lower than user's initial setting in a long time, the hash table size is still dynamically adjusted by rhashtable_shrink() or rhashtable_expand() each time object is inserted or removed from the table. However, as synchronize_rcu() has to be called when table is shrunk or expanded by the two functions, we should permit user to set the minimum table size through configuring the minimum number of shifts according to user specific requirement, avoiding these expensive actions of shrinking or expanding because of calling synchronize_rcu(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
*	qdisc: validate frames going through the direct_xmit path	Jesper Dangaard Brouer	2014-09-03	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 50cbe9ab5f8d ("net: Validate xmit SKBs right when we pull them out of the qdisc") the validation code was moved out of dev_hard_start_xmit and into dequeue_skb. However this overlooked the fact that we do not always enqueue the skb onto a qdisc. First situation is if qdisc have flag TCQ_F_CAN_BYPASS and qdisc is empty. Second situation is if there is no qdisc on the device, which is a common case for software devices. Originally spotted and inital patch by Alexander Duyck. As a result Alex was seeing issues trying to connect to a vhost_net interface after commit 50cbe9ab5f8d was applied. Added a call to validate_xmit_skb() in __dev_xmit_skb(), in the code path for qdiscs with TCQ_F_CAN_BYPASS flag, and in __dev_queue_xmit() when no qdisc. Also handle the error situation where dev_hard_start_xmit() could return a skb list, and does not return dev_xmit_complete(rc) and falls through to the kfree_skb(), in that situation it should call kfree_skb_list(). Fixes: 50cbe9ab5f8d ("net: Validate xmit SKBs right when we pull them out of the qdisc") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>