op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	net: qmi_wwan: ignore bogus CDC Union descriptors	Bjørn Mork	2015-12-17	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The CDC descriptors found on these vendor specific functions should not be considered authoritative. They seem to be ignored by drivers for other systems, and the quality is therefore low. One device (1e0e:9001) has been reported to have such a bogus union descriptor on the QMI function, making it fail probing even if the device id was dynamically added. The report was not complete enough to allow adding a device entry for this modem. But this should at least fix the dynamic id probing problem. Reported-by: Kanerva Topi <Topi.Kanerva@cinia.fi> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/macb: Update device tree binding for resetting PHY using GPIO	Gregory CLEMENT	2015-12-17	2	-5/+18
\| \| \| \| \| \| \| \| \| \| \|	Instead of being at the MAC level the reset gpio preperty is moved at the PHY child node level. It is still managed by the MAC, but from the point of view of the binding it make more sense to be part of the PHY node. This commit also fixes a build errors if GPIOLIB is not selected. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'cxgb4-l2-table-enhancements'	David S. Miller	2015-12-17	3	-62/+137
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hariprasad Shenai says: ==================== Few l2 table related enhancements for cxgb4 This series adds a new API to allocate and update l2t entry, replaces arpq_head/arpq_tail with double skb double linked list. Use t4_mgmt_tx() to send control packets of l2t write request. Use symbolic constants while calculating vlan priority. This patch series has been created against net-next tree and includes patches on cxgb4 driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. Thanks V2: Remove unnecessary MAS operation while calculating vlan prio in PATCH 1/4 ("cxgb4: Use symbolic constant for VLAN priority calculation") based on review comment by David Miller ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	cxgb4: Replace arpq_head/arpq_tail with SKB double link-list code	Hariprasad Shenai	2015-12-17	2	-40/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on original work by Michael Werner <werner@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	cxgb4: Use t4_mgmt_tx() API for sending write l2t request ctrl packets.	Hariprasad Shenai	2015-12-17	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on original work by Michael Werner <werner@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	cxgb4: Add API to alloc l2t entry; also update existing ones	Hariprasad Shenai	2015-12-17	3	-25/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on original work by Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	cxgb4: Use symbolic constant for VLAN priority calculation	Hariprasad Shenai	2015-12-17	1	-1/+1
\|/ \| \| \| \|	Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	nfp: clear ring delayed kick counters	Jakub Kicinski	2015-12-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	We need to clear delayed kick counters when we free rings otherwise after ndo_close()/ndo_open() we could kick HW by more entries than actually written to rings. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Rolf Neugebauer <rolf.neugebauer@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tun: honor IFF_UP in tun_get_user()	Eric Dumazet	2015-12-17	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a tun interface is turned down, we should not allow packet injection into the kernel. Kernel does not send packets to the tun already. TUNATTACHFILTER can not be used as only tun_net_xmit() is taking care of it. Reported-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6: add IPV6_HDRINCL option for raw sockets	Hannes Frederic Sowa	2015-12-17	2	-4/+17
\| \| \| \| \| \| \| \|	Same as in Windows, we miss IPV6_HDRINCL for SOL_IPV6 and SOL_RAW. The SOL_IP/IP_HDRINCL is not available for IPv6 sockets. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv6: allow routes to be configured with expire values	Xin Long	2015-12-17	2	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Add the support for adding expire value to routes, requested by Tom Gundersen <teg@jklm.no> for systemd-networkd, and NetworkManager wants it too. implement it by adding the new RTNETLINK attribute RTA_EXPIRES. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Pass ndm_state to route netlink FDB notifications.	Hubert Sokolowski	2015-12-16	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this change applications monitoring FDB notifications were not able to determine whether a new FDB entry is permament or not: bridge fdb add f1:f2:f3:f4:f5:f8 dev sw0p1 temp self bridge fdb add f1:f2:f3:f4:f5:f9 dev sw0p1 self bridge monitor fdb f1:f2:f3:f4:f5:f8 dev sw0p1 self permanent f1:f2:f3:f4:f5:f9 dev sw0p1 self permanent With this change ndm_state from the original netlink message is passed to the new netlink message sent as notification. bridge fdb add f1:f2:f3:f4:f5:f6 dev sw0p1 self bridge fdb add f1:f2:f3:f4:f5:f7 dev sw0p1 temp self bridge monitor fdb f1:f2:f3:f4:f5:f6 dev sw0p1 self permanent f1:f2:f3:f4:f5:f7 dev sw0p1 self static Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge	David S. Miller	2015-12-16	13	-27/+370
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Antonio Quartulli says: ==================== Included changes: - change my email in MAINTAINERS and Doc files - create and export list of single hop neighs per interface - protect CRC in the BLA code by means of its own lock - minor fixes and code cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	batman-adv: lock crc access in bridge loop avoidance	Simon Wunderlich	2015-12-16	2	-5/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have found some networks in which nodes were constantly requesting other nodes BLA claim tables to synchronize, just to ask for that again once completed. The reason was that the crc checksum of the asked nodes were out of sync due to missing locking and multiple writes to the same crc checksum when adding/removing entries. Therefore the asked nodes constantly reported the wrong crc, which caused repeating requests. To avoid multiple functions changing a backbone gateways crc entry at the same time, lock it using a spinlock. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Tested-by: Alfons Name <AlfonsName@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	batman-adv: Fix typo 'wether' -> 'whether'	Sven Eckelmann	2015-12-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	batman-adv: Use chain pointer when purging fragments	Sven Eckelmann	2015-12-16	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The chain pointer was already created in batadv_frag_purge_orig to make the checks more readable. Just use the chain pointer everywhere instead of having the same dereference + array access in the most lines of this function. Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	batman-adv: unify flags access style in tt global add	Simon Wunderlich	2015-12-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This should slightly improve readability Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	batman-adv: detect local excess vlans in TT request	Simon Wunderlich	2015-12-16	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the local representation of the global TT table of one originator has more VLAN entries than the respective TT update, there is some inconsistency present. By detecting and reporting this inconsistency, the global table gets updated and the excess VLAN will get removed in the process. Reported-by: Alessandro Bolletta <alessandro@mediaspot.net> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	batman-adv: rename equiv/equal or better to similar or better	Simon Wunderlich	2015-12-16	4	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the function applies a threshold and also slightly worse values are accepted, ''equal or better'' does not represent the intention of the function. ''Similar or better'' represents that better. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	batman-adv: update last seen field of single hop originators	Marek Lindner	2015-12-16	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	batman-adv: export single hop neighbor list via debugfs	Marek Lindner	2015-12-16	5	-0/+100
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	batman-adv: add bat_hardif_neigh_init algo ops call	Marek Lindner	2015-12-16	2	-0/+6
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	batman-adv: add list of unique single hop neighbors per hard-interface	Marek Lindner	2015-12-16	4	-0/+188
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	Doc: update email address	Antonio Quartulli	2015-12-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	My personal email address has changed. Update it in the doc files Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
\| *	MAINTAINERS: update email address	Antonio Quartulli	2015-12-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	My personal email address has changed. Update it in the MAINTAINERS file Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* \|	Merge branch 'geneve-udp-port-offload'	David S. Miller	2015-12-16	8	-44/+237
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Anjali Singhai Jain says: ==================== Add support for Geneve udp port offload This patch series adds new ndo ops for Geneve add/del port, so as to help offload Geneve tunnel functionalities such as RX checksum, RSS, filters etc. i40e driver has been tested with the changes to make sure the offloads happen. We do understand that this is not the ideal solution and most likely will be redone with a more generic offload framework. But this certainly will enable us to start seeing benefits of the accelerations for Geneve tunnels. As a side note, we did find an existing issue in i40e driver where a service task can modify tunnel data structures with no locks held to help linearize access. A separate patch will be taking care of that issue. A question out to the community is regarding the driver Kconfig parameters for VxLAN and Geneve, it would be ideal to drop those if there is a way to help resolve vxlan/geneve_get_rx_port symbols while the tunnel modules are not loaded. Performance numbers: With the offloads enable on X722 devices with remote checksum enabled and no other tuning in terms of cpu governer etc on my test machine: With offload Throughput: 5527Mbits/sec with a single thread %cpu: ~43% per core with 4 threads Without offload Throughput: 2364Mbits/sec with a single thread %cpu: ~99% per core with 4 threads These numbers will get better for X722 as it is being worked. But this does bring out the delta in terms of when the stack is notified with csum_level 1 and CHECKSUM_UNNECESSARY vs not without the RX offload. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	i40e: Call geneve_get_rx_port to get the existing Geneve ports	Singhai, Anjali	2015-12-16	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a call to geneve_get_rx_port in i40e so that when it comes up it can learn about the existing geneve tunnels. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	geneve: Add geneve_get_rx_port support	Singhai, Anjali	2015-12-16	2	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds an op that the drivers can call into to get existing geneve ports. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	i40e: Kernel dependency update for i40e to support geneve offload	Singhai, Anjali	2015-12-16	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update the Kconfig file with dependency for supporting GENEVE tunnel offloads. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	i40e: geneve tunnel offload support	Singhai, Anjali	2015-12-16	4	-43/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds driver hooks to implement ndo_ops to add/del udp port in the HW to identify GENEVE tunnels. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	geneve: Add geneve udp port offload for ethernet devices	Singhai, Anjali	2015-12-16	2	-1/+42
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add ndo_ops to add/del UDP ports to a device that supports geneve offload. v2: Comment fix. Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: sctp: dynamically enable or disable pf state	Zhu Yanjun	2015-12-16	5	-2/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As we all know, the value of pf_retrans >= max_retrans_path can disable pf state. The variables of pf_retrans and max_retrans_path can be changed by the userspace application. Sometimes the user expects to disable pf state while the 2 variables are changed to enable pf state. So it is necessary to introduce a new variable to disable pf state. According to the suggestions from Vlad Yasevich, extra1 and extra2 are removed. The initialization of pf_enable is added. Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	sctp: use GFP_KERNEL in sctp_init()	Eric Dumazet	2015-12-15	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	modules init functions being called from process context, we better use GFP_KERNEL allocations to increase our chances to get these high-order pages we want for SCTP hash tables. This mostly matters if SCTP module is loaded once memory got fragmented. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'sock-diag-destroy'	David S. Miller	2015-12-15	12	-26/+143
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Lorenzo Colitti says: ==================== Support administratively closing application sockets This patchset adds the ability to administratively close a socket without any action from the process owning the socket or the socket protocol. It implements this by adding a new diag_destroy function pointer to struct proto. In-kernel callers can access this functionality directly by calling sk->sk_prot->diag_destroy(sk, err). It also exposes this functionality to userspace via a new SOCK_DESTROY operation in the NETLINK_SOCK_DIAG sockets. This allows a privileged userspace process, such as a connection manager or system administration tool, to close sockets belonging to other apps when the network they were established on has disconnected. It is needed on laptops and mobile hosts to ensure that network switches / disconnects do not result in applications being blocked for long periods of time (minutes) in read or connect calls on TCP sockets that will never succeed because the IP address they are bound to is no longer on the system. Closing the sockets causes these calls to fail fast and allows the apps to reconnect on another network. Userspace intervention is necessary because in many cases the kernel does not have enough information to know that a connection is now inoperable. The kernel can know if a packet can't be routed, but in general it won't know if a TCP connection is stuck because it is now routed to a network where its source address is no longer valid [5][6]. Many other operating systems offer similar functionality: - FreeBSD has had this since 5.4 in 2005 [2]. It is available to privileged userspace and there is a tool to use it [3]. - The FreeBSD commit description states that the idea came from OpenBSD. - iOS has been administratively closing app sockets since iOS 4 - see [4], which states that a socket "might get reclaimed by the kernel" and after that will return EBADF]. For many years Android kernels have supported this via an out-of-tree SIOCKILLADDR ioctl that is called on every RTM_DELADDR event, but this solution is cleaner, more robust and more flexible: the connection manager can iterate over all connections on the deleted IP address and close all of them. It can also be used to close all sockets opened by a given app process, for example if the user has restricted that app from using the network, if a secure network such as a VPN has connected and security policy requires all of an application's connections to be routed via the VPN, etc. - For many years Android kernels have supported an out-of-tree SIOCKILLADDR ioctl that is called when a network disconnects or an RTM_DELADDR event is received. This solution is cleaner, more robust and more flexible. The connection manager can implement SIOCKILLADDR by iterating over all connections on the deleted IP address and close all of them, but it can also close all sockets opened by a given app process (for example if the user has restricted that app from), close all of a user's TCP connections if a user has connected a secure network such as a VPN and expects all of an application's connections to be routed via the VPN, etc. Alternative schemes such as TCP keepalives in combination with "iptables -j REJECT --reject-with tcp-reset", could be used to achieve similar results, but on mobile devices TCP keepalives are very expensive, and in such a scheme detecting stuck connections has to wait for a keepalive to be sent or the application to perform a write. An explicit notification from userspace is cheaper and faster in the common case where an application is blocked on read. SOCK_DESTROY is placed behind an INET_DIAG_DESTROY configuration option, which is currently off by default. The TCP implementation of diag_destroy causes a TCP ABORT as specified by RFC 793 [1]: immediately send a RST and clear local connection state. This is what happens today if an application enables SO_LINGER with a timeout of 0 and then calls close. The first versions of the patchset did not send a RST, but that is not graceful/correct TCP behaviour. tcp_abort now does a proper RFC 793 ABORT and sends a RST to the peer. This is consistent with BSD's tcpdrop, and is more correct in general, even though in many use cases tcp_abort will only be called when sending a RST is no longer possible (e.g., the network has disconnected). The original patchset also behaved like SIOCKILADDR and closed TCP sockets with ETIMEDOUT. Tom Herbert pointed out that it would be better if applications could distinguish between a timeout and an administrative close. ECONNABORTED was chosen because it is consistent with BSD. [1] http://tools.ietf.org/html/rfc793#page-50 [2] http://svnweb.freebsd.org/base?view=revision&revision=141381 [3] https://www.freebsd.org/cgi/man.cgi?query=tcpdrop&sektion=8&manpath=FreeBSD+5.4-RELEASE [4] https://developer.apple.com/library/ios/technotes/tn2277/_index.html#//apple_ref/doc/uid/DTS40010841-CH1-SUBSECTION3 [5] http://www.spinics.net/lists/netdev/msg352775.html [6] http://www.spinics.net/lists/netdev/msg352952.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: diag: Support destroying TCP sockets.	Lorenzo Colitti	2015-12-15	6	-0/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements SOCK_DESTROY for TCP sockets. It causes all blocking calls on the socket to fail fast with ECONNABORTED and causes a protocol close of the socket. It informs the other end of the connection by sending a RST, i.e., initiating a TCP ABORT as per RFC 793. ECONNABORTED was chosen for consistency with FreeBSD. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: diag: Support SOCK_DESTROY for inet sockets.	Lorenzo Colitti	2015-12-15	2	-8/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This passes the SOCK_DESTROY operation to the underlying protocol diag handler, or returns -EOPNOTSUPP if that handler does not define a destroy operation. Most of this patch is just renaming functions. This is not strictly necessary, but it would be fairly counterintuitive to have the code to destroy inet sockets be in a function whose name starts with inet_diag_get. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: diag: Add the ability to destroy a socket.	Lorenzo Colitti	2015-12-15	4	-3/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a SOCK_DESTROY operation, a destroy function pointer to sock_diag_handler, and a diag_destroy function pointer. It does not include any implementation code. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: diag: split inet_diag_dump_one_icsk into two	Lorenzo Colitti	2015-12-15	2	-15/+32
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, inet_diag_dump_one_icsk finds a socket and then dumps its information to userspace. Split it into a part that finds the socket and a part that dumps the information. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'ila-early-demux'	David S. Miller	2015-12-15	13	-81/+988
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tom Herbert says: ==================== ila: Optimization to preserve value of early demux In the current implementation of ILA, LWT is used to perform translation on both the input and output paths. This is functional, however there is a big performance hit in the receive path. Early demux occurs before the routing lookup (a hit actually obviates the route lookup). Therefore the stack currently performs early demux before translation so that a local connection with ILA addresses is never matched. Note that this issue is not just with ILA, but pretty much any translated or encapsulated packet handled by LWT would miss the opportunity for early demux. Solving the general problem seems non trivial since we would need to move the route lookup before early demx thereby mitigating the value. This patch set addresses the issue for ILA by adding a fast locator lookup that occurs before early demux. This done by hooking in to NF_INET_PRE_ROUTING For the backend we implement an rhashtable that contains identifier to locator to mappings. The table also allows more specific matches that include original locator and interface. This patch set: - Add an rhashtable function to atomically replace and element. This is useful to implement sub-trees from a table entry without needing to use a special anchor structure as the table entry. - Add a start callback for starting a netlink dump. - Creates an ila directory under net/ipv6 and moves ila.c to it. ila.c is split into ila_common.c and ila_lwt.c. - Implement a table to do identifier->locator mapping. This is an rhashtable (in ila_xlat.c). - Configuration for the table with netlink. - Add a hook into NF_INET_PRE_ROUTING to perform ILA translation before early demux. Changes in v2: - Use iptables targets instead of a new xfrm function Changes in v3: - Add __rcu to next pointer in struct ila_map Changes in v4: - Use hook for NF_INET_PRE_ROUTING Changed in v5: - Register hooks per namespace using nf_register_net_hooks - Only register hooks when first mapping is actually added Changed in v6: - Remove gfp argument in alloc_ila_locks, it is unnecessary - Set registered_hooks properly when hooks are registered Testing: Running 200 netperf TCP_RR streams No ILA, baseline 79.26% CPU utilization 1678282 tps 104/189/390 50/90/99% latencies ILA before fix (LWT on both input and output) 81.91% CPU utilization 1464723 tps (-14.5% from baseline) 121/215/411 50/90/99% latencies ILA after fix 80.62% CPU utilization 1622985 (-3.4% from baseline) 110/191/347 50/90/99% latencies ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ila: Add generic ILA translation facility	Tom Herbert	2015-12-15	6	-1/+731
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements an ILA tanslation table. This table can be configured with identifier to locator mappings, and can be be queried to resolve a mapping. Queries can be parameterized based on interface, direction (incoming or outoing), and matching locator. The table is implemented using rhashtable and is configured via netlink (through "ip ila .." in iproute). The table may be used as alternative means to do do ILA tanslations other than the lw tunnels Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	netlink: add a start callback for starting a netlink dump	Tom Herbert	2015-12-15	4	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The start callback allows the caller to set up a context for the dump callbacks. Presumably, the context can then be destroyed in the done callback. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	rhashtable: add function to replace an element	Tom Herbert	2015-12-15	1	-0/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the rhashtable_replace_fast function. This replaces one object in the table with another atomically. The hashes of the new and old objects must be equal. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ila: Create net/ipv6/ila directory	Tom Herbert	2015-12-15	5	-81/+152
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Create ila directory in preparation for supporting other hooks in the kernel than LWT for doing ILA. This includes: - Moving ila.c to ila/ila_lwt.c - Splitting out some common functions into ila_common.c Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'stmmac-mdio-compat'	David S. Miller	2015-12-15	8	-28/+66
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge branch 'stmmac-mdio-compat' Phil Reid says: ==================== stmmac: create of compatible mdio bus for stmacc driver Provide ability to specify a fixed phy in the device tree and retain the mdio bus if no phy is found. This is needed where a dsa is connected via a fixed phy and uses the mdio bus for config. Fixed ptp ref clock calculatins for the stmmac when ptp ref clock is running at <= 50Mhz. Also add device tree setting to config ptp clk source on socfpga platforms. Changes from V5: - Restore behaviour of unregister mdio bus when no phys found if there is no device tree node create the bus. - Modify condition to allocate mdio_base_data conditional on fixed phy presece as well. Maintains existing behaviour in conditions where a fixed phy is not present. Changes from V4: - Restore #ifdef CONFIG_OF around setting of reset_gpio. Member doesn't exist when this isn't defined. Changes from V3: - Use if (IS_ENABLED(CONFIG_OF)) instead of #if. Reorder some code to reduce if statements. - of_mdiobus_register already falls back to mdiobus_register - Tested on system with CONFIG_OF Changes from V2: - Formatting, spaces & lines > 80 chars. Using checkpatch - Drop PTP register debugfs patch. Changes from V1: - Fixed mismatch doc / code for ptp_ref_clk dt node. - Remove unit address from doc example. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	stmmac: socfpga: Provide dt node to config ptp clk source.	Phil Reid	2015-12-15	2	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provides an options to use the ptp clock routed from the Altera FPGA fabric. Instead of the defalt eosc1 clock connected to the ARM HPS core. This setting affects all emacs in the core as the ptp clock is common. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Phil Reid <preid@electromag.com.au> Acked-by: Dinh Nguyen <dinguyen@opensource.altera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	stmmac: Fix calculations for ptp counters when clock input = 50Mhz.	Phil Reid	2015-12-15	3	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	stmmac_config_sub_second_increment set the sub second increment to 20ns. Driver is configured to use the fine adjustment method where the sub second register is incremented when the acculumator incremented by the addend register wraps overflows. This accumulator is update on every ptp clk cycle. If a ptp clk with a period of greater than 20ns was used the sub second register would not get updated correctly. Instead set the sub sec increment to twice the period of the ptp clk. This result in the addend register being set mid range and overflow the accumlator every 2 clock cycles. Signed-off-by: Phil Reid <preid@electromag.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	stmmac: Correct documentation on stmmac clocks.	Phil Reid	2015-12-15	1	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	devm_get_clk looks in clock-name property for matching clock. the ptp_ref_clk property is ignored. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Phil Reid <preid@electromag.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	stmmac: create of compatible mdio bus for stmmac driver	Phil Reid	2015-12-15	3	-4/+32
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The DSA driver needs to be passed a reference to an mdio bus. Typically the mac is configured to use a fixed link but the mdio bus still needs to be registered so that it con configure the switch. This patch follows the same process as the altera tse ethernet driver for creation of the mdio bus. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Phil Reid <preid@electromag.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'end-of-ip-csum'	David S. Miller	2015-12-15	41	-114/+437
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tom Herbert says: ==================== net: The beginning of the end for NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM Background: This patch set starts to address one front in the battle against protocol ossification. Protocol ossification describes the state that we have arrived at in the evolution of the Internet where we are materially limited to only using a very narrow range of protocols and protocol features. For instance, only TCP and UDP is sufficiently supported on the Internet so that deploying alternative protocols, such as SCTP and DCCP, are non-starters. Similarly, IP options and IPv6 extension headers are typically not considered feasible for wide deployment, so we have loss the extensibility of IP protocols. Protocol ossification is not only a problem on the Internet, but in the data center as well. A root cause of this seems to be narrow, protocol specific optimizations implemented in switches (for doing EMCP) and in NICs (NIC offloads). These tend to be performance optimization around TCP and UDP packets, and these have become requirements to implement performant network solutions at scale. Attempts to deal with protocol ossification in data center have yielded ad hoc, sub-optimal solutions. A main driver of foo-over-UDP (e.g. GRE/UDP, MPLS/UDP) is to leverage the existing EMCP and RSS support for UDP by setting the source port as an entropy value. This has seen some success, but the cost of additional overhead and layering limits its usefulness. An even more extreme solution is STT where non-TCP packets are spoofed as TCP to leverage NIC offloads. This patch set endeavours to address protocol ossification caused by techniques used in transmit checksum offload for NICs. Future work will address protocol ossification in the other primary NIC offloads-- namely receive checksum offload, LSO, LRO, and RSS. NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM: NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM exemplify the problem of protocol ossification. These features are relics from a simpler time in the Internet, before encapsulation, before GRE and IPIP. Many hardware vendors only saw the need to provide checksum offload for simple UDP and TCP packets over IPv4 (IPv6 support is an afterthought also). In today's Internet and data centers, checksum offload is well established as a valuable feature, but we can no longer afford to be contsrained to use a handful of protocols and features that are supported at the discretion of NIC vendors. Generic and protocol agnostic methods are needed. The actual interface that the stack uses with drivers for checksum offload is CHECKSUM_PARTIAL. This is a generic and protocol agnostic interface. A driver for a device that supports this generic interface advertises NETIF_F_HW_CSUM. Goals of this patch set: We propose that drivers advertise NETIF_F_HW_CSUM instead of protocol specific values of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM. If the driver's device is constrained (for instance it can only offlaod simple IPv4 and IPv6 packets) then these constraints can be checked in the transmit path and skb_checksum_help would be called for packets that the driver is unable to offload. In order to facilitate this, we add some helper functions that takes a specification argument indicating the type of packets a device is able to offload. If a packet does not match the specification, the helper function calls skb_checksum_help. Benefits of this approach are: - Simplify the stack and clarify the interface for checksum offload - Encourage NIC vendors to implement the generic. protocol agnostic checksum offload methods in hardware - Encourage feature parity in NIC offloads for IPv4 and IPv6 Many drivers advertise NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM and it probably isn't feasible to convert them all in a given time frame (although if we could this would be a great simplification to the stack). A reasonable direction may be to declare that new drivers must use NETIF_F_HW_CSUM as NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM are considered deprecated. There is a class of drivers that should now be converted to advertise NETIF_F_HW_CSUM, namely those that support offload of ecapsulated checksums. These drivers have to date been using skb->encapsulation to infer that checksum offload is being performed for an encapsulated checksum. This is strictly not correct. skb->encapsulation indicates that the inner headers are valid in the skbuff, whereas the stack indicates checksum offload arguments exclusively in csum_start and csum_offset. At some point we may want to set the inner headers for an skbuff but offload the outer transport checksum, so this needs to be fixed. In this patch set: - Rename some of constants involved in checksum offload to be more reflective of their function - Eliminate NETIF_F_GEN_CSUM and NETIF_F_V[46]_CSUM entirely as unnecessary convolutions - Fix conditions in tcp_sendpage and tcp_sendmsg to take IP protocol into account when determining if checksum offload can be done - Add driver helper functions for determining if a checksum can be offloaded to a device. If not, the helper function can call skb_checksum_help - Document the checksum offload interface between the stack and drivers with detail and specifics Testing: Have been testing ixgbe and mlx4. No noticeable regressions seen yet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: Elaborate on checksum offload interface description	Tom Herbert	2015-12-15	1	-29/+109
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add specifics and details the description of the interface between the stack and drivers for doing checksum offload. This description is meant to be as specific and complete as possible. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>