* bridge: netlink: export per-vlan stats (Nikolay Aleksandrov, 2016-05-02; 5 files, -0/+118)
  Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the RTM_GETSTATS
  callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and
  get_linkxstats_size) in order to export the per-vlan stats. The paddings
  were added because these fields will soon be needed for per-port per-vlan
  stats (or something else if someone beats me to it), which avoids adding at
  least a few more netlink attributes later.
  Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

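A minimal user-space sketch of how these stats can be queried once the series is in place. It only builds and sends the RTM_GETSTATS request; reply parsing, error handling and the bridge name "br0" are assumptions, not part of the commit:

```c
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct {
		struct nlmsghdr     nlh;
		struct if_stats_msg ifsm;
	} req;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	memset(&req, 0, sizeof(req));
	req.nlh.nlmsg_len    = NLMSG_LENGTH(sizeof(req.ifsm));
	req.nlh.nlmsg_type   = RTM_GETSTATS;
	req.nlh.nlmsg_flags  = NLM_F_REQUEST;
	req.ifsm.ifindex     = if_nametoindex("br0");	/* assumed bridge name */
	/* ask only for the extended link stats this series exports */
	req.ifsm.filter_mask = IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_XSTATS);

	send(fd, &req, req.nlh.nlmsg_len, 0);
	/* The reply nests the bridge data under LINK_XSTATS_TYPE_BRIDGE,
	 * one BRIDGE_XSTATS_VLAN entry per vlan. */
	close(fd);
	return 0;
}
```
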
* bridge: vlan: learn to count (Nikolay Aleksandrov, 2016-05-02; 5 files, -16/+110)
  Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
  per-cpu stats allocated, which are then set in each per-port vlan context
  for quick access. The br_allowed_ingress() common function is used to
  account for Rx packets and the br_handle_vlan() common function is used to
  account for Tx packets. Stats accounting is performed only if the
  bridge-wide vlan_stats_enabled option is set either via sysfs or netlink.
  A struct hole between vlan_enabled and vlan_proto is used for the new
  option so it is in the same cache line. Currently it is binary (on/off),
  but it is intentionally restricted to exactly 0 and 1 since other values
  will be used in the future for different purposes (e.g. per-port stats).
  Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

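The accounting described here follows the usual kernel per-CPU counter idiom. A kernel-style sketch of that idiom is shown below; the struct and function names are illustrative, not the bridge's actual ones:

```c
/* Illustrative per-CPU counter idiom (kernel context; relies on
 * <linux/netdevice.h> and <linux/u64_stats_sync.h>). Names are made up. */
struct example_vlan_pcpu_stats {
	u64			rx_bytes;
	u64			rx_packets;
	u64			tx_bytes;
	u64			tx_packets;
	struct u64_stats_sync	syncp;
};

/* One allocation per global vlan context; per-port contexts just keep a
 * pointer to it for quick access. */
static struct example_vlan_pcpu_stats __percpu *example_alloc_stats(void)
{
	return netdev_alloc_pcpu_stats(struct example_vlan_pcpu_stats);
}

/* Rx accounting, as br_allowed_ingress() would do it when the bridge-wide
 * vlan_stats_enabled option is on. */
static void example_count_rx(struct example_vlan_pcpu_stats __percpu *stats,
			     unsigned int len)
{
	struct example_vlan_pcpu_stats *s = this_cpu_ptr(stats);

	u64_stats_update_begin(&s->syncp);
	s->rx_bytes += len;
	s->rx_packets++;
	u64_stats_update_end(&s->syncp);
}
```
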
* net: rtnetlink: add linkxstats callbacks and attribute (Nikolay Aleksandrov, 2016-05-02; 3 files, -0/+49)
  Add callbacks to calculate the size and fill link extended statistics which
  can be split into multiple messages and are dumped via the new rtnl stats
  API (RTM_GETSTATS) with the IFLA_STATS_LINK_XSTATS attribute. Also add that
  attribute to the idx mask check since it is expected to be able to save
  state and resume dumping (e.g. future bridge per-vlan stats will be dumped
  via this attribute and callbacks). Each link type should nest its private
  attributes under the per-link-type attribute. This allows any number of
  separate private attributes and avoids a call to look up the device's link
  type.
  Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

* net: rtnetlink: allow rtnl_fill_statsinfo to save private state counter (Nikolay Aleksandrov, 2016-05-02; 1 file, -13/+31)
  The new prividx argument allows the current dumping device to save a
  private state counter which would enable it to continue dumping from where
  it left off. The idxattr is used to save the current idx user so that
  multiple prividx-using attributes can be requested at the same time, as
  suggested by Roopa Prabhu.
  Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

* Merge branch 'ipv6-tunnel-cleanups' (David S. Miller, 2016-05-02; 6 files, -584/+452)
  Tom Herbert says:
  ====================
  net: Cleanup IPv6 ip tunnels

  The IPv6 tunnel code is very different from IPv4 code. There is a lot of
  redundancy with the IPv4 code, particularly in the GRE tunneling.

  This patch set cleans up the tunnel code to make the IPv6 code look more
  like the IPv4 code and use common functions between the two stacks where
  possible. This work should make it easier to maintain and extend the IPv6
  ip tunnels.

  Items in this patch set:
  - Cleanup IPv6 tunnel receive path (ip6_tnl_rcv). Includes using gro_cells
    and exporting ip6_tnl_rcv so the ip6_gre can call it
  - Move GRE functions to common header file (tx functions) or gre_demux.c
    (rx functions like gre_parse_header)
  - Call common GRE functions from IPv6 GRE
  - Create ip6_tnl_xmit (to be like ip_tunnel_xmit)

  Tested: Ran super_netperf tests for TCP_RR and TCP_STREAM for:
  - IPv4 over gre, gretap, gre6, gre6tap
  - IPv6 over gre, gretap, gre6, gre6tap
  - ipip
  - ip6ip6
  - ipip/gue
  - IPv6 over gre/gue
  - IPv4 over gre/gue
  ====================
  Signed-off-by: David S. Miller <davem@davemloft.net>

  * gre6: Cleanup GREv6 transmit path, call common GRE functions (Tom Herbert, 2016-05-02; 1 file, -202/+50)
    Changes in GREv6 transmit path:
    - Call gre_checksum, remove gre6_checksum
    - Rename ip6gre_xmit2 to __gre6_xmit
    - Call gre_build_header utility function
    - Call ip6_tnl_xmit common function
    - Call ip6_tnl_change_mtu, eliminate ip6gre_tunnel_change_mtu
    Signed-off-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * ipv6: Generic tunnel cleanup (Tom Herbert, 2016-05-02; 2 files, -3/+9)
    A few generic changes to generalize tunnels in IPv6:
    - Export ip6_tnl_change_mtu so that it can be called by ip6_gre
    - Add tun_hlen to ip6_tnl structure.
    Signed-off-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * gre: Create common functions for transmit (Tom Herbert, 2016-05-02; 2 files, -47/+49)
    Create common functions for both IPv4 and IPv6 GRE in transmit. These are
    put into gre.h. Common functions are for:
    - GRE checksum calculation. Move gre_checksum to gre.h.
    - Building a GRE header. Move GRE build_header and rename gre_build_header.
    Signed-off-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * ipv6: Create ip6_tnl_xmit (Tom Herbert, 2016-05-02; 2 files, -17/+32)
    This patch renames ip6_tnl_xmit2 to ip6_tnl_xmit and exports it. Other
    users like GRE will be able to call this. The original ip6_tnl_xmit
    function is renamed to ip6_tnl_start_xmit (this is an ndo_start_xmit
    function).
    Signed-off-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * gre6: Cleanup GREv6 receive path, call common GRE functions (Tom Herbert, 2016-05-02; 1 file, -117/+23)
    - Create gre_rcv function. This calls gre_parse_header and ip6gre_rcv.
    - Call ip6_tnl_rcv. Doing this and using gre_parse_header eliminates most
      of the code in ip6gre_rcv.
    Signed-off-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * gre: Move utility functions to common headers (Tom Herbert, 2016-05-02; 3 files, -129/+144)
    Several of the GRE functions defined in net/ipv4/ip_gre.c are usable for
    the IPv6 GRE implementation (that is, they are protocol agnostic). These
    include:
    - GRE flag handling functions are moved to gre.h
    - GRE build_header is moved to gre.h and renamed gre_build_header
    - parse_gre_header is moved to gre_demux.c and renamed gre_parse_header
    - iptunnel_pull_header is taken out of gre_parse_header. This is now done
      by the caller. The header length is returned from gre_parse_header in
      an int* argument.
    Signed-off-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * ipv6: Cleanup IPv6 tunnel receive path (Tom Herbert, 2016-05-02; 2 files, -70/+146)
    Some basic changes to make the IPv6 tunnel receive path look more like the
    IPv4 path:
    - Make ip6_tnl_rcv non-static and export it so that GREv6 and others can
      call it
    - Make ip6_tnl_rcv look like ip_tunnel_rcv
    - Switch to gro_cells_receive
    Signed-off-by: Tom Herbert <tom@herbertland.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* Merge branch 'tcp-preempt' (David S. Miller, 2016-05-02; 20 files, -157/+150)
  Eric Dumazet says:
  ====================
  net: make TCP preemptible

  Most of TCP stack assumed it was running from BH handler. This is great for
  most things, as TCP behavior is very sensitive to scheduling artifacts.

  However, the prequeue and backlog processing are problematic, as they need
  to be flushed with BH being blocked.

  To cope with modern needs, TCP sockets have big sk_rcvbuf values, in the
  order of 16 MB, and soon 32 MB. This means that backlog can hold thousands
  of packets, and things like TCP coalescing or collapsing on this amount of
  packets can lead to insane latency spikes, since BH are blocked for too
  long.

  It is time to make UDP/TCP stacks preemptible. Note that fast path still
  runs from BH handler.

  v2: Added "tcp: make tcp_sendmsg() aware of socket backlog" to reduce
      latency problems of large sends.
  v3: Fixed a typo in tcp_cdg.c
  ====================
  Signed-off-by: David S. Miller <davem@davemloft.net>

  * tcp: make tcp_sendmsg() aware of socket backlog (Eric Dumazet, 2016-05-02; 3 files, -2/+24)
    Large sendmsg()/write() hold the socket lock for the duration of the
    call, unless sk->sk_sndbuf limit is hit. This is bad because incoming
    packets are parked into the socket backlog for a long time. Critical
    decisions like fast retransmit might be delayed. Receivers have to
    maintain a big out of order queue with additional cpu overhead, and also
    possible stalls in TX once windows are full.

    Bidirectional flows are particularly hurt since the backlog can become
    quite big if the copy from user space triggers IO (page faults).

    Some applications learnt to use sendmsg() (or sendmmsg()) with small
    chunks to avoid this issue. The kernel should know better, right?

    Add a generic sk_flush_backlog() helper and use it right before a new skb
    is allocated. Typically we put 64KB of payload per skb (unless MSG_EOR is
    requested) and checking the socket backlog every 64KB gives good results.

    As a matter of fact, tests with TSO/GSO disabled give very nice results,
    as we manage to keep a small write queue and smaller perceived rtt.

    Note that sk_flush_backlog() maintains socket ownership, so it is not
    equivalent to a {release_sock(sk); lock_sock(sk);} pair, to ensure the
    implicit atomicity rules that sendmsg() was giving to (possibly buggy)
    applications.

    In this simple implementation, I chose to not call tcp_release_cb(), but
    we might consider this later.
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Alexei Starovoitov <ast@fb.com>
    Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

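A schematic kernel-style sketch of the call-site pattern described above; this is not the actual tcp_sendmsg() code, and my_fill_one_skb()/my_send_loop() are made-up placeholder names:

```c
#include <net/sock.h>		/* sk_flush_backlog(), struct sock */
#include <linux/socket.h>	/* msg_data_left() */

static int my_fill_one_skb(struct sock *sk, struct msghdr *msg);	/* hypothetical */

static int my_send_loop(struct sock *sk, struct msghdr *msg)
{
	int err;

	while (msg_data_left(msg)) {
		/* Before starting the next ~64KB skb, drain whatever the
		 * softirq parked in the backlog. sk_flush_backlog() keeps
		 * the socket owned, unlike a release_sock()/lock_sock()
		 * pair, so the sendmsg() atomicity rules are preserved. */
		if (unlikely(sk->sk_backlog.tail))
			sk_flush_backlog(sk);

		err = my_fill_one_skb(sk, msg);	/* allocate skb, copy payload */
		if (err)
			return err;
	}
	return 0;
}
```
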
  * net: do not block BH while processing socket backlog (Eric Dumazet, 2016-05-02; 1 file, -14/+8)
    Socket backlog processing is a major latency source. With current TCP
    socket sk_rcvbuf limits, I have sampled __release_sock() holding cpu for
    more than 5 ms, and packets being dropped by the NIC once ring buffer is
    filled. All users are now ready to be called from process context, we can
    unblock BH and let interrupts be serviced faster. cond_resched_softirq()
    could be removed, as it has no more user.
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * sctp: prepare for socket backlog behavior change (Eric Dumazet, 2016-05-02; 1 file, -0/+2)
    sctp_inq_push() will soon be called without BH being blocked when generic
    socket code flushes the socket backlog. It is very possible SCTP can be
    converted to not rely on BH, but this needs to be done by SCTP experts.
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * udp: prepare for non BH masking at backlog processing (Eric Dumazet, 2016-05-02; 2 files, -4/+4)
    UDP uses the generic socket backlog code, and this will soon be changed
    to not disable BH when protocol is called back. We need to use
    appropriate SNMP accessors.
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * dccp: do not assume DCCP code is non preemptible (Eric Dumazet, 2016-05-02; 4 files, -6/+6)
    DCCP uses the generic backlog code, and this will soon be changed to not
    disable BH when protocol is called back.
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * tcp: do not block bh during prequeue processing (Eric Dumazet, 2016-05-02; 2 files, -32/+2)
    AFAIK, nothing in current TCP stack absolutely wants BH being disabled
    once socket is owned by a thread running in process context. As mentioned
    in my prior patch ("tcp: give prequeue mode some care"), processing a
    batch of packets might take time, better not block BH at all.
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * tcp: do not assume TCP code is non preemptible (Eric Dumazet, 2016-05-02; 11 files, -99/+104)
    We want to make the TCP stack preemptible, as draining the prequeue and
    backlog queues can take a lot of time. Many SNMP updates were assuming
    that BH (and preemption) was disabled. We need to convert some
    __NET_INC_STATS() calls to NET_INC_STATS() and some __TCP_INC_STATS() to
    TCP_INC_STATS(). Before using this_cpu_ptr(net->ipv4.tcp_sk) in
    tcp_v4_send_reset() and tcp_v4_send_ack(), we add an explicit preempt
    disabled section.
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

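A hedged kernel-style sketch of the explicit preempt-disabled section the last sentence refers to; the function name is hypothetical and the reply-building code is elided:

```c
/* Illustrative only; mirrors the pattern described above, not the exact
 * tcp_v4_send_reset() code. */
static void my_send_reset(struct net *net)	/* made-up name */
{
	struct sock *ctl_sk;

	/* With BH no longer disabled around this code, per-CPU data needs an
	 * explicit preempt-disabled section so the task cannot migrate CPUs
	 * while it is using this CPU's control socket. */
	preempt_disable();
	ctl_sk = *this_cpu_ptr(net->ipv4.tcp_sk);
	/* ... build and transmit the RST/ACK on ctl_sk ... */
	preempt_enable();
}
```
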
* Merge branch 'xgene-channel-number' (David S. Miller, 2016-05-02; 4 files, -1/+18)
  Iyappan Subramanian says:
  ====================
  drivers: net: xgene: fix: Get channel number from device binding

  This patch set adds 'channel' property to get ethernet to CPU channel
  number, thus decoupling the Linux driver from static resource selection.

  v2: Address review comments from v1
      - removed irq reference from Linux driver
      - added 'channel' property to get ethernet to CPU channel number
  v1:
      - Initial version
  ====================
  Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

  * dtb: xgene: Add channel property (Iyappan Subramanian, 2016-05-02; 2 files, -0/+2)
    Added 'channel' property, describing ethernet to CPU channel number.
    Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * Documentation: dtb: xgene: Add channel property (Iyappan Subramanian, 2016-05-02; 1 file, -0/+2)
    Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * drivers: net: xgene: Get channel number from device binding (Iyappan Subramanian, 2016-05-02; 1 file, -1/+14)
    This patch gets the ethernet to CPU channel (prefetch buffer number) from
    the newly added 'channel' property, thus decoupling the Linux driver from
    resource management.
    Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* Merge branch 'qed-selftests' (David S. Miller, 2016-05-02; 13 files, -6/+598)
  Sudarsana Reddy Kalluru says:
  ====================
  qed/qede: ethtool selftests support.

  This series adds the driver support for following selftests:
  1. Register test
  2. Memory test
  3. Clock test
  4. Interrupt test
  5. Internal loopback test

  Patch (1) adds the qed driver infrastructure for selftests.
  Patches (2) and (3) add qede driver support for ethtool selftests.

  Please consider applying this series to "net-next".
  ====================
  Signed-off-by: David S. Miller <davem@davemloft.net>

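Once merged, these tests are reachable through the standard ethtool self-test interface (`ethtool -t <dev> offline`). A small user-space sketch of driving that interface directly via the SIOCETHTOOL ioctl is shown below; the interface name "eth0" and the minimal error handling are assumptions:

```c
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	const char *dev = "eth0";		/* assumed interface name */
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);

	/* 1. Ask how many self-test results the driver reports (ETH_SS_TEST). */
	struct ethtool_sset_info *sset = calloc(1, sizeof(*sset) + sizeof(__u32));
	sset->cmd = ETHTOOL_GSSET_INFO;
	sset->sset_mask = 1ULL << ETH_SS_TEST;
	ifr.ifr_data = (void *)sset;
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0 || !sset->data[0])
		return 1;

	/* 2. Run the tests offline; the kernel fills one __u64 per test. */
	struct ethtool_test *test = calloc(1, sizeof(*test) +
					   sset->data[0] * sizeof(__u64));
	test->cmd = ETHTOOL_TEST;
	test->flags = ETH_TEST_FL_OFFLINE;
	ifr.ifr_data = (void *)test;
	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
		return 1;

	printf("self-test %s\n",
	       (test->flags & ETH_TEST_FL_FAILED) ? "FAILED" : "PASSED");
	free(test);
	free(sset);
	close(fd);
	return 0;
}
```
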
  * qede: add implementation for internal loopback test. (Sudarsana Reddy Kalluru, 2016-05-02; 3 files, -4/+242)
    This patch adds the qede implementation for internal loopback test.
    Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
    Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * qede: add support for selftests. (Sudarsana Reddy Kalluru, 2016-05-02; 1 file, -1/+55)
    This patch adds the qede ethtool support for the following tests:
    - interrupt test
    - memory test
    - register test
    - clock test
    Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

  * qed: add infrastructure for device self tests. (Sudarsana Reddy Kalluru, 2016-05-02; 10 files, -1/+301)
    This patch adds the functionality and APIs needed for selftests. It adds
    the ability to configure the link-mode which is required for the
    implementation of loopback tests. It adds the APIs for clock test,
    register test, interrupt test and memory test.
    Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
    Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

* net: dsa: mv88e6xxx: replace ds with ps where possible (Andrew Lunn, 2016-05-02; 6 files, -494/+511)
  The dsa_switch structure ds is actually needed in very few places, mostly
  during setup of the switch. The private structure ps is however needed
  nearly everywhere. Pass ps, not ds internally.
  [vd: rebased Andrew's patch.]
  Signed-off-by: Andrew Lunn <andrew@lunn.ch>
  Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

* Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue (David S. Miller, 2016-05-01; 16 files, -392/+137)
  Jeff Kirsher says:
  ====================
  40GbE Intel Wired LAN Driver Updates 2016-05-01

  This series contains updates to i40e and i40evf. The theme of this series
  is code reduction, with several code cleanups in this series.

  Starting with Neerav's removal of the code that implemented the HMC AQ APIs
  and calls, since they are now obsolete and not supported by firmware.

  Anjali changes the default of VFs to make sure they are not trusted or
  privileged until it is explicitly set for trust through the new NDO op
  interface. Also limited the number of MAC and VLAN addresses a VF can add
  if it is untrusted/privileged.

  Carolyn syncs the VF code for the changes made to the PF for the RSS hash
  tuple settings, which ends up cleaning up much of the existing code.

  Jesse cleans up compiler warnings which were found with gcc's W=2 option.
  Then removed duplicate code, especially since only one copy was actually
  being used.

  Jacob addresses an issue which was found when testing with GCC 6, which
  happens to produce new warnings when you left shift a signed value beyond
  the storage size of the type. He then converts i40e & i40evf to use the
  BIT() macro more consistently.

  Alex actually bucks the trend of code removal by adding support for both
  drivers to use GSO_PARTIAL so that segmentation of frames with checksums
  enabled in outer headers is supported. Fortunately it does not take much
  to add this support!
  ====================
  Signed-off-by: David S. Miller <davem@davemloft.net>

  * i40e/i40evf: Add support for GSO partial with UDP_TUNNEL_CSUM and GRE_CSUM (Alexander Duyck, 2016-05-01; 4 files, -6/+28)
    This patch makes it so that i40e and i40evf can use GSO_PARTIAL to
    support segmentation for frames with checksums enabled in outer headers.
    As a result we can now send data over these types of tunnels at over
    20Gb/s versus the 12Gb/s that was previously possible on my system. The
    advantage with the i40e parts is that this offload is mostly transparent
    as the hardware still deals with the inner and/or outer IPv4 headers so
    the IP ID is still incrementing for both when this offload is performed.
    Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40evf: make use of BIT() macro to avoid signed left shift (Jacob Keller, 2016-05-01; 1 file, -22/+22)
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: make use of BIT() macro to prevent left shift of signed values (Jacob Keller, 2016-05-01; 1 file, -28/+25)
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e/i40evf: fix I40E_MASK signed shift overflow warnings (Jacob Keller, 2016-05-01; 2 files, -2/+2)
    GCC 6 has a new warning which will display when you attempt to left shift
    a signed value beyond the storage size of the type. I40E_MASK generates a
    mask value for 32bit registers. Properly typecast the mask value and
    place the values in parentheses to prevent macro expansion issues.
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

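A small standalone C demo of the warning class these shift patches address and the usual fix; the macro names below are illustrative, not the driver's exact definitions:

```c
/* Build with: gcc -O2 -Wall -Wextra shift_demo.c */
#include <stdint.h>
#include <stdio.h>

/* Problematic pattern: the mask literal is a signed int, so shifting it
 * into (or past) bit 31 is the signed-overflow case GCC 6 warns about. */
#define BAD_MASK(mask, shift)	((mask) << (shift))

/* Fixed pattern: cast to an unsigned 32-bit type first and parenthesise,
 * which is what the I40E_MASK change does. */
#define GOOD_MASK(mask, shift)	(((uint32_t)(mask)) << (shift))

/* BIT()-style helper, as used by the two preceding patches. */
#define BIT_U32(nr)		(UINT32_C(1) << (nr))

int main(void)
{
	printf("mask = 0x%08x\n", GOOD_MASK(0x3, 30));	/* 0xc0000000 */
	printf("bit  = 0x%08x\n", BIT_U32(31));		/* 0x80000000 */
	return 0;
}
```
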
  * i40e/i40evf: Bump driver version from 1.5.5 to 1.5.10 (Harshitha Ramamurthy, 2016-05-01; 2 files, -2/+2)
    Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: Update device ids for X722 (Catherine Sullivan, 2016-05-01; 1 file, -0/+1)
    Add a device ID for X722.
    Change-Id: I574f2345ab341de98a6a1c212d0603af853e48b0
    Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: Drop extra copy of function (Jesse Brandeburg, 2016-05-01; 1 file, -18/+0)
    i40e_release_rx_desc was in two files, but was only used and needed in
    txrx.c. Get rid of the extra copy.
    Change-Id: I86e18239aa03531fc198b6c052847475084a9200
    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: Use consistent type for vf_id (Jesse Brandeburg, 2016-05-01; 3 files, -9/+10)
    The driver was all over the place using signed or unsigned types for
    vf_id, when it should always be signed. This fixes warnings of type
    unsafe comparisons from gcc with W=2.
    Change-Id: I2cb681f83d0f68ca124d2e4131e4ac0d9f8a6b22
    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: PTP - avoid aggregate return warnings (Jesse Brandeburg, 2016-05-01; 1 file, -1/+2)
    Aggregate return warnings are when struct types are returned and must be
    copied to the lvalue with a struct copy by the compiler. This fixes
    warnings of type aggregate-return from gcc with W=2.
    Change-Id: I896b1bf514544bf0faeb458869d79914b9f1b168
    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: Fix uninitialized variable (Catherine Sullivan, 2016-05-01; 1 file, -5/+1)
    We have an uninitialized variable warning for valid_len for one case in
    validate_vf_mesg. To fix this, just initialize it to 0 at the top of the
    function and remove all of the now redundant assignments to 0 in the
    individual cases.
    Change-Id: Iacbd97f4c521ed8d662eef803a598d8707708cfd
    Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40evf: RSS Hash Option parameters (Carolyn Wyborny, 2016-05-01; 1 file, -199/+2)
    This patch syncs the VF code for the changes made to the PF for the RSS
    hash tuple settings. Since the VF still cannot change the RSS hash
    settings, change the code to make this clear to the user. Previously, the
    default settings were returned in this function. However, the default can
    be changed by the PF so this does not make sense anymore.
    Change-Id: I085eaf005fc7978b440d2a1bf2b2dd7cadaff39b
    Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: Remove HMC AQ API implementation (Neerav Parikh, 2016-05-01; 5 files, -88/+0)
    Remove the code that implements the HMC AQ APIs and the calls to these
    APIs. This is done because these are obsolete APIs and are not supported
    by firmware.
    Change-ID: I5d771d8f37c3e16e7b0a972ff9b27e75aa2d05d4
    Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: Prevent falling to promiscuous if the VF is not trusted (Anjali Singhai Jain, 2016-05-01; 1 file, -0/+6)
    With this change a non trusted VF can never fall to promiscuous mode when
    there is no room for a MAC/VLAN filter.
    Change-Id: I8a155aa25c0bcdc6093414920c9ade4ee0bd20e8
    Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: Limit the number of MAC and VLAN addresses that can be added for VFs (Anjali Singhai Jain, 2016-05-01; 2 files, -2/+26)
    If the VF is privileged/trusted it can do as it may please, including but
    not limited to hogging resources and playing unfair. But if the VF is not
    privileged/trusted it still can add some number (8) of MAC and VLAN
    addresses. Other restrictions with respect to Port VLAN and normal VLAN
    still apply to a not privileged/trusted VF.
    Change-Id: I3a9529201b184c8873e1ad2e300aff468c9e6296
    Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

  * i40e: Change the default for VFs to be not privileged (Anjali Singhai Jain, 2016-05-01; 1 file, -10/+10)
    Make sure a VF is not trusted/privileged until it is explicitly set for
    trust through the new NDO op interface.
    Change-Id: I476385c290d2b4901d8fceb29de43546accdc499
    Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

* sctp: signal sk_data_ready earlier on data chunks reception (Marcelo Ricardo Leitner, 2016-05-01; 3 files, -14/+20)
  Dave Miller pointed out that fb586f25300f ("sctp: delay calls to
  sk_data_ready() as much as possible") may insert latency, especially if
  the receiving application is running on another CPU, and that it would be
  better if we signalled as early as possible.

  This patch thus basically inverts the logic on fb586f25300f and signals it
  as early as possible, similar to what we had before.
  Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible")
  Reported-by: Dave Miller <davem@davemloft.net>
  Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

* mdio_bus: Fix MDIO bus scanning in __mdiobus_register() (Marek Vasut, 2016-05-01; 1 file, -1/+1)
  Since commit b74766a0a0fe ("phylib: don't return NULL from
  get_phy_device()") in linux-next, phy_get_device() will return
  ERR_PTR(-ENODEV) instead of NULL if the PHY device ID is all ones.

  This causes a problem with the stmmac driver and likely some other drivers
  which call mdiobus_register(). I triggered this bug on a SoCFPGA MCVEVK
  board with linux-next 20160427 and 20160428.

  In the case of stmmac, if there is no PHY node specified in the DT for the
  stmmac block, the stmmac driver
  (drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c, function
  stmmac_mdio_register()) will call mdiobus_register(), which will register
  the MDIO bus and probe for the PHY.

  The mdiobus_register() resp. __mdiobus_register() iterates over all of the
  addresses on the MDIO bus and calls mdiobus_scan() for each of them, which
  invokes get_phy_device(). Before the aforementioned patch, mdiobus_scan()
  would return NULL if no PHY was found on a given address and
  mdiobus_register() would continue and try the next PHY address. Now,
  mdiobus_scan() returns ERR_PTR(-ENODEV), which is caught by the
  'if (IS_ERR(phydev))' condition and the loop exits immediately if the PHY
  address does not contain a PHY.

  Repair this by explicitly checking for ERR_PTR(-ENODEV) and, if this error
  comes around, continuing with the next PHY address.
  Signed-off-by: Marek Vasut <marex@denx.de>
  Cc: Arnd Bergmann <arnd@arndb.de>
  Cc: David S. Miller <davem@davemloft.net>
  Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
  Cc: Florian Fainelli <f.fainelli@gmail.com>
  Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
  Acked-by: Florian Fainelli <f.fainelli@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

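The corrected scan loop then looks roughly like the sketch below (simplified from __mdiobus_register(); not a verbatim copy of the patch):

```c
	/* inside __mdiobus_register(), after the bus device is registered */
	for (i = 0; i < PHY_MAX_ADDR; i++) {
		if ((bus->phy_mask & (1 << i)) == 0) {
			struct phy_device *phydev = mdiobus_scan(bus, i);

			/* ERR_PTR(-ENODEV) just means "no PHY at this
			 * address": keep probing the remaining addresses. */
			if (IS_ERR(phydev) && PTR_ERR(phydev) != -ENODEV) {
				err = PTR_ERR(phydev);
				goto error;
			}
		}
	}
```
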
* tipc: set 'active' state correctly for first established link (Jon Paul Maloy, 2016-05-01; 1 file, -0/+1)
  When we are displaying statistics for the first link established between
  two peers, it will always be presented as STANDBY although it in reality
  is ACTIVE. This happens because we forget to set the 'active' flag in the
  link instance at the moment it is established. Although this is a bug, it
  only has impact on the presentation view of the link, not on its actual
  functionality.
  Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

* of: of_mdio: Check if MDIO bus controller is available (Florian Fainelli, 2016-05-01; 1 file, -0/+4)
  Add a check whether the 'struct device_node' pointer passed to
  of_mdiobus_register() is an available (aka enabled) node in the Device
  Tree.

  The rationale for doing this is cases where an Ethernet MAC provides a
  MDIO bus controller and node, and an additional Ethernet MAC might be
  connecting its PHY/switches to that first MDIO bus controller, while still
  embedding one internally which is therefore marked as "disabled".

  Instead of sprinkling checks like these in callers of
  of_mdiobus_register(), do this in a central location.
  Reviewed-by: Andrew Lunn <andrew@lunn.ch>
  Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>

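The central check amounts to a couple of lines early in of_mdiobus_register(); a sketch is shown below (the exact errno is an assumption based on the usual convention, the commit log does not state it):

```c
	/* early in of_mdiobus_register(struct mii_bus *mdio, struct device_node *np) */
	if (!of_device_is_available(np))
		return -ENODEV;		/* node present but status = "disabled" */
```
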
* Merge branch 'mlx5-aRFS' (David S. Miller, 2016-04-29; 15 files, -849/+1736)
  Saeed Mahameed says:
  ====================
  Mellanox 100G mlx5 ethernet aRFS support

  This series adds accelerated RFS support for the mlx5e driver. I have
  added one patch not related to aRFS that fixes the rtnl_lock warning the
  mlx5 driver has been getting since b7aade15485a ('vxlan: break dependency
  with netdev drivers').

  aRFS support in details:

  A direct TIR per RQ is now required in order to have the essential
  building blocks for aRFS. Today the driver has one direct TIR that
  forwards traffic to RQ[0] (core 0), and one indirect TIR for the RSS
  indirection table. For that we've added one direct TIR per RQ, e.g.:
  TIR[i] -> RQ[i] (core i).

  Publicize the "modify flow rule destination" operation and reveal it in
  the flow steering API, to have the ability to dynamically modify the
  destination TIR (core) for aRFS rules from the ethernet driver.

  Initialize the CPU reverse mapping to notify the upper layer of internal
  receive queue cpu mappings.

  Some design refactoring for the mlx5e ethernet driver flow tables and flow
  steering API. Now the caller of create_flow_table can choose the level of
  the flow table; this way we will create the mlx5e flow tables in a
  reversed order and connect them as we go. We create flow table[i+1] before
  flow table[i] to be able to set flow table[i+1] as a destination of flow
  table[i] once flow table[i] is created.

  We have also split the main flow table in the following manner:
  - Before: an RX packet had to visit two flow tables until it is delivered
    to its receive queue: RX packet -> vlan filter flow table -> main flow
    table.
    > vlan filter will check the packet vlan field is allowed.
    > main flow will check if the dest mac is allowed and will check the
      l3/l4 headers to retrieve the RSS hash for steering the packet into
      its final receive queue.
  - Now the main flow table is split into an l2 dst mac steering table and
    a ttc (traffic type classifier) table:
    RX packet -> vlan filter -> l2 table -> ttc table
    > vlan filter - same as before
    > L2 filter - filter packets according to their destination mac address
    > ttc table - classify packet headers for RSS steering
      - L3/L4 classification rules to steer the packet according to their
        headers hash
      - in case none of the rules applies, the packet is steered to RQ[0]

  After the above refactoring, all that is left to do is to create the aRFS
  flow table, which will manage aRFS steering rules to forward traffic to
  the desired RQ (core), and just connect the ttc table rules destinations
  to the aRFS flow table. The aRFS flow table, in case of a miss, will
  deliver the traffic to the core the original ttc hash would have chosen.

  The TTC table is not initialized and enabled until the user explicitly
  asks to, i.e. setting NETIF_F_NTUPLE to ON. This way there is no need for
  the ttc table to forward traffic to the aRFS table unless required. When
  setting it back to OFF, the aRFS flow table is disabled and disconnected.
  ====================
  Signed-off-by: David S. Miller <davem@davemloft.net>