op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	netfilter: built-in NAT support for DCCP	Davide Caratti	2016-12-04	5	-37/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CONFIG_NF_NAT_PROTO_DCCP is no more a tristate. When set to y, NAT support for DCCP protocol is built-in into nf_nat.ko. footprint test: (nf_nat_proto_) \| dccp \|\| nf_nat --------------------------+--------++-------- no builtin \| 409800 \|\| 2241312 DCCP builtin \| - \|\| 2578968 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	netfilter: update Arturo Borrero Gonzalez email address	Arturo Borrero Gonzalez	2016-12-04	6	-12/+12
\| \| \| \| \| \| \|	The email address has changed, let's update the copyright statements. Signed-off-by: Arturo Borrero Gonzalez <arturo@debian.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
*	ipv6 addrconf: Implemented enhanced DAD (RFC7527)	Erik Nordmark	2016-12-03	8	-6/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implemented RFC7527 Enhanced DAD. IPv6 duplicate address detection can fail if there is some temporary loopback of Ethernet frames. RFC7527 solves this by including a random nonce in the NS messages used for DAD, and if an NS is received with the same nonce it is assumed to be a looped back DAD probe and is ignored. RFC7527 is enabled by default. Can be disabled by setting both of conf/{all,interface}/enhanced_dad to zero. Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'mv88e6390-batch-three'	David S. Miller	2016-12-03	8	-42/+225
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Andrew Lunn says: ==================== mv88e6390 batch 3 More patches to support the MV88e6390. This is mostly refactoring existing code and adding implementations for the mv88e6390. This patchset set which reserved frames are sent to the cpu, the size of jumbo frames that will be accepted, turn off egress rate limiting, and configuration of pause frames. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: Implement mv88e6390 pause control	Andrew Lunn	2016-12-03	4	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The mv88e6390 has a number flow control registers accessed via the Flow Control register. Use these to set the pause control. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: Refactor pause configuration	Andrew Lunn	2016-12-03	4	-8/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The mv88e6390 has a different mechanism for configuring pause. Refactor the code into an ops function, and for the moment, don't add any mv88e6390 code yet. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: Refactor egress rate limiting	Andrew Lunn	2016-12-03	4	-12/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two different rate limiting configurations, depending on the switch generation. Refactor this into ops. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: Refactor setting of jumbo frames	Andrew Lunn	2016-12-03	4	-5/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some switches support jumbo frames. Refactor this code into operations in the ops structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: Reserved Management frames to CPU	Andrew Lunn	2016-12-03	6	-18/+97
\|/ \| \| \| \| \| \| \| \|	Older devices have a couple of registers in global2. The mv88e6390 family has a single register in global1 behind which hides similar configuration. Implement and op for this. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'mv88e6390-batch-two'	David S. Miller	2016-12-03	6	-88/+594
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Andrew Lunn says: ==================== MV88E6390 batch two This is the second batch of patches adding support for the MV88e6390. They are not sufficient to make it work properly. The mv88e6390 has a much expanded set of priority maps. Refactor the existing code, and implement basic support for the new device. Similarly, the monitor control register has been reworked. The mv88e6390 has something odd in its EDSA tagging implementation, which means it is not possible to use it. So we need to use DSA tagging. This is the first device with EDSA support where we need to use DSA, and the code does not support this. So two patches refactor the existing code. The two different register definitions are separated out, and using DSA on an EDSA capable device is added. v2: Add port prefix Add helper function for 6390 Add _IEEE_ into #defines Split monitor_ctrl into a number of separate ops. Remove 6390 code which is management, used in a later patch s/EGREES/EGRESS/. Broke up setup_port_dsa() and set_port_dsa() into a number of ops v3: Verify mandatory ops for port setup Don't set ether type for DSA port. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: Refactor CPU and DSA port setup	Andrew Lunn	2016-12-03	4	-49/+319
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Older chips only support DSA tagging. Newer chips have both DSA and EDSA tagging. Refactor the code by adding port functions for setting the frame mode, egress mode, and if to forward unknown frames. This results in the helper mv88e6xxx_6065_family() becoming unused, so remove it. Signed-off-by: Andrew Lunn <andrew@lunn.ch> v3: Verify mandatory ops for port setup Don't set ether type for DSA port. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: Move the tagging protocol into info	Andrew Lunn	2016-12-03	2	-19/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Older chips support a single tagging protocol, DSA. New chips support both DSA and EDSA, an enhanced version. Having both as an option changes the register layouts. Up until now, it has been assumed that if EDSA is supported, it will be used. Hence the register layout has been determined by which protocol should be used. However, mv88e6390 has a different implementation of EDSA, which requires we need to use the DSA tagging. Hence separate the selection of the protocol from the register layout. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: Monitor and Management tables	Andrew Lunn	2016-12-03	4	-9/+145
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The mv88e6390 changes the monitor control register into the Monitor and Management control, which is an indirection register to various registers. Add ops to set the CPU port and the ingress/egress port for both register layouts, to global1 Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: dsa: mv88e6xxx: Implement mv88e6390 tag remap	Andrew Lunn	2016-12-03	4	-13/+101
\|/ \| \| \| \| \| \| \| \| \| \| \|	The mv88e6390 does not have the two registers to set the frame priority map. Instead it has an indirection registers for setting a number of different priority maps. Refactor the old code into an function, implement the mv88e6390 version, and use an op to call the right one. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'fib-notifier-event-replay'	David S. Miller	2016-12-03	11	-29/+342
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jiri Pirko says: ==================== ipv4: fib: Replay events when registering FIB notifier Ido says: In kernel 4.9 the switchdev-specific FIB offload mechanism was replaced by a new FIB notification chain to which modules could register in order to be notified about the addition and deletion of FIB entries. The motivation for this change was that switchdev drivers need to be able to reflect the entire FIB table and not only FIBs configured on top of the port netdevs themselves. This is useful in case of in-band management. The fundamental problem with this approach is that upon registration listeners lose all the information previously sent in the chain and thus have an incomplete view of the FIB tables, which can result in packet loss. This patchset fixes that by dumping the FIB tables and replaying notifications previously sent in the chain for the registered notification block. The entire dump process is done under RCU and thus the FIB notification chain is converted to be atomic. The listeners are modified accordingly. This is done in the first eight patches. The ninth patch adds a change sequence counter to ensure the integrity of the FIB dump. The last patch adds the dump itself to the FIB chain registration function and modifies existing listeners to pass a callback to be executed in case dump was inconsistent. --- v3->v4: - Register the notification block after the dump and protect it using the change sequence counter (Hannes Frederic Sowa). - Since we now integrate the dump into the registration function, drop the sysctl to set maximum number of retries and instead set it to a fixed number. Lets see if it's really a problem before adding something we can never remove. - For the same reason, dump FIB tables for all net namespaces. - Add a comment regarding guarantees provided by mutex semantics. v2->v3: - Add sysctl to set the number of FIB dump retries (Hannes Frederic Sowa). - Read the sequence counter under RTNL to ensure synchronization between the dump process and other processes changing the routing tables (Hannes Frederic Sowa). - Pass a callback to the dump function to be executed prior to a retry. - Limit the dump to a single net namespace. v1->v2: - Add a sequence counter to ensure the integrity of the FIB dump (David S. Miller, Hannes Frederic Sowa). - Protect notifications from re-ordering in listeners by using an ordered workqueue (Hannes Frederic Sowa). - Introduce fib_info_hold() (Jiri Pirko). - Relieve rocker from the need to invoke the FIB dump by registering to the FIB notification chain prior to ports creation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: fib: Replay events when registering FIB notifier	Ido Schimmel	2016-12-03	4	-5/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit b90eb7549499 ("fib: introduce FIB notification infrastructure") introduced a new notification chain to notify listeners (f.e., switchdev drivers) about addition and deletion of routes. However, upon registration to the chain the FIB tables can already be populated, which means potential listeners will have an incomplete view of the tables. Solve that by dumping the FIB tables and replaying the events to the passed notification block. The dump itself is done using RCU in order not to starve consumers that need RTNL to make progress. The integrity of the dump is ensured by reading the FIB change sequence counter before and after the dump under RTNL. This allows us to avoid the problematic situation in which the dumping process sends a ENTRY_ADD notification following ENTRY_DEL generated by another process holding RTNL. Callers of the registration function may pass a callback that is executed in case the dump was inconsistent with current FIB tables. The number of retries until a consistent dump is achieved is set to a fixed number to prevent callers from looping for long periods of time. In case current limit proves to be problematic in the future, it can be easily converted to be configurable using a sysctl. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: fib: Allow for consistent FIB dumping	Ido Schimmel	2016-12-03	3	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The next patch will enable listeners of the FIB notification chain to request a dump of the FIB tables. However, since RTNL isn't taken during the dump, it's possible for the FIB tables to change mid-dump, which will result in inconsistency between the listener's table and the kernel's. Allow listeners to know about changes that occurred mid-dump, by adding a change sequence counter to each net namespace. The counter is incremented just before a notification is sent in the FIB chain. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: fib: Convert FIB notification chain to be atomic	Ido Schimmel	2016-12-03	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order not to hold RTNL for long periods of time we're going to dump the FIB tables using RCU. Convert the FIB notification chain to be atomic, as we can't block in RCU critical sections. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	rocker: Register FIB notifier before creating ports	Ido Schimmel	2016-12-03	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can miss FIB notifications sent between the time the ports were created and the FIB notification block registered. Instead of receiving these notifications only when they are replayed for the FIB notification block during registration, just register the notification block before the ports are created. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	rocker: Implement FIB offload in deferred work	Ido Schimmel	2016-12-03	2	-8/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert rocker to offload FIBs in deferred work in a similar fashion to mlxsw, which was converted in the previous commits. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	rocker: Create an ordered workqueue for FIB offload	Ido Schimmel	2016-12-03	2	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As explained in the previous commits, we need to process FIB entries addition / deletion events in FIFO order or otherwise we can have a mismatch between the kernel's FIB table and the device's. Create an ordered workqueue for rocker to which these work items will be submitted to. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Implement FIB offload in deferred work	Ido Schimmel	2016-12-03	1	-10/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	FIB offload is currently done in process context with RTNL held, but we're about to dump the FIB tables in RCU critical section, so we can no longer sleep. Instead, defer the operation to process context using deferred work. Make sure fib info isn't freed while the work is queued by taking a reference on it and releasing it after the operation is done. Deferring the operation is valid because the upper layers always assume the operation was successful. If it's not, then the driver-specific abort mechanism is called and all routed traffic is directed to slow path. The work items are submitted to an ordered workqueue to prevent a mismatch between the kernel's FIB table and the device's. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: core: Create an ordered workqueue for FIB offload	Ido Schimmel	2016-12-03	2	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're going to start processing FIB entries addition / deletion events in deferred work. These work items must be processed in the order they were submitted or otherwise we can have differences between the kernel's FIB table and the device's. Solve this by creating an ordered workqueue to which these work items will be submitted to. Note that we can't simply convert the current workqueue to be ordered, as EMADs re-transmissions are also processed in deferred work. Later on, we can migrate other work items to this workqueue, such as FDB notification processing and nexthop resolution, since they all take the same lock anyway. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: fib: Add fib_info_hold() helper	Ido Schimmel	2016-12-03	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As explained in the previous commit, modules are going to need to take a reference on fib info and then drop it using fib_info_put(). Add the fib_info_hold() helper to make the code more readable and also symmetric with fib_info_put(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv4: fib: Export free_fib_info()	Ido Schimmel	2016-12-03	1	-0/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The FIB notification chain is going to be converted to an atomic chain, which means switchdev drivers will have to offload FIB entries in deferred work, as hardware operations entail sleeping. However, while the work is queued fib info might be freed, so a reference must be taken. To release the reference (and potentially free the fib info) fib_info_put() will be called, which in turn calls free_fib_info(). Export free_fib_info() so that modules will be able to invoke fib_info_put(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	act_mirred: fix a typo in get_dev	WANG Cong	2016-12-03	1	-1/+1
\| \| \| \| \| \| \| \| \|	Fixes: 255cb30425c0 ("net/sched: act_mirred: Add new tc_action_ops get_dev()") Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch '40GbE' of ↵	David S. Miller	2016-12-03	9	-184/+511
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-12-02 This series contains updates to i40e and i40evf only. Alex provides changes so that we are much more robust about defining what we can and cannot offload in i40e and i40evf by doing additional checks other than L4 tunnel header length. Jake provides several fixes/changes, first cleaning up a label that is unnecessary, as well as cleaned up the use of a "magic number". Clarified the code by separating the global private flags and the regular private flags per interface into two arrays, so that future additions will not produce duplication and buggy code. Adds additional checks to protect against NULL values for msix_entries and q_vectors pointers. Michal adds Clause22 method for accessing registers for some external PHYs. Piotr adds additional protocol support for the admin queue discover capabilities function. Tushar Dave fixes a panic seen on SPARC, where writel() should not be used to write directly to a memory address but only to a memory mapped I/O address otherwise it causes data access exceptions. Joe Perches separates out a section of code into its own function, to help reduce i40evf_reset_task() a bit. Alan fixes an issue by checking for NULL before dereferencing msix_entries and returning early in the case where it is NULL within the i40evf_close() code path. Henry provides code cleanup to remove unreachable and redundant sections of code. Fixed up an issue where new NICs were not identifying "unknown PHYs" correctly. Harshitha fixes a issue where the ethtool "Supported Link" modes list backplane interfaces on X722 devices for 10 GbE with SFP+ and Cortina retimer, where these interfaces should not be visible to the user since they cannot use them. Carolyn changes an X722 informational message so that it only appears when extra messages are desired. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	i40e: change message to only appear when extra debug info is wanted	Carolyn Wyborny	2016-12-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes an X722 informational message so that it only appears when extra messages are desired. Without this patch, on X722 devices, this message appears at load, potentially causing unnecessary alarm. Change-ID: I94f7aae15dc5b2723cc9728c630c72538a3e670e Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e/i40evf: replace for memcpy with single memcpy call in ethtool	Jacob Keller	2016-12-02	1	-16/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	memcpy replaced with single memcpy call in ethtool. Change-ID: I3f5bef6bcc593412c56592c6459784db41575a0a Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: set broadcast promiscuous mode for each active VLAN	Jacob Keller	2016-12-02	4	-21/+129
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A previous workaround added to ensure receipt of all broadcast frames incorrectly set the broadcast promiscuous mode unconditionally regardless of active VLAN status. Replace this partial workaround with a complete solution that sets the broadcast promiscuous filters in i40e_sync_vsi_filters. This new method sets the promiscuous mode based on when broadcast filters are added or removed. I40E_VLAN_ANY will request a broadcast filter for all VLANs, (as we're in untagged mode) while a broadcast filter on a specific VLAN will only request broadcast for that VLAN. Thus, we restore addition of broadcast filter to the array, but we add special handling for these such that they enable the broadcast promiscuous mode instead of being sent as regular filters. The end result is that we will correctly receive all broadcast packets (even those with a source address equal to the broadcast address) but will not receive packets for which we don't have an active VLAN filter. Change-ID: I7d0585c5cec1a5bf55bf533b42e5e817d5db6a2d Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Fix for ethtool Supported link modes	Harshitha Ramamurthy	2016-12-02	3	-7/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the problem where the ethtool Supported link modes list backplane interfaces on X722 devices for 10GbE with SFP+ and Cortina retimer. This patch fixes the problem by setting and using a flag for this particular device since the backplane interface is only between the internal PHY and the retimer and it should not be seen by the user as they cannot use it. Without this patch, the user wrongly thinks that backplane interfaces are supported on their device when they actually are not. Change-ID: I3882bc2928431d48a2db03a51a713a1f681a79e9 Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40evf: protect against NULL msix_entries and q_vectors pointers	Jacob Keller	2016-12-02	1	-6/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update the functions which free msix_entries and q_vectors so that they are safe against NULL values. This allows calling code to not care whether these have already been freed when disabling and freeing them. Change-ID: I31bfd1c0da18023d971b618edc6fb049721f3298 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Pass unknown PHY type for unknown PHYs	Henry Tieman	2016-12-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PHY type value for unrecognized PHYs and cables was changed based on firmware version number. Newer hardware use lower firmware version numbers and this was causing some PHYs to be identified as type 0x16 instead of 0xe (unknown). Without this patch, newer card will incorrectly identify unknown PHYs and cables. This change adds hardware type to the check for firmware version so the PHY type is reported correctly. Change-ID: I0723cbfd263c76fc73ff1a5275d1639051376c9a Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Remove unreachable code	Henry Tieman	2016-12-02	1	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code at the end of i40e_read_phy_register_clause22() contained unreachable code and redundant control statements. This change removes the unreachable code. And deletes the redundant goto statement and if statement. Change-ID: I713032b1585396f40f903cbcfdea987abd874400 Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40evf: check for msix_entries null dereference	Alan Brady	2016-12-02	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible for msix_entries to be freed by a previous suspend/remove before a VF is closed. This patch fixes the issue by checking for NULL before dereferencing msix_entries and returning early in the case where it is NULL within the i40evf_close code path. Without this patch it is possible to trigger a kernel panic through NULL dereference. Change-ID: I92a2746e82533a889e25f91578eac9abd0388ae2 Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40evf: Move some i40evf_reset_task code to separate function	Joe Perches	2016-12-02	1	-42/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The i40evf_reset_task function is a couple hundred lines and it has a separable block that disables VF. Move that block to a new i40evf_disable_vf function to shorten i40evf_reset_task a bit. Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: fix panic on SPARC while changing num of desc	Tushar Dave	2016-12-02	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On SPARC, writel() should not be used to write directly to memory address but only to memory mapped I/O address otherwise it causes data access exception. Commit 147e81ec75689 ("i40e: Test memory before ethtool alloc succeeds") introduced a code that uses memory address to fake the HW tail address and attempt to write to that address using writel() causes kernel panic on SPARC. The issue is reproduced while changing number of descriptors using ethtool. This change resolves the panic by using HW read-only memory mapped I/O register to fake HW tail address instead memory address. e.g. > ethtool -G eth2 tx 2048 rx 2048 i40e 0000:03:00.2 eth2: Changing Tx descriptor count from 512 to 2048. i40e 0000:03:00.2 eth2: Changing Rx descriptor count from 512 to 2048 sun4v_data_access_exception: ADDR[fff8001f9734a000] CTX[0000] TYPE[0004], going. \\|/ ____ \\|/ "@'/ .. \`@" /_\| \__/ \|_\ \__U_/ ethtool(3273): Dax [#1] CPU: 9 PID: 3273 Comm: ethtool Tainted: G E 4.8.0-linux-net_temp+ #7 task: fff8001f96d7a660 task.stack: fff8001f97348000 TSTATE: 0000009911001601 TPC: 00000000103189e4 TNPC: 00000000103189e8 Y: 00000000 Tainted: G E TPC: <i40e_alloc_rx_buffers+0x124/0x260 [i40e]> g0: fff8001f4eb64000 g1: 00000000000007ff g2: fff8001f9734b92c g3: 00203e0000000000 g4: fff8001f96d7a660 g5: fff8001fa6704000 g6: fff8001f97348000 g7: 0000000000000001 o0: 0006000046706928 o1: 00000000db3e2000 o2: fff8001f00000000 o3: 0000000000002000 o4: 0000000000002000 o5: 0000000000000001 sp: fff8001f9734afc1 ret_pc: 0000000010318a64 RPC: <i40e_alloc_rx_buffers+0x1a4/0x260 [i40e]> l0: fff8001f4e8bffe0 l1: fff8001f4e8cffe0 l2: 00000000000007ff l3: 00000000ff000000 l4: 0000000000ff0000 l5: 000000000000ff00 l6: 0000000000cda6a8 l7: 0000000000e822f0 i0: fff8001f96380000 i1: 0000000000000000 i2: 00203edb00000000 i3: 0006000046706928 i4: 0000000002086320 i5: 0000000000e82370 i6: fff8001f9734b071 i7: 00000000103062d4 I7: <i40e_set_ringparam+0x3b4/0x540 [i40e]> Call Trace: [00000000103062d4] i40e_set_ringparam+0x3b4/0x540 [i40e] [000000000094e2f8] dev_ethtool+0x898/0xbe0 [0000000000965570] dev_ioctl+0x250/0x300 [0000000000923800] sock_do_ioctl+0x40/0x60 [000000000092427c] sock_ioctl+0x7c/0x280 [00000000005ef040] vfs_ioctl+0x20/0x60 [00000000005ef5d4] do_vfs_ioctl+0x194/0x4c0 [00000000005ef974] SyS_ioctl+0x74/0xa0 [0000000000406214] linux_sparc_syscall+0x34/0x44 Disabling lock debugging due to kernel taint Caller[00000000103062d4]: i40e_set_ringparam+0x3b4/0x540 [i40e] Caller[000000000094e2f8]: dev_ethtool+0x898/0xbe0 Caller[0000000000965570]: dev_ioctl+0x250/0x300 Caller[0000000000923800]: sock_do_ioctl+0x40/0x60 Caller[000000000092427c]: sock_ioctl+0x7c/0x280 Caller[00000000005ef040]: vfs_ioctl+0x20/0x60 Caller[00000000005ef5d4]: do_vfs_ioctl+0x194/0x4c0 Caller[00000000005ef974]: SyS_ioctl+0x74/0xa0 Caller[0000000000406214]: linux_sparc_syscall+0x34/0x44 Caller[0000000000107154]: 0x107154 Instruction DUMP: e43620c8 e436204a c45e2038 <c2a083a0> 82102000 81cfe008 90086001 82102000 81cfe008 Kernel panic - not syncing: Fatal exception Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Add protocols over MCTP to i40e_aq_discover_capabilities	Piotr Raczynski	2016-12-02	3	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add logical_id to I40E_AQ_CAP_ID_MNG_MODE capability starting from major version 2. Change-ID: Idb29214b172ea5c70cbd45a99e6745c0215af7e4 Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: fix trivial typo in naming of i40e_sync_filters_subtask	Jacob Keller	2016-12-02	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A comment incorrectly referred to i40e_vsi_sync_filters_subtask which does not actually exist. Reference the correct function instead. Change-ID: I6bd805c605741ffb6fe34377259bb0d597edfafd Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Add Clause22 implementation	Michal Kosiarz	2016-12-02	3	-52/+159
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some external PHYs require Clause22 method for accessing registers. This patch also adds some defines to support blink led on devices using 10CBaseT PHY. Change-ID: I868a4326911900f6c89e7e522fda4968b0825f14 Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com> Signed-off-by: Matt Jared <matthew.a.jared@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: avoid duplicate private flags definitions	Jacob Keller	2016-12-02	1	-25/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Separate the global private flags and the regular private flags per interface into two arrays. Future additions of private flags will not need to be duplicated which may lead to buggy code. Also rename "i40e_priv_flags_strings_gl" to "i40e_gl_priv_flags_strings" for clarity, as it reads more naturally. Change-ID: I68caef3c9954eb7da342d7f9d20f2873186f2758 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: remove second check of VLAN_N_VID in i40e_vlan_rx_add_vid	Jacob Keller	2016-12-02	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace a check of magic number 4095 with VLAN_N_VID. This makes it obvious that a later check against VLAN_N_VID is always true and can be removed. Change-ID: I28998f127a61a529480ce63d8a07e266f6c63b7b Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: remove error_param_int label from i40e_vc_config_promiscuous_mode_msg	Jacob Keller	2016-12-02	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This label is unnecessary, as are jumping to a block that checks aq_ret and then immediately skipping it and returning. So just jump straight to the error_param and remove this unnecessary label. Also use goto error_param even in the last check for style consistency. Change-ID: If487c7d10c4048e37c594e5eca167693aaed45f6 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40evf: Be much more verbose about what we can and cannot offload	Alexander Duyck	2016-12-02	1	-0/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change makes it so that we are much more robust about defining what we can and cannot offload. Previously we were performing no checks. This should bring us up to parity with the i40e PF driver. In addition the device only supports GSO as long as the MSS is 64 or greater. We were not checking this so an MSS less than that was resulting in Tx hangs. Change-ID: If533553ec92fc6ba694eab6ac81fdaf3004f3592 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
\| *	i40e: Be much more verbose about what we can and cannot offload	Alexander Duyck	2016-12-02	1	-8/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change makes it so that we are much more robust about defining what we can and cannot offload. Previously we were just checking for the L4 tunnel header length, however there are other fields we should be verifying as there are multiple scenarios in which we cannot perform hardware offloads. In addition the device only supports GSO as long as the MSS is 64 or greater. We were not checking this so an MSS less than that was resulting in Tx hangs. Change-ID: I5e2fd5f3075c73601b4b36327b771c64fcb6c31b Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
* \|	tcp: fix the missing avr32 SOF_TIMESTAMPING_OPT_STATS	Yuchung Cheng	2016-12-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The commit of SOF_TIMESTAMPING_OPT_STATS didn't include the new header for avr32, causing build to break. The patch fixes it. Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING") Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	udp: be less conservative with sock rmem accounting	Paolo Abeni	2016-12-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before commit 850cbaddb52d ("udp: use it's own memory accounting schema"), the udp protocol allowed sk_rmem_alloc to grow beyond the rcvbuf by the whole current packet's truesize. After said commit we allow sk_rmem_alloc to exceed the rcvbuf only if the receive queue is empty. As reported by Jesper this cause a performance regression for some (small) values of rcvbuf. This commit is intended to fix the regression restoring the old handling of the rcvbuf limit. Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Fixes: 850cbaddb52d ("udp: use it's own memory accounting schema") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net_sched: gen_estimator: account for timer drifts	Eric Dumazet	2016-12-03	1	-9/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under heavy stress, timer used in estimators tend to slowly be delayed by a few jiffies, leading to inaccuracies. Lets remember what was the last scheduled jiffies so that we get more precise estimations, without having to add a multiply/divide in the loop to account for the drifts. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	sfc: remove EFX_BUG_ON_PARANOID, use EFX_WARN_ON_[ONCE_]PARANOID instead	Edward Cree	2016-12-03	13	-56/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Logically, EFX_BUG_ON_PARANOID can never be correct. For, BUG_ON should only be used if it is not possible to continue without potential harm; and since the non-DEBUG driver will continue regardless (as the BUG_ON is compiled out), clearly the BUG_ON cannot be needed in the DEBUG driver. So, replace every EFX_BUG_ON_PARANOID with either an EFX_WARN_ON_PARANOID or the newly defined EFX_WARN_ON_ONCE_PARANOID. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'samples-bpf-automated-cgroup-tests'	David S. Miller	2016-12-03	5	-85/+352
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sargun Dhillon says: ==================== samples, bpf: Refactor; Add automated tests for cgroups These two patches are around refactoring out some old, reusable code from the existing test_current_task_under_cgroup_user test, and adding a new, automated test. There is some generic cgroupsv2 setup & cleanup code, given that most environment still don't have it setup by default. With this code, we're able to pretty easily add an automated test for future cgroupsv2 functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>