op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	net/mlx4_en: Reduce memory consumption on kdump kernel	Amir Vadai	2014-07-22	1	-0/+1
\| \| \| \| \| \| \|	When memory is limited, reduce number of rx and tx rings. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Disable blueflame using ethtool private flags	Amir Vadai	2014-07-22	1	-0/+5
\| \| \| \| \| \| \| \| \|	Enable the user to turn off the hardware feature called BlueFlame. Since it is something specific to mlx4_en hardware, we control the feature via ethtool private flags. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2014-07-16	1	-0/+4
\|\ \| \| \| \| \| \|	Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/mlx4_en: Ignore budget on TX napi polling	Amir Vadai	2014-07-08	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is recommended that TX work not count against the quota. The cost of TX packet liberation is a minute percentage of what it costs to process an RX frame. Furthermore, that SKB freeing makes memory available for other paths in the stack. Give the TX a larger budget and be more aggressive about cleaning up the Tx descriptors this budget could be changed using ethtool: $ ethtool -C eth1 tx-frames-irq <budget> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ ↵	Amir Vadai	2014-07-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	affinity map IRQ affinity notifier can only have a single notifier - cpu_rmap notifier. Can't use it to track changes in IRQ affinity map. Detect IRQ affinity changes by comparing CPU to current IRQ affinity map during NAPI poll thread. CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Hutchings <ben@decadent.org.uk> Fixes: 2eacc23 ("net/mlx4_core: Enforce irq affinity changes immediatly") Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4_en: Fix mac_hash database inconsistency	Noa Osherovich	2014-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using a local copy of dev_addr in mlx4_en_set_mac() to prevent dev_addr from being modified during error flow or when dev_addr is modified in another context (which is another problem that is being discussed over the mailing list [1]). Also fixing bad naming of priv->prev_mac into priv->current_mac. [1] - http://patchwork.ozlabs.org/patch/351489/ Reviewed-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4_en: Do not count LLC/SNAP in MTU calculation	Yishai Hadas	2014-07-08	1	-2/+0
\|/ \| \| \| \| \| \| \| \| \| \| \| \|	LLC/SNAP 8 bytes should not be added as part of header calculation. If used, payload will be decreased accordingly. For MTU of 1500 we'll set 1522 instead of 1523. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Liran Liss <liranl@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Use affinity hint	Yuval Atias	2014-06-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The “affinity hint” mechanism is used by the user space daemon, irqbalancer, to indicate a preferred CPU mask for irqs. Irqbalancer can use this hint to balance the irqs between the cpus indicated by the mask. We wish the HCA to preferentially map the IRQs it uses to numa cores close to it. To accomplish this, we use cpumask_set_cpu_local_first(), that sets the affinity hint according the following policy: First it maps IRQs to “close” numa cores. If these are exhausted, the remaining IRQs are mapped to “far” numa cores. Signed-off-by: Yuval Atias <yuvala@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Revert "net/mlx4_en: Use affinity hint"	David S. Miller	2014-06-02	1	-1/+0
\| \| \| \| \| \|	This reverts commit 70a640d0dae3a9b1b222ce673eb5d92c263ddd61. Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Use affinity hint	Yuval Atias	2014-06-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The “affinity hint” mechanism is used by the user space daemon, irqbalancer, to indicate a preferred CPU mask for irqs. Irqbalancer can use this hint to balance the irqs between the cpus indicated by the mask. We wish the HCA to preferentially map the IRQs it uses to numa cores close to it. To accomplish this, we use cpumask_set_cpu_local_first(), that sets the affinity hint according the following policy: First it maps IRQs to “close” numa cores. If these are exhausted, the remaining IRQs are mapped to “far” numa cores. Signed-off-by: Yuval Atias <yuvala@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mellanox: Logging message cleanups	Joe Perches	2014-05-08	1	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use a more current logging style. o Coalesce formats o Add missing spaces for coalesced formats o Align arguments for modified formats o Add missing newlines for some logging messages o Use DRV_NAME as part of format instead of %s, DRV_NAME to reduce overall text. o Use ..., ##__VA_ARGS__ instead of args... in macros o Correct a few format typos o Use a single line message where appropriate Signed-off-by: Joe Perches <joe@perches.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll	Chris Mason	2014-04-16	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The mlx4 driver is triggering schedules while atomic inside mlx4_en_netpoll: spin_lock_irqsave(&cq->lock, flags); napi_synchronize(&cq->napi); ^^^^^ msleep here mlx4_en_process_rx_cq(dev, cq, 0); spin_unlock_irqrestore(&cq->lock, flags); This was part of a patch by Alexander Guller from Mellanox in 2011, but it still isn't upstream. Signed-off-by: Chris Mason <clm@fb.com> cc: stable@vger.kernel.org Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4: Set proper build dependancy with vxlan	Or Gerlitz	2014-04-01	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Make sure that vxlan_get_rx_port() is present in the kernel build in a manner consistent with mlx4, else mlx4 can be made built-in where vxlan a module and the phase of the build linking fails. Add CONFIG_MLX4_EN_VXLAN for that. Also, #ifdef the advertizement and implementation of the mlx4 vxlan ndo calls and related code under this config directive. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4: Implement vxlan ndo calls	Or Gerlitz	2014-03-28	1	-0/+3
\| \| \| \| \| \| \| \|	Add implementation for the add/del vxlan port ndo calls, using the CONFIG_DEV firmware command. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2014-03-05	1	-2/+2
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/mwifiex/pcie.c net/ipv6/sit.c The SIT driver conflict consists of a bug fix being done by hand in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper was created (netdev_alloc_pcpu_stats()) which takes care of this. The two wireless conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net,IB/mlx: Bump all Mellanox driver versions	Amir Vadai	2014-02-25	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bump all Mellanox driver versions. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4: Replace mlx4_en_mac_to_u64() with mlx4_mac_to_u64()	Eugenia Emantayev	2014-03-02	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the EN driver uses a private static function mlx4_en_mac_to_u64(). Move it to a common include file (driver.h) for mlx4_en and mlx4_ib for further use. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4_en: Move queue stopped/waked counters to be per ring	Eugenia Emantayev	2014-03-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Give accurate counters and avoids cache misses when several rings update the counters of stop/wake queue. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4_en: Verify mlx4_en module parameters	Eugenia Emantayev	2014-03-02	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Verify mlx4_en module parameters. In case they are out of range - reset to default values. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4: Set number of RX rings in a utility function	Ido Shamay	2014-02-24	1	-1/+1
\|/ \| \| \| \| \| \| \| \| \|	mlx4_en_add() is too long. Moving set number of RX rings to a utiltity function to improve readability and modulization of the code. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	netdevice: add queue selection fallback handler for ndo_select_queue	Daniel Borkmann	2014-02-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new argument for ndo_select_queue() callback that passes a fallback handler. This gets invoked through netdev_pick_tx(); fallback handler is currently __netdev_pick_tx() as most drivers invoke this function within their customized implementation in case for skbs that don't need any special handling. This fallback handler can then be replaced on other call-sites with different queue selection methods (e.g. in packet sockets, pktgen etc). This also has the nice side-effect that __netdev_pick_tx() is then only invoked from netdev_pick_tx() and export of that function to modules can be undone. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	2014-01-14	1	-1/+2
\|\
\| *	net: core: explicitly select a txq before doing l2 forwarding	Jason Wang	2014-01-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The will cause several issues: - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan instead of lower device which misses the necessary txq synchronization for lower device such as txq stopping or frozen required by dev watchdog or control path. - dev_hard_start_xmit() was called with NULL txq which bypasses the net device watchdog. - dev_hard_start_xmit() does not check txq everywhere which will lead a crash when tso is disabled for lower device. Fix this by explicitly introducing a new param for .ndo_select_queue() for just selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also extended to accept this parameter and dev_queue_xmit_accel() was used to do l2 forwarding transmission. With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of dev_queue_xmit() to do the transmission. In the future, it was also required for macvtap l2 forwarding support since it provides a necessary synchronization method. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4_en: call gro handler for encapsulated frames	Eric Dumazet	2014-01-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to use the native GRO handling of encapsulated protocols on mlx4, we need to call napi_gro_receive() instead of netif_receive_skb() unless busy polling is in action. While we are at it, rename mlx4_en_cq_ll_polling() to mlx4_en_cq_busy_polling() Tested with GRE tunnel : GRO aggregation is now performed on the ethernet device instead of being done later on gre device. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Jerry Chu <hkchu@google.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	mlx4_en: Add PTP hardware clock	Shawn Bohrer	2014-01-02	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a PHC to the mlx4_en driver. We use reader/writer spinlocks to protect the timecounter since every packet received needs to call timecounter_cycle2time() when timestamping is enabled. This can become a performance bottleneck with RSS and multiple receive queues if normal spinlocks are used. This driver has been tested with both Documentation/ptp/testptp and the linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox ConnectX-3 card. Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Acked-By: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling	Or Gerlitz	2013-12-31	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the device tunneling offloads mode is vxlan do the following - call SET_PORT with the relevant setting - add DMFS steering vxlan rule for the device self and multicast mac addresses of the form: {<ETH, outer-mac> <VXLAN, ANY vnid> <ETH, ANY mac>} --> RSS QP - set relevant QPC fields in RSS context and RX ring QPs - in TX flow, set WQE fields to generate HW checksum, and handle gso skbs which are marked for encapsulation such that the HW will segment them properly. - in RX flow, read HW offloaded checksum for encapsulated packets from the CQE - advertize hw_enc_features and NETIF_F_GSO_UDP_TUNNEL to the networking stack Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4_en: Add NAPI support for transmit side	Eugenia Emantayev	2013-12-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add NAPI for TX side, implement its support and provide NAPI callback. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net/mlx4_en: Configure the XPS queue mapping on driver load	Ido Shamay	2013-12-19	1	-1/+4
\|/ \| \| \| \| \| \| \| \| \|	Only TX rings of User Piority 0 are mapped. TX rings of other UP's are using UP 0 mapping. XPS is not in use when num_tc is set. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Datapath structures are allocated per NUMA node	Eugenia Emantayev	2013-11-07	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For each RX/TX ring and its CQ, allocation is done on a NUMA node that corresponds to the core that the data structure should operate on. The assumption is that the core number is reflected by the ring index. The affected allocations are the ring/CQ data structures, the TX/RX info and the shared HW/SW buffer. For TX rings, each core has rings of all UPs. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Datapath resources allocated dynamically	Eugenia Emantayev	2013-11-07	1	-13/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently all TX/RX rings and completion queues are part of the netdev priv structure and are allocated statically. This patch will change the priv to hold only arrays of pointers and therefore all TX/RX rings and completetion queues will be allocated dynamically. This is in preparation for NUMA aware allocations. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Rename name of mlx4_en_rx_alloc members	Amir Vadai	2013-10-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Add page prefix to page related members: @size and @offset into @page_size and @page_offset CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL	Cong Wang	2013-08-01	1	-5/+5
\| \| \| \| \| \| \| \| \| \|	Eliezer renames several ll_poll to busy_poll, but forgets CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too. Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mlx4: allow order-0 memory allocations in RX path	Eric Dumazet	2013-06-25	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Eric Dumazet <edumazet@google.com> mlx4 exclusively uses order-2 allocations in RX path, which are likely to fail under memory pressure. We therefore drop frames more than needed. This patch tries order-3, order-2, order-1 and finally order-0 allocations to keep good performance, yet allow allocations if/when memory gets fragmented. By using larger pages, and avoiding unnecessary get_page()/put_page() on compound pages, this patch improves performance as well, lowering false sharing on struct page. Also use GFP_KERNEL allocations in initialization path, as allocating 12 MB (390 order-3 pages) can easily fail with GFP_ATOMIC. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Low Latency recv statistics	Amir Vadai	2013-06-19	1	-0/+6
\| \| \| \| \|	Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Add Low Latency Socket (LLS) support	Amir Vadai	2013-06-19	1	-0/+121
\| \| \| \| \| \| \| \|	Add basic support for LLS. Signed-off-by: Amir Vadai <amirv@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4: use one page fragment per incoming frame	Eric Dumazet	2013-06-04	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mlx4 driver has a suboptimal memory allocation strategy for regular MTU=1500 frames, as it uses two page fragments : One of 512 bytes and one of 1024 bytes. This makes GRO less effective, as each GSO packet contains 8 MSS instead of 16 MSS. Performance of a single TCP flow gains 25 % increase with the following patch. Before patch : A:~# netperf -H 192.168.0.2 -Cc MIGRATED TCP STREAM TEST ... Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 13798.47 3.06 4.20 0.436 0.598 After patch : A:~# netperf -H 192.68.0.2 -Cc MIGRATED TCP STREAM TEST ... Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 17273.80 3.44 4.19 0.391 0.477 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Add a service task	Amir Vadai	2013-04-24	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	Add a service task to run tasks that needed to be executed periodically. Currently the only task is a watchdog to catch NIC clock overflow, to make timestamping accurate. Will move the statistics task into this framework in a later patch. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Add HW timestamping (TS) support	Amir Vadai	2013-04-24	1	-1/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch allows to enable/disable HW timestamping for incoming and/or outgoing packets. It adds and initializes all structs and callbacks needed by kernel TS API. To enable/disable HW timestamping appropriate ioctl should be used. Currently HWTSTAMP_FILTER_ALL/NONE and HWTSAMP_TX_ON/OFF only are supported. When enabling TS on receive flow - VLAN stripping will be disabled. Also were made all relevant changes in RX/TX flows to consider TS request and plant HW timestamps into relevant structures. mlx4_ib was fixed to compile with new mlx4_cq_alloc() signature. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Enable DCB ETS ops only when supported by the firmware	Or Gerlitz	2013-04-07	1	-0/+1
\| \| \| \| \| \| \| \| \|	Enable the DCB ETS ops only when supported by the firmware. For older firmware/cards which don't support ETS, advertize only PFC DCB ops. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Fix race when setting the device MAC address	Yan Burman	2013-03-07	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove unnecessary use of workqueue for the device MAC address setting flow, and fix a race when setting MAC address which was introduced by commit c07cb4b0a "net/mlx4_en: Manage hash of MAC addresses per port" The race happened when mlx4_en_replace_mac was being executed in parallel with a successive call to ndo_set_mac_address, e.g witn an A/B/A MAC setting configuration test, the third set fails. With this change we also properly report an error if set MAC fails. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Add unicast MAC filtering	Yan Burman	2013-02-07	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement and advertise unicast MAC filtering, such that setting macvlan instance over mlx4_en interfaces will not require the networking core to put mlx4_en devices in promiscuous mode. If for some reason adding a unicast address filter fails e.g as of missing space in the HW mac table, the device forces itself into promiscuous mode (and out of this forced state when enough space is available). Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Manage hash of MAC addresses per port	Yan Burman	2013-02-07	1	-1/+6
\| \| \| \| \| \| \| \| \|	As a preparation step for supporting multiple unicast addresses, store MAC addresses in hash table. Remove the radix tree for MAC addresses per QP, as it's not in use. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Re-arrange ndo_set_rx_mode related code	Yan Burman	2013-02-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Currently, mlx4_en_do_set_multicast serves as the ndo_set_rx_mode entry for mlx4_en, doing all related work. Split it to few calls, one per required functionality (e.g multicast, promiscuous, etc) and rename some structures and calls to use rx_mode notation instead of multicast. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4: Move Ethernet related functionality from mlx4_core to mlx4_en	Yan Burman	2013-02-07	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move low level code that deals with management of Ethernet MACs and QPs from mlx4_core to mlx4_en. Also convert the new functions to deal with MACs in form of char array instead of u64. Actual functions moved: mlx4_replace_mac mlx4_get_eth_qp mlx4_put_eth_qp To conduct this change, some functionality had to be exported from the core, the following functions were added: mlx4_get_base_qp __mlx4_replace_mac (low level function for CX1/A0 compatibility) Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Optimize Rx fast path filter checks	Yan Burman	2013-02-07	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, RX path code that does RX filtering is not optimized and does an expensive conversion. In order to use ether_addr_equal_64bits which is optimized for such cases, we need the MAC address kept by the device to be in the form of unsigned char array instead of u64. Store the MAC address as unsigned char array and convert to/from u64 out of the fast path when needed. Side effect of this is that we no longer need priv->mac, since it's the same as dev->dev_addr. This optimization was suggested by Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Optimize loopback related checks in data path	Yan Burman	2013-02-07	1	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently there are relatively complex conditional checks in the fast path, for TX loopback enabling and resulting RX filter logic. Move elaborate if's out of data path, replace them with a single flag for each state and update that state from appropriate places. Also, in native (non SRIOV) mode and not in loopback or in selftest, there is no need to try and filter out packets that HW loopback-ed, as in native mode we do not loopback packets anymore. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Fix transmit timeout when driver restarts port	Amir Vadai	2013-01-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under heavy CPU load, changing, ring size/mtu/etc. could result in transmit timeout, since stop-start port might take more than 10 seconds. Calling netif_detach_device to prevent tx queue transmit timeout. netif_detach_device() is not called under ndo_stop, because netif_carrier_off will prevent the timeout, and device should not be marked as not present, or else user won't be able to start it later on. CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/mlx4_en: Fix ethtool rules leftovers after module unloaded	Hadar Hen Zion	2013-01-31	1	-0/+3
\| \| \| \| \| \| \| \| \|	As part of the driver unload flow, all steering rules must be deleted, make sure to remove the rules that were set through ethtool. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge tag 'rdma-for-linus' of ↵	Linus Torvalds	2012-12-13	1	-0/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband upate from Roland Dreier: "First batch of InfiniBand/RDMA changes for the 3.8 merge window: - A good chunk of Bart Van Assche's SRP fixes - UAPI disintegration from David Howells - mlx4 support for "64-byte CQE" hardware feature from Or Gerlitz - Other miscellaneous fixes" Fix up trivial conflict in mellanox/mlx4 driver. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (33 commits) RDMA/nes: Fix for crash when registering zero length MR for CQ RDMA/nes: Fix for terminate timer crash RDMA/nes: Fix for BUG_ON due to adding already-pending timer IB/srp: Allow SRP disconnect through sysfs srp_transport: Document sysfs attributes srp_transport: Simplify attribute initialization code srp_transport: Fix attribute registration IB/srp: Document sysfs attributes IB/srp: send disconnect request without waiting for CM timewait exit IB/srp: destroy and recreate QP and CQs when reconnecting IB/srp: Eliminate state SRP_TARGET_DEAD IB/srp: Introduce the helper function srp_remove_target() IB/srp: Suppress superfluous error messages IB/srp: Process all error completions IB/srp: Introduce srp_handle_qp_err() IB/srp: Simplify SCSI error handling IB/srp: Keep processing commands during host removal IB/srp: Eliminate state SRP_TARGET_CONNECTING IB/srp: Increase block layer timeout RDMA/cm: Change return value from find_gid_port() ...
\| *	mlx4: 64-byte CQE/EQE support	Or Gerlitz	2012-11-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ConnectX-3 devices can use either 64- or 32-byte completion queue entries (CQEs) and event queue entries (EQEs). Using 64-byte EQEs/CQEs performs better because each entry is aligned to a complete cacheline. This patch queries the HCA's capabilities, and if it supports 64-byte CQEs and EQES the driver will configure the HW to work in 64-byte mode. The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ. Since this mode is global, userspace (libmlx4) must be updated to work with the configured CQE size, and guests using SR-IOV virtual functions need to know both EQE and CQE size. In case one of the 64-byte CQE/EQE capabilities is activated, the patch makes sure that older guest drivers that use the QUERY_DEV_FUNC command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that they need an update to be able to work with the PPF. This is done by changing the returned pf_context_behaviour not to be zero any more. In case none of these capabilities is activated that value remains zero and older guest drivers can run OK. The SRIOV related flow is as follows 1. the PPF does the detection of the new capabilities using QUERY_DEV_CAP command. 2. the PPF activates the new capabilities using INIT_HCA. 3. the VF detects if the PPF activated the capabilities using QUERY_HCA, and if this is the case activates them for itself too. Note that the VF detects that it must be aware to the new PF behaviour using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode. User space notification is done through a new field introduced in struct mlx4_ib_ucontext which holds device capabilities for which user space must take action. This changes the binary interface so the ABI towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only when needed i.e. only when the driver does use 64-byte CQEs or future device capabilities which must be in sync by user space. This practice allows to work with unmodified libmlx4 on older devices (e.g A0, B0) which don't support 64-byte CQEs. In order to keep existing systems functional when they update to a newer kernel that contains these changes in VF and userspace ABI, a module parameter enable_64b_cqe_eqe must be set to enable 64-byte mode; the default is currently false. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>