summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox
Commit message (Collapse)AuthorAgeFilesLines
* net: mlx4: slight optimization of addr comparedingtianhong2013-12-312-3/+3
| | | | | | | | | | Use possibly more efficient ether_addr_equal to instead of memcmp. Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-By: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunnelingOr Gerlitz2013-12-315-2/+129
| | | | | | | | | | | | | | | | | | | | | When the device tunneling offloads mode is vxlan do the following - call SET_PORT with the relevant setting - add DMFS steering vxlan rule for the device self and multicast mac addresses of the form: {<ETH, outer-mac> <VXLAN, ANY vnid> <ETH, ANY mac>} --> RSS QP - set relevant QPC fields in RSS context and RX ring QPs - in TX flow, set WQE fields to generate HW checksum, and handle gso skbs which are marked for encapsulation such that the HW will segment them properly. - in RX flow, read HW offloaded checksum for encapsulated packets from the CQE - advertize hw_enc_features and NETIF_F_GSO_UDP_TUNNEL to the networking stack Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Add basic support for TCP/IP offloads under tunnelingOr Gerlitz2013-12-314-3/+85
| | | | | | | | | | | | | | | Add the low-level device commands and definitions used for TCP/IP HW offloads of tunneled/vxlan traffic which are supported by the ConnectX3-pro NIC. This is done through the following elements: - read tunneling device caps in QUERY_DEV_CAP - add helper function to do SET_PORT for tunneling - add DMFS VXLAN steering rule definitions - add CQE and WQE checksum offload field definitions Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Check port number for validity before accessing dataMatan Barak2013-12-191-1/+27
| | | | | | | | | | Need to validate port number at mlx4_promisc_qp() before use. Since port number is extracted from gid, as a cooked or corrupted gid could lead to a crash. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Add NAPI support for transmit sideEugenia Emantayev2013-12-193-11/+41
| | | | | | | | | | Add NAPI for TX side, implement its support and provide NAPI callback. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Ignore irrelevant hypervisor eventsEugenia Emantayev2013-12-191-0/+3
| | | | | | | | | | | | MLX4_DEV_EVENT_SLAVE_INIT and MLX4_DEV_EVENT_SLAVE_SHUTDOWN events used by Hypervisor to inform the PPF IB driver that IB para-virtualization must be initialized/destroyed for a slave. If this event is catched by ETH VF annoying but harmless error message is printed into dmesg. Remove dmesg prints for these events. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Set CQE/EQE size to 64B by defaultEyal Perry2013-12-191-2/+2
| | | | | | | | | | | To achieve out of the box performance default is to use 64 byte CQE/EQE. In tests that we conduct in our labs, we achieved a performance improvement of twice the message rate. For older VF/libmlx4 support, enable_64b_cqe_eqe must be set to 0 (disabled). Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Configure the XPS queue mapping on driver loadIdo Shamay2013-12-193-4/+16
| | | | | | | | | | Only TX rings of User Piority 0 are mapped. TX rings of other UP's are using UP 0 mapping. XPS is not in use when num_tc is set. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Implement ndo_get_phys_port_idHadar Hen Zion2013-12-191-0/+23
| | | | | | | | | | Use the port GUID read from the firmware to identify the physical port. This port identifier is available via ndo_get_phys_port_id for both PF and VF net-devices. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Expose physical port id as PF/VF capabilityHadar Hen Zion2013-12-193-0/+51
| | | | | | | | | | | | | | Add the infrastructure needed to support ndo_get_phys_port_id which allows users to identify to which physical port a net-device is connected to by reading a unique port id. This will work for VFs and PFs. The driver uses a new device capability - phys_port_id, The PF driver reads the port phys_port_id from Firmware and stores it. The VF driver reads the port phys_port_id from the PF using QUERY_FUNC_CAP command. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Introduce nic_info new flag in QUERY_FUNC_CAPHadar Hen Zion2013-12-192-3/+9
| | | | | | | | | | | Set nic_info field in QUERY_FUNC_CAP, which designates supplementary NIC information is provided by the hypervisor. When set, the following fields are valid: nic_num_rings, nic_indirection_tbl_sz, cur_mac and phys_port_id. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Rename QUERY_FUNC_CAP fieldsHadar Hen Zion2013-12-191-10/+10
| | | | | | | | Use correct names for QUERY_FUNC_CAP fields: flags0 and flags1. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_core: Remove zeroed out of explicit QUERY_FUNC_CAP fieldsHadar Hen Zion2013-12-191-6/+0
| | | | | | | | | | | All mailboxes are already zeroed by commit: 571b8b9 net/mlx4_core: Initialize all mailbox buffers to zero before use Remove explicit zero set for force mac and force vlan fields in mlx4_QUERY_FUNC_CAP_wrapper Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: mlx4 calls skb_set_hashTom Herbert2013-12-181-2/+6
| | | | | | | | Drivers should call skb_set_hash to set the hash and its type in an skbuff. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4_core: Roll back round robin bitmap allocation commit for CQs, SRQs, and ↵Jack Morgenstein2013-12-0910-20/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MPTs Commit f4ec9e9 "mlx4_core: Change bitmap allocator to work in round-robin fashion" introduced round-robin allocation (via bitmap) for all resources which allocate via a bitmap. Round robin allocation is desirable for mcgs, counters, pd's, UARs, and xrcds. These are simply numbers, with no involvement of ICM memory mapping. Round robin is required for QPs, since we had a problem with immediate reuse of a 24-bit QP number (commit f4ec9e9). However, for other resources which use the bitmap allocator and involve mapping ICM memory -- MPTs, CQs, SRQs -- round-robin is not desirable. What happens in these cases is the following: ICM memory is allocated and mapped in chunks of 256K. Since the resource allocation index goes up monotonically, the allocator will eventually require mapping a new chunk. Now, chunks are also unmapped when their reference count goes back to zero. Thus, if a single app is running and starts/exits frequently we will have the following situation: When the app starts, a new chunk must be allocated and mapped. When the app exits, the chunk reference count goes back to zero, and the chunk is unmapped and freed. Therefore, the app must pay the cost of allocation and mapping of ICM memory each time it runs (although the price is paid only when allocating the initial entry in the new chunk). For apps which allocate MPTs/SRQs/CQs and which operate as described above, this presented a performance problem. We therefore roll back the round-robin allocator modification for MPTs, CQs, SRQs. Reported-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'siocghwtstamp' of ↵David S. Miller2013-12-051-2/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next Ben Hutchings says: ==================== SIOCGHWTSTAMP ioctl 1. Add the SIOCGHWTSTAMP ioctl and update the timestamping documentation. 2. Implement SIOCGHWTSTAMP in most drivers that support SIOCSHWTSTAMP. 3. Add a test program to exercise SIOC{G,S}HWTSTAMP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlx4_en: Implement the SIOCGHWTSTAMP ioctlBen Hutchings2013-11-211-2/+12
| | | | | | | | | | | | Compile-tested only. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
* | net/mlx4_core: destroy workqueue when driver fails to registerWei Yang2013-12-031-0/+2
| | | | | | | | | | | | | | | | | | | | When driver registration fails, we need to clean up the resources allocated before. mlx4_core missed destroying the workqueue allocated. This patch destroys the workqueue when registration fails. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: Remove selftest TX queues empty conditionEugenia Emantayev2013-12-011-7/+0
| | | | | | | | | | | | | | | | | | | | Remove waiting for TX queues to become empty during selftest. This check is not necessary for any purpose, and might put the driver into an infinite loop. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'rdma-for-linus' of ↵Linus Torvalds2013-11-186-104/+275
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma updates from Roland Dreier: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) IB/core: Re-enable create_flow/destroy_flow uverbs IB/core: extended command: an improved infrastructure for uverbs commands IB/core: Remove ib_uverbs_flow_spec structure from userspace IB/core: Use a common header for uverbs flow_specs IB/core: Make uverbs flow structure use names like verbs ones IB/core: Rename 'flow' structs to match other uverbs structs IB/core: clarify overflow/underflow checks on ib_create/destroy_flow IB/ucma: Convert use of typedef ctl_table to struct ctl_table IB/cm: Convert to using idr_alloc_cyclic() IB/mlx5: Fix page shift in create CQ for userspace IB/mlx4: Fix device max capabilities check IB/mlx5: Fix list_del of empty list IB/mlx5: Remove dead code IB/core: Encorce MR access rights rules on kernel consumers IB/mlx4: Fix endless loop in resize CQ RDMA/cma: Remove unused argument and minor dead code RDMA/ucma: Discard events for IDs not yet claimed by user space IB/core: Add Cisco usNIC rdma node and transport types RDMA/nes: Remove self-assignment from nes_query_qp() IB/srp: Report receive errors correctly ...
| * | IB/mlx5: Fix list_del of empty listEli Cohen2013-11-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | For archs with pages size of 4K, when the chunk is freed, fwp is not in the list so avoid attempting to delete it. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | mlx5: Use enum to indicate adapter page sizeEli Cohen2013-11-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The Connect-IB adapter has an inherent page size which equals 4K. Define an new enum that equals the page shift and use it instead of using the value 12 throughout the code. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | mlx5_core: Change optimal_reclaimed_pages for better performanceMoshe Lazer2013-11-081-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change optimal_reclaimed_pages() to increase the output size of each reclaim pages command. This change reduces significantly the amount of reclaim pages commands issued to FW when the driver is unloaded which reduces the overall driver unload time. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | mlx5: Clear reserved area in set_hca_cap()Eli Cohen2013-11-081-3/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Firmware spec requires reserved fields to be cleared when calling set_hca_cap. Current code queries and copy to the set area, possibly resulting in reserved bits not cleared. This patch copies only writable fields to the set area. Fix also typo - msx => max Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | mlx5: Support communicating arbitrary host page size to firmwareEli Cohen2013-11-082-55/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Connect-IB firmware requires 4K pages to be communicated with the driver. This patch breaks larger pages to 4K units to enable support for architectures utilizing larger page size, such as PowerPC. This patch also fixes several places that referred to PAGE_SHIFT instead of explicit 12 which is the inherent page shift on Connect-IB. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | mlx5: Fix cleanup flow when DMA mapping failsEli Cohen2013-11-081-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If DMA mapping fails, the driver cleared the object that holds the previously DMA mapped pages. Fix this by allocating a new object for the command that reports back to firmware that pages can't be supplied. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | IB/mlx5: Multithreaded create MREli Cohen2013-11-083-42/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use asynchronous commands to execute up to eight concurrent create MR commands. This is to fill memory caches faster so we keep consuming from there. Also, increase timeout for shrinking caches to five minutes. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2013-11-151-1/+1
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual earth-shaking, news-breaking, rocket science pile from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) doc: usb: Fix typo in Documentation/usb/gadget_configs.txt doc: add missing files to timers/00-INDEX timekeeping: Fix some trivial typos in comments mm: Fix some trivial typos in comments irq: Fix some trivial typos in comments NUMA: fix typos in Kconfig help text mm: update 00-INDEX doc: Documentation/DMA-attributes.txt fix typo DRM: comment: `halve' -> `half' Docs: Kconfig: `devlopers' -> `developers' doc: typo on word accounting in kprobes.c in mutliple architectures treewide: fix "usefull" typo treewide: fix "distingush" typo mm/Kconfig: Grammar s/an/a/ kexec: Typo s/the/then/ Documentation/kvm: Update cpuid documentation for steal time and pv eoi treewide: Fix common typo in "identify" __page_to_pfn: Fix typo in comment Correct some typos for word frequency clk: fixed-factor: Fix a trivial typo ...
| * | treewide: Fix typo in printkMasanari Iida2013-10-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Correct spelling typo within various part of the kernel Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | net/mlx4_en: Datapath structures are allocated per NUMA nodeEugenia Emantayev2013-11-076-33/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For each RX/TX ring and its CQ, allocation is done on a NUMA node that corresponds to the core that the data structure should operate on. The assumption is that the core number is reflected by the ring index. The affected allocations are the ring/CQ data structures, the TX/RX info and the shared HW/SW buffer. For TX rings, each core has rings of all UPs. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: ICM pages are allocated on device NUMA nodeEugenia Emantayev2013-11-072-12/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is done to optimize FW/HW access to host memory. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_en: Datapath resources allocated dynamicallyEugenia Emantayev2013-11-078-111/+178
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently all TX/RX rings and completion queues are part of the netdev priv structure and are allocated statically. This patch will change the priv to hold only arrays of pointers and therefore all TX/RX rings and completetion queues will be allocated dynamically. This is in preparation for NUMA aware allocations. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: Add immediate activate for VGT->VST->VGTRony Efraim2013-11-072-32/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow immediate activate of VGT->VST and VST->VGT transitions, without the need of rebinding in mlx4_master_immediate_activate_vlan_qos(). Also in struct res_qp: add qp parameters (vlan_index,fvl,vlan_cntrol..) to the saved set, in order to restore when move to VGT. - Clear at mlx4_RST2INIT_QP_wrapper() - Save at mlx4_INIT2RTR_QP_wrapper() - Restore at mlx4_vf_immed_vlan_work_handler() Update mlx4_vf_immed_vlan_work_handler() to support VGT. Signed-off-by: Rony Efraim <ronye@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: Initialize all mailbox buffers to zero before useJack Morgenstein2013-11-079-37/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To guarantee that all unused fields in all FW commands for both inboxes and outboxes are zeroed out, initialize the mailbox buffer to all zeroes. This is especially important for SRIOV comm-channel virtual commands (such as QUERY_FUNC_CAP), where if new fields are added to support new features, the driver can depend on older kernels passing zeroes in these fields. In addition to zeroing out the mailbox buffer at allocation time, all (now unnecessary) calls to memset by the callers of mlx4_alloc_cmd_mailbox() are removed. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_en: Add RFS support in UDPEyal Perry2013-11-071-11/+34
| | | | | | | | | | | | | | | | | | | | | | | | Modify RFS code to support applying filters for incoming UDP streams. Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_en: Fixed crash when port type is changedAmir Vadai2013-11-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | timecounter_init() was was called only after first potential timecounter_read(). Moved mlx4_en_init_timestamp() before mlx4_en_init_netdev() Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: Implement resource quota enforcementJack Morgenstein2013-11-042-13/+173
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implements resource quota grant decision when resources are requested, for the following resources: QPs, CQs, SRQs, MPTs, MTTs, vlans, MACs, and Counters. When granting a resource, the quota system increases the allocated-count for that slave. When the slave later frees the resource, its allocated-count is reduced. A spinlock is used to protect the integrity of each resource's free-pool counter. (One slave may be in the process of being granted a resource while another slave has crashed, initiating cleanup of that slave's resource quotas). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: Fix quota handling in the QUERY_FUNC_CAP wrapperJack Morgenstein2013-11-041-23/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In current kernels, the mlx4 driver running on a VM does not differentiate between max resource numbers for the HCA and max quotas -- it simply takes the quota values passed to it as max-resource values. However, the driver actually requires the VFs to be aware of the actual number of resources that the HCA was initialized with, for QPs, CQs, SRQs and MPTs. For QPs, CQs and SRQs, the reason is that in completion handling the driver must know which of the 24 bits are the actual resource number, and which are "padding" bits. For MPTs, also, the driver assumes knowledge of the number of MPTs in the system. The previous commit fixes the quota logic on the VM for the quota values passed to it by QUERY_FUNC_CAPS. For QPs, CQs, SRQs, and MPTs, it takes the max resource numbers from QUERY_HCA (and not QUERY_FUNC_CAPS). The quotas passed in QUERY_FUNC_CAPS are used to report max resource number values in the response to ib_query_device. However, the Hypervisor driver must consider that VMs may be running previous kernels, and compatibility must be preserved. To resolve the incompatibility with previous kernels running on VMs, we deprecated the quota fields in mlx4_QUERY_FUNC_CAP. In the deprecated fields, we pass the max-resource values from INIT_HCA The quota fields are moved to a new location, and the current kernel driver takes the proper values from that location. There is also a new flag in dword 0, bit 28 of the mlx4_QUERY_FUNC_CAP mailbox; if this flag is set, the (VM) driver takes the quota values from the new location. VMs running previous kernels will work properly, except that the max resource numbers reported in ib_query_device for these resources will be too high. The Hypervisor driver will, however, enforce the quotas for these VMs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | mlx4: Structures and init/teardown for VF resource quotasJack Morgenstein2013-11-045-19/+201
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is step #1 for implementing SRIOV resource quotas for VFs. Quotas are implemented per resource type for VFs and the PF, to prevent any entity from simply grabbing all the resources for itself and leaving the other entities unable to obtain such resources. Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs, MAC, VLAN, and Counters. The quota system works as follows: Each entity (VF or PF) is given a max number of a given resource (its quota), and a guaranteed minimum number for each resource (starvation prevention). For QPs, CQs, SRQs, MPTs and MTTs: 50% of the available quantity for the resource is divided equally among the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool. The other 50% is the "free pool", allocated on a first-come-first-serve basis. For each VF/PF, resources are first allocated from its "guaranteed-minimum" pool. When that pool is exhausted, the driver attempts to allocate from the resource "free-pool". The quota (i.e., max) for the VFs and the PF is: The free-pool amount (50% of the real max) + the guaranteed minimum For MACs: Guarantee 2 MACs per VF/PF per port. As a result, since we have only 128 MACs per port, reduce the allowable number of VFs from 64 to 63. Any remaining MACs are put into a free pool. For VLANs: For the PF, the per-port quota is 128 and guarantee is 64 (to allow the PF to register at least a VLAN per VF in VST mode). For the VFs, the per-port quota is 64 and the guarantee is 0. We assume that VGT VFs are trusted not to abuse the VLAN resource. For Counters: For all functions (PF and VFs), the quota is 128 and the guarantee is 0. In this patch, we define the needed structures, which are added to the resource-tracker struct. In addition, we do initialization for the resource quota, and adjust the query_device response to use quotas rather than resource maxima. As part of the implementation, we introduce a new field in mlx4_dev: quotas. This field holds the resource quotas used to report maxima to the upper layers (ib_core, via query_device). The HCA maxima of these values are passed to the VFs (via QUERY_HCA) so that they may continue to use these in handling QPs, CQs, SRQs and MPTs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: Fix checking order in MR table initJack Morgenstein2013-11-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In procedure mlx4_init_mr_table(), slaves should do no processing, but should return success. This initialization is hypervisor-only. However, the check for num_mpts being a power-of-2 was performed before the check to return immediately if the driver is for a slave. This resulted in spurious failures. The order of performing the checks is reversed, so that if the driver is for a slave, no processing is done and success is returned. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: Don't fail reg/unreg vlan for older guestsJack Morgenstein2013-11-043-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In upstream kernels under SRIOV, the vlan register/unregister calls were NOPs (doing nothing and returning OK). We detect these old calls from guests (via the comm channel), since previously the port number in mlx4_register_vlan was passed (improperly) in the out_param. This has been corrected so that the port number is now passed in bits 8..15 of the in_modifier field. For old calls, these bits will be zero, so if the passed port number is zero, we can still look at the out_param field to see if it contains a valid port number. If yes, the VM is running an old driver. Since for old drivers, the register/unregister_vlan wrappers were NOPs, we continue this policy -- the reason being that upstream had an additional bug in eth driver running on guests (where procedure mlx4_en_vlan_rx_kill_vid() had the following code: if (!mlx4_find_cached_vlan(mdev->dev, priv->port, vid, &idx)) mlx4_unregister_vlan(mdev->dev, priv->port, idx); else en_err(priv, "could not find vid %d in cache\n", vid); On a VM, mlx4_find_cached_vlan() will always fail, since the vlan cache is located on the Hypervisor; on guests it is empty. Therefore, if we allow upstream guests to register vlans, we will have vlan leakage since the unregister will never be performed. Leaving vlan reg/unreg for old guest drivers as a NOP is not a feature regression, since in upstream the register/unregister vlan wrapper is a NOP. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: Resource tracker for reg/unreg vlansJack Morgenstein2013-11-041-6/+121
| | | | | | | | | | | | | | | | | | | | | | | | Add resource tracker support for reg/unreg vlans calls done by VFs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_en: Use vlan id instead of vlan index for unregistrationJack Morgenstein2013-11-045-20/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use of vlan_index created problems unregistering vlans on guests. In addition, tools delete vlan by tag, not by index, lets follow that. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: Fix reg/unreg vlan/mac to conform to the firmware specJack Morgenstein2013-11-042-25/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The functions mlx4_register_vlan, mlx4_unregister_vlan, mlx4_register_mac, mlx4_unregister_mac all made illegal use of the out_param in multifunc mode to pass the port number. The firmware spec specifies that the port number should be passed in bits 8..15 of the input-modifier field for ALLOC_RES and FREE_RES (sections 20.15.1 and 20.15.2). For MAC register/unregister, this patch contains workarounds so that guests running previous kernels continue to work on a new Hypervisor, and guests running the new kernel will continue to work on old hypervisors. Vlan registeration capability is still not operational in multifunction mode, since the vlan wrapper functions are not implemented in this patch. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net/mlx4_core: Fix register/unreg vlan flowJack Morgenstein2013-11-041-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reg/unreg vlan code was broken: 1. a wrapped function called another wrapped function, causing a deadlock. 2. unregister_vlan called cmd_box instead of cmd_box_imm, leading to incorrectly passed parameters. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-11-041-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/emulex/benet/be.h drivers/net/netconsole.c net/bridge/br_private.h Three mostly trivial conflicts. The net/bridge/br_private.h conflict was a function signature (argument addition) change overlapping with the extern removals from Joe Perches. In drivers/net/netconsole.c we had one change adjusting a printk message whilst another changed "printk(KERN_INFO" into "pr_info(". Lastly, the emulex change was a new inline function addition overlapping with Joe Perches's extern removals. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net/mlx4_core: Fix call to __mlx4_unregister_macJack Morgenstein2013-11-041-1/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | In function mlx4_master_deactivate_admin_state() __mlx4_unregister_mac was called using the MAC index. It should be called with the value of the MAC itself. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-10-234-32/+37
|\ \ \ | |/ / | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-10-232-19/+26
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "Sorry I let so much accumulate, I was in Buffalo and wanted a few things to cook in my tree for a while before sending to you. Anyways, it's a lot of little things as usual at this stage in the game" 1) Make bonding MAINTAINERS entry reflect reality, from Andy Gospodarek. 2) Fix accidental sock_put() on timewait mini sockets, from Eric Dumazet. 3) Fix crashes in l2tp due to mis-handling of ipv4 mapped ipv6 addresses, from François CACHEREUL. 4) Fix heap overflow in __audit_sockaddr(), from the eagle eyed Dan Carpenter. 5) tcp_shifted_skb() doesn't take handle FINs properly, from Eric Dumazet. 6) SFC driver bug fixes from Ben Hutchings. 7) Fix TX packet scheduling wedge after channel change in ath9k driver, from Felix Fietkau. 8) Fix user after free in BPF JIT code, from Alexei Starovoitov. 9) Source address selection test is reversed in __ip_route_output_key(), fix from Jiri Benc. 10) VLAN and CAN layer mis-size netlink attributes, from Marc Kleine-Budde. 11) Fix permission checks in sysctls to use current_euid() instead of current_uid(). From Eric W Biederman. 12) IPSEC policies can go away while a timer is still pending for them, add appropriate ref-counting to fix, from Steffen Klassert. 13) Fix mis-programming of FDR and RMCR registers on R8A7740 sh_eth chips, from Nguyen Hong Ky and Simon Horman. 14) MLX4 forgets to DMA unmap pages on RX, fix from Amir Vadai. 15) IPV6 GRE tunnel MTU upper limit is miscalculated, from Oussama Ghorbel. 16) Fix typo in fq_change(), we were assigning "initial quantum" to "quantum". From Eric Dumazet. 17) Set a more appropriate sk_pacing_rate for non-TCP sockets, otherwise FQ packet scheduler does not pace those flows properly. Also from Eric Dumazet. 18) rtlwifi miscalculates packet pointers, from Mark Cave-Ayland. 19) l2tp_xmit_skb() can be called from process context, not just softirq context, so we must always make sure to BH disable around it. From Eric Dumazet. 20) On qdisc reset, we forget to purge the RB tree of SKBs in netem packet scheduler. From Stephen Hemminger. 21) Fix info leak in farsync WAN driver ioctl() handler, from Dan Carpenter and Salva Peiró. 22) Fix PHY reset and other issues in dm9000 driver, from Nikita Kiryanov and Michael Abbott. 23) When hardware can do SCTP crc32 checksums, we accidently don't disable the csum offload when IPSEC transformations have been applied. From Fan Du and Vlad Yasevich. 24) Tail loss probing in TCP leaves the socket in the wrong congestion avoidance state. From Yuchung Cheng. 25) In CPSW driver, enable NAPI before interrupts are turned on, from Markus Pargmann. 26) Integer underflow and dual-assignment in YAM hamradio driver, from Dan Carpenter. 27) If we are going to mangle a packet in tcp_set_skb_tso_segs() we must unclone it. This fixes various hard to track down crashes in drivers where the SKBs ->gso_segs was changing right from underneath the driver during TX queueing. From Eric Dumazet. 28) Fix the handling of VLAN IDs, and in particular the special IDs 0 and 4095, in the bridging layer. From Toshiaki Makita. 29) Another info leak, this time in wanxl WAN driver, from Salva Peiró. 30) Fix race in socket credential passing, from Daniel Borkmann. 31) WHen NETLABEL is disabled, we don't validate CIPSO packets properly, from Seif Mazareeb. 32) Fix identification of fragmented frames in ipv4/ipv6 UDP Fragmentation Offload output paths, from Jiri Pirko. 33) Virtual Function fixes in bnx2x driver from Yuval Mintz and Ariel Elior. 34) When we removed the explicit neighbour pointer from ipv6 routes a slight regression was introduced for users such as IPVS, xt_TEE, and raw sockets. We mix up the users requested destination address with the routes assigned nexthop/gateway. From Julian Anastasov and Simon Horman. 35) Fix stack overruns in rt6_probe(), the issue is that can end up doing two full packet xmit paths at the same time when emitting neighbour discovery messages. From Hannes Frederic Sowa. 36) davinci_emac driver doesn't handle IFF_ALLMULTI correctly, from Mariusz Ceier. 37) Make sure to set TCP sk_pacing_rate after the first legitimate RTT sample, from Neal Cardwell. 38) Wrong netlink attribute passed to xfrm_replay_verify_len(), from Steffen Klassert. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits) ax88179_178a: Add VID:DID for Samsung USB Ethernet Adapter ax88179_178a: Correct the RX error definition in RX header Revert "bridge: only expire the mdb entry when query is received" tcp: initialize passive-side sk_pacing_rate after 3WHS davinci_emac.c: Fix IFF_ALLMULTI setup mac802154: correct a typo in ieee802154_alloc_device() prototype ipv6: probe routes asynchronous in rt6_probe netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper ipv6: fill rt6i_gateway with nexthop address ipv6: always prefer rt6i_gateway if present bnx2x: Set NETIF_F_HIGHDMA unconditionally bnx2x: Don't pretend during register dump bnx2x: Lock DMAE when used by statistic flow bnx2x: Prevent null pointer dereference on error flow bnx2x: Fix config when SR-IOV and iSCSI are enabled bnx2x: Fix Coalescing configuration bnx2x: Unlock VF-PF channel on MAC/VLAN config error bnx2x: Prevent an illegal pointer dereference during panic bnx2x: Fix Maximum CoS estimation for VFs drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing ...
| * | | IB/mlx5: Fix eq names to display nicely in /proc/interruptsSagi Grimberg2013-10-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's helpful for a driver to put the pci slot name in its interrupt names, so /proc/interrupts will show the pci slot of the device. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
OpenPOWER on IntegriCloud