summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw
Commit message (Collapse)AuthorAgeFilesLines
* IB/hfi1: Invalid NUMA node information can cause a divide by zeroMichael J. Ruhl2018-08-201-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the system BIOS does not supply NUMA node information to the PCI devices, the NUMA node is selected by choosing the current node. This can lead to the following crash: divide error: 0000 SMP CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G IOE ------------ 3.10.0-693.21.1.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.01.01.0005.101720141054 10/17/2014 Workqueue: events work_for_cpu_fn task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000 RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1] RSP: 0018:ffff88017448bbf8 EFLAGS: 00010246 RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000 RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002 R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0 R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012 FS: 0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: hfi1_init_dd+0x14b3/0x27a0 [hfi1] ? pcie_capability_write_word+0x46/0x70 ? hfi1_pcie_init+0xc0/0x200 [hfi1] do_init_one+0x153/0x4c0 [hfi1] ? sched_clock_cpu+0x85/0xc0 init_one+0x1b5/0x260 [hfi1] local_pci_probe+0x4a/0xb0 work_for_cpu_fn+0x1a/0x30 process_one_work+0x17f/0x440 worker_thread+0x278/0x3c0 ? manage_workers.isra.24+0x2a0/0x2a0 kthread+0xd1/0xe0 ? insert_kthread_work+0x40/0x40 ret_from_fork+0x77/0xb0 ? insert_kthread_work+0x40/0x40 If the BIOS is not supplying NUMA information: - set the default table count to 1 for all possible nodes - select node 0 (instead of current NUMA) node to get consistent performance - generate an error indicating that the BIOS should be upgraded Reviewed-by: Gary Leshner <gary.s.leshner@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* Merge branch 'linus/master' into rdma.git for-nextJason Gunthorpe2018-08-163-10/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rdma.git merge resolution for the 4.19 merge window Conflicts: drivers/infiniband/core/rdma_core.c - Use the rdma code and revise with the new spelling for atomic_fetch_add_unless drivers/nvme/host/rdma.c - Replace max_sge with max_send_sge in new blk code drivers/nvme/target/rdma.c - Use the blk code and revise to use NULL for ib_post_recv when appropriate - Replace max_sge with max_recv_sge in new blk code net/rds/ib_send.c - Use the net code and revise to use NULL for ib_post_recv when appropriate Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * Merge tag 'pci-v4.19-changes' of ↵Linus Torvalds2018-08-161-3/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: - Decode AER errors with names similar to "lspci" (Tyler Baicar) - Expose AER statistics in sysfs (Rajat Jain) - Clear AER status bits selectively based on the type of recovery (Oza Pawandeep) - Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru Gagniuc) - Don't clear AER status bits if we're using the "Firmware-First" strategy where firmware owns the registers (Alexandru Gagniuc) - Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy Shevchenko) - Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas) - Defer DPC event handling to work queue (Keith Busch) - Use threaded IRQ for DPC bottom half (Keith Busch) - Print AER status while handling DPC events (Keith Busch) - Work around IDT switch ACS Source Validation erratum (James Puthukattukaran) - Emit diagnostics for all cases of PCIe Link downtraining (Links operating slower than they're capable of) (Alexandru Gagniuc) - Skip VFs when configuring Max Payload Size (Myron Stowe) - Reduce Root Port Max Payload Size if necessary when hot-adding a device below it (Myron Stowe) - Simplify SHPC existence/permission checks (Bjorn Helgaas) - Remove hotplug sample skeleton driver (Lukas Wunner) - Convert pciehp to threaded IRQ handling (Lukas Wunner) - Improve pciehp tolerance of missed events and initially unstable links (Lukas Wunner) - Clear spurious pciehp events on resume (Lukas Wunner) - Add pciehp runtime PM support, including for Thunderbolt controllers (Lukas Wunner) - Support interrupts from pciehp bridges in D3hot (Lukas Wunner) - Mark fall-through switch cases before enabling -Wimplicit-fallthrough (Gustavo A. R. Silva) - Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig) - Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied (Heiner Kallweit) - Unify PCI and DMA direction #defines (Shunyong Yang) - Add PCI_DEVICE_DATA() macro (Andy Shevchenko) - Check for VPD completion before checking for timeout (Bert Kenward) - Limit Netronome NFP5000 config space size to work around erratum (Jakub Kicinski) - Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit) - Document ACPI description of PCI host bridges (Bjorn Helgaas) - Add "pci=disable_acs_redir=" parameter to disable ACS redirection for peer-to-peer DMA support (we don't have the peer-to-peer support yet; this is just one piece) (Logan Gunthorpe) - Clean up devm_of_pci_get_host_bridge_resources() resource allocation (Jan Kiszka) - Fixup resizable BARs after suspend/resume (Christian König) - Make "pci=earlydump" generic (Sinan Kaya) - Fix ROM BAR access routines to stay in bounds and check for signature correctly (Rex Zhu) - Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer) - Expand documentation for pci_add_dma_alias() (Logan Gunthorpe) - To avoid bus errors, enable PASID only if entire path supports End-End TLP prefixes (Sinan Kaya) - Unify slot and bus reset functions and remove hotplug knowledge from callers (Sinan Kaya) - Add Function-Level Reset quirks for Intel and Samsung NVMe devices to fix guest reboot issues (Alex Williamson) - Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller (Bjorn Helgaas) - Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt) - Remove Aardvark outbound window configuration (Evan Wang) - Fix Aardvark bridge window sizing issue (Zachary Zhang) - Convert Aardvark to use pci_host_probe() to reduce code duplication (Thomas Petazzoni) - Correct the Cadence cdns_pcie_writel() signature (Alan Douglas) - Add Cadence support for optional generic PHYs (Alan Douglas) - Add Cadence power management ops (Alan Douglas) - Remove redundant variable from Cadence driver (Colin Ian King) - Add Kirin MSI support (Xiaowei Song) - Drop unnecessary root_bus_nr setting from exynos, imx6, keystone, armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn Guo) - Move link notification settings from DesignWare core to individual drivers (Gustavo Pimentel) - Add endpoint library MSI-X interfaces (Gustavo Pimentel) - Correct signature of endpoint library IRQ interfaces (Gustavo Pimentel) - Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel) - Add endpoint library MSI-X test support (Gustavo Pimentel) - Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation (Jia-Ju Bai) - Add more devices to Broadcom PAXC quirk (Ray Jui) - Work around corrupted Broadcom PAXC config space to enable SMMU and GICv3 ITS (Ray Jui) - Disable MSI parsing to work around broken Broadcom PAXC logic in some devices (Ray Jui) - Hide unconfigured functions to work around a Broadcom PAXC defect (Ray Jui) - Lower iproc log level to reduce console output during boot (Ray Jui) - Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi) - Fix mobiveil missing include file (Lorenzo Pieralisi) - Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi) - Fix mvebu I/O space remapping issues (Thomas Petazzoni) - Use generic pci_host_bridge in mvebu instead of ARM-specific API (Thomas Petazzoni) - Whitelist VMD devices with fast interrupt handlers to avoid sharing vectors with slow handlers (Keith Busch) * tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits) PCI/AER: Don't clear AER bits if error handling is Firmware-First PCI: Limit config space size for Netronome NFP5000 PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips PCI/VPD: Check for VPD access completion before checking for timeout PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry PCI: Match Root Port's MPS to endpoint's MPSS as necessary PCI: Skip MPS logic for Virtual Functions (VFs) PCI: Add function 1 DMA alias quirk for Marvell 88SS9183 PCI: Check for PCIe Link downtraining PCI: Add ACS Redirect disable quirk for Intel Sunrise Point PCI: Add device-specific ACS Redirect disable infrastructure PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support PCI: Allow specifying devices using a base bus and path of devfns PCI: Make specifying PCI devices in kernel parameters reusable PCI: Hide ACS quirk declarations inside PCI core PCI: Delay after FLR of Intel DC P3700 NVMe PCI: Disable Samsung SM961/PM961 NVMe before FLR PCI: Export pcie_has_flr() PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers() ...
| | * PCI: Rename pci_try_reset_bus() to pci_reset_bus()Sinan Kaya2018-07-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the old implementation of pci_reset_bus() is gone, replace pci_try_reset_bus() with pci_reset_bus(). Compared to the old implementation, new code will fail immmediately with -EAGAIN if object lock cannot be obtained. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| | * PCI: Unify try slot and bus reset APISinan Kaya2018-07-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers are expected to call pci_try_reset_slot() or pci_try_reset_bus() by querying if a system supports hotplug or not. A survey showed that most drivers don't do this and we are leaking hotplug capability to the user. Hide pci_try_slot_reset() from drivers and embed into pci_try_bus_reset(). Change pci_try_reset_bus() parameter from struct pci_bus to struct pci_dev. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| | * IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus ResetSinan Kaya2018-07-191-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Getting ready to hide pci_reset_bridge_secondary_bus() from the drivers. pci_reset_bridge_secondary_bus() should only be used internally by the PCI code itself. Other drivers should rely on higher level pci_try_reset_bus() API. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2018-08-154-2/+18
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: "Highlights: - Gustavo A. R. Silva keeps working on the implicit switch fallthru changes. - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From Luca Coelho. - Re-enable ASPM in r8169, from Kai-Heng Feng. - Add virtual XFRM interfaces, which avoids all of the limitations of existing IPSEC tunnels. From Steffen Klassert. - Convert GRO over to use a hash table, so that when we have many flows active we don't traverse a long list during accumluation. - Many new self tests for routing, TC, tunnels, etc. Too many contributors to mention them all, but I'm really happy to keep seeing this stuff. - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu. - Lots of cleanups and fixes in L2TP code from Guillaume Nault. - Add IPSEC offload support to netdevsim, from Shannon Nelson. - Add support for slotting with non-uniform distribution to netem packet scheduler, from Yousuk Seung. - Add UDP GSO support to mlx5e, from Boris Pismenny. - Support offloading of Team LAG in NFP, from John Hurley. - Allow to configure TX queue selection based upon RX queue, from Amritha Nambiar. - Support ethtool ring size configuration in aquantia, from Anton Mikaev. - Support DSCP and flowlabel per-transport in SCTP, from Xin Long. - Support list based batching and stack traversal of SKBs, this is very exciting work. From Edward Cree. - Busyloop optimizations in vhost_net, from Toshiaki Makita. - Introduce the ETF qdisc, which allows time based transmissions. IGB can offload this in hardware. From Vinicius Costa Gomes. - Add parameter support to devlink, from Moshe Shemesh. - Several multiplication and division optimizations for BPF JIT in nfp driver, from Jiong Wang. - Lots of prepatory work to make more of the packet scheduler layer lockless, when possible, from Vlad Buslov. - Add ACK filter and NAT awareness to sch_cake packet scheduler, from Toke Høiland-Jørgensen. - Support regions and region snapshots in devlink, from Alex Vesker. - Allow to attach XDP programs to both HW and SW at the same time on a given device, with initial support in nfp. From Jakub Kicinski. - Add TLS RX offload and support in mlx5, from Ilya Lesokhin. - Use PHYLIB in r8169 driver, from Heiner Kallweit. - All sorts of changes to support Spectrum 2 in mlxsw driver, from Ido Schimmel. - PTP support in mv88e6xxx DSA driver, from Andrew Lunn. - Make TCP_USER_TIMEOUT socket option more accurate, from Jon Maxwell. - Support for templates in packet scheduler classifier, from Jiri Pirko. - IPV6 support in RDS, from Ka-Cheong Poon. - Native tproxy support in nf_tables, from Máté Eckl. - Maintain IP fragment queue in an rbtree, but optimize properly for in-order frags. From Peter Oskolkov. - Improvde handling of ACKs on hole repairs, from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits) bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT" hv/netvsc: Fix NULL dereference at single queue mode fallback net: filter: mark expected switch fall-through xen-netfront: fix warn message as irq device name has '/' cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0 net: dsa: mv88e6xxx: missing unlock on error path rds: fix building with IPV6=m inet/connection_sock: prefer _THIS_IP_ to current_text_addr net: dsa: mv88e6xxx: bitwise vs logical bug net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() ieee802154: hwsim: using right kind of iteration net: hns3: Add vlan filter setting by ethtool command -K net: hns3: Set tx ring' tc info when netdev is up net: hns3: Remove tx ring BD len register in hns3_enet net: hns3: Fix desc num set to default when setting channel net: hns3: Fix for phy link issue when using marvell phy driver net: hns3: Fix for information of phydev lost problem when down/up net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero net: hns3: Add support for serdes loopback selftest bnxt_en: take coredump_record structure off stack ...
| | * \ Merge branch 'mlx5-next' of ↵Saeed Mahameed2018-07-233-1/+17
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux mlx5 core infrastructure updates and fixes. From Eran: - Add MPEGC (Management PCIe General Configuration) registers and btis - Fix tristate and description for MLX5 module rom Feras: - Add hardware structures for the firmware tracer From Jainbo: - Core support for double vlan push/pop steering action From Max: - Add XRQ commands definitions From Noa: - Add missing SET_DRIVER_VERSION command translation From Roi: - Use ERR_CAST() instead of coding it From Tariq: - Better return types for CQE API Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * \ \ Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller2018-07-207-16/+22
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All conflicts were trivial overlapping changes, so reasonably easy to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net: allow ndo_select_queue to pass netdevAlexander Duyck2018-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes it so that instead of passing a void pointer as the accel_priv we instead pass a net_device pointer as sb_dev. Making this change allows us to pass the subordinate device through to the fallback function eventually so that we can keep the actual code in the ndo_select_queue call as focused on possible on the exception cases. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * | | | | ACPI: Convert ACPI reference args to generic fwnode reference argsSakari Ailus2018-07-231-6/+4
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert all users of struct acpi_reference_args to more generic fwnode_reference_args. This will 1) avoid an ACPI specific references to device nodes with integer arguments as well as 2) allow making references to nodes other than device nodes in ACPI. As a by-product, convert the fwnode interger arguments to u64. The arguments were 64-bit integers on ACPI but the fwnode arguments were just 32-bit. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | Merge tag 'v4.18' into rdma.git for-nextJason Gunthorpe2018-08-1610-34/+52
|\ \ \ \ \ | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resolve merge conflicts from the -rc cycle against the rdma.git tree: Conflicts: drivers/infiniband/core/uverbs_cmd.c - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next - Merge removal of file->ucontext in for-next with new code in -rc drivers/infiniband/core/uverbs_main.c - for-next removed code from ib_uverbs_write() that was modified in for-rc Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2018-07-137-16/+22
| |\ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma fixes from Jason Gunthorpe: "Things have been quite slow, only 6 RC patches have been sent to the list. Regression, user visible bugs, and crashing fixes: - cxgb4 could wrongly fail MR creation due to a typo - various crashes if the wrong QP type is mixed in with APIs that expect other types - syzkaller oops - using ERR_PTR and NULL together cases HFI1 to crash in some cases - mlx5 memory leak in error unwind" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path RDMA/uverbs: Don't fail in creation of multiple flows IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow RDMA/uverbs: Protect from attempts to create flows on unsupported QP iw_cxgb4: correctly enforce the max reg_mr depth
| | * | | RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error pathKamal Heib2018-07-111-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix memory leak in the error path of mlx5_ib_create_srq() by making sure to free the allocated srq. Fixes: c2b37f76485f ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * | | IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return valuesMichael J. Ruhl2018-06-265-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL. All of the relevant call sites look for IS_ERR, so the NULL return would lead to a NULL pointer exception. Do not use the ERR_PTR mechanism for this function. Update all call sites to handle the return value correctly. Clean up error paths to reflect return value. Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code") Cc: <stable@vger.kernel.org> # 4.9.x+ Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * | | iw_cxgb4: correctly enforce the max reg_mr depthSteve Wise2018-06-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code was mistakenly using the length of the page array memory instead of the depth of the page array. This would cause MR creation to fail in some cases. Fixes: 8376b86de7d3 ("iw_cxgb4: Support the new memory registration API") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | | | Merge tag 'mlx5-fixes-2018-06-26' of ↵David S. Miller2018-06-281-1/+1
| |\ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2018-06-26 Fixes for mlx5 core and netdev driver: Two fixes from Alex Vesker to address command interface issues - Race in command interface polling mode - Incorrect raw command length parsing From Shay Agroskin, Fix wrong size allocation for QoS ETC TC regitster. From Or Gerlitz and Eli Cohin, Address backward compatability issues for when Eswitch capability is not advertised for the PF host driver - Fix required capability for manipulating MPFS - E-Switch, Disallow vlan/spoofcheck setup if not being esw manager - Avoid dealing with vport IB/eth representors if not being e-switch manager - E-Switch, Avoid setup attempt if not being e-switch manager - Don't attempt to dereference the ppriv struct if not being eswitch manager ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | IB/mlx5: Avoid dealing with vport representors if not being e-switch managerOr Gerlitz2018-06-261-1/+1
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In smartnic env, the host (PF) driver might not be an e-switch manager, hence the switchdev mode representors are running on the embedded cpu (EC) and not at the host. As such, we should avoid dealing with vport representors if not being esw manager. Fixes: b5ca15ad7e61 ('IB/mlx5: Add proper representors support') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2018-06-213-17/+29
| |\ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma fixes from Jason Gunthorpe: "Here are eight fairly small fixes collected over the last two weeks. Regression and crashing bug fixes: - mlx4/5: Fixes for issues found from various checkers - A resource tracking and uverbs regression in the core code - qedr: NULL pointer regression found during testing - rxe: Various small bugs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/rxe: Fix missing completion for mem_reg work requests RDMA/core: Save kernel caller name when creating CQ using ib_create_cq() IB/uverbs: Fix ordering of ucontext check in ib_uverbs_write IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()' RDMA/qedr: Fix NULL pointer dereference when running over iWARP without RDMA-CM IB/mlx5: Fix return value check in flow_counters_set_data() IB/mlx5: Fix memory leak in mlx5_ib_create_flow IB/rxe: avoid double kfree skb
| | * | IB/mlx4: Fix an error handling path in 'mlx4_ib_rereg_user_mr()'Christophe Jaillet2018-06-111-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before returning -EPERM we should release some resources, as already done in the other error handling path of the function. Fixes: d8f9cc328c88 ("IB/mlx4: Mark user MR as writable if actual virtual memory is writable") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * | RDMA/qedr: Fix NULL pointer dereference when running over iWARP without RDMA-CMKalderon, Michal2018-06-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some RoCE specific code in qedr_modify_qp was run over an iWARP device when running perftest benchmarks without the -R option. The commit 3e44e0ee0893 ("IB/providers: Avoid null netdev check for RoCE") exposed this. Dropping the check for NULL pointer on ndev in qedr_modify_qp lead to a null pointer dereference when running over iWARP. Before the code would identify ndev as being NULL and return an error. Fixes: 3e44e0ee0893 ("IB/providers: Avoid null netdev check for RoCE") Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * | IB/mlx5: Fix return value check in flow_counters_set_data()weiyongjun (A)2018-06-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of error, the function mlx5_fc_create() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: 3b3233fbf02e ("IB/mlx5: Add flow counters binding support") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| | * | IB/mlx5: Fix memory leak in mlx5_ib_create_flowGustavo A. R. Silva2018-06-111-13/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case memory resources for *ucmd* were allocated, release them before return. Addresses-Coverity-ID: 1469857 ("Resource leak") Fixes: 3b3233fbf02e ("IB/mlx5: Add flow counters binding support") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/hns: Fix usage of bitmap allocation functions return valuesGal Pressman2018-08-152-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hns bitmap allocation functions return 0 on success and -1 on failure. Callers of these functions wrongly used their return value as an errno, fix that by making a proper conversion. Fixes: a598c6f4c5a8 ("IB/hns: Simplify function of pd alloc and qp alloc") Signed-off-by: Gal Pressman <pressmangal@gmail.com> Acked-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | qedr: Add user space support for SRQYuval Bason2018-08-142-44/+191
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for SRQ's created in user space and update qedr_affiliated_event to deal with general SRQ events. Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: Yuval Bason <yuval.bason@cavium.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | qedr: Add support for kernel mode SRQ'sYuval Bason2018-08-145-13/+458
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the SRQ specific verbs and update the poll_cq verb to deal with SRQ completions. Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: Yuval Bason <yuval.bason@cavium.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | qedr: Add wrapping generic structure for qpidr and adjust idr routines.Yuval Bason2018-08-144-29/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Today, we are using idr mechanism for QP's only. This patch prepares the qedr_idr stuctures and the idr routines for both QP's and SRQ's. Signed-off-by: Yuval Bason <yuval.bason@cavium.com> Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | IB/mlx5: Fix leaking stack memory to userspaceJason Gunthorpe2018-08-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mlx5_ib_create_qp_resp was never initialized and only the first 4 bytes were written. Fixes: 41d902cb7c32 ("RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp") Cc: <stable@vger.kernel.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | IB/uverbs: Use uverbs_alloc for allocationsJason Gunthorpe2018-08-131-53/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several handlers need temporary allocations for the life of the method, switch them to use the uverbs_alloc allocator. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
* | | | IB/uverbs: Have the core code create the uverbs_root_specJason Gunthorpe2018-08-102-29/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no reason for drivers to do this, the core code should take of everything. The drivers will provide their information from rodata to describe their modifications to the core's base uapi specification. The core uses this to build up the runtime uapi for each device. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
* | | | iw_cxgb4: pass window scale in flowc work requestPotnuri Bharat Teja2018-08-082-24/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will allow FW to not send more data to TP (which would then need to be buffered). Pass the negotiated TCP window scale to FW in the FLOWC WR. Also refactor send_flowc() a bit to clean it up. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/mlx5: Fix shift overflow in mlx5_ib_create_wqLeon Romanovsky2018-08-081-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ 61.182439] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/qp.c:5366:34 [ 61.183673] shift exponent 4294967288 is too large for 32-bit type 'unsigned int' [ 61.185530] CPU: 0 PID: 639 Comm: qp Not tainted 4.18.0-rc1-00037-g4aa1d69a9c60-dirty #96 [ 61.186981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 61.188315] Call Trace: [ 61.188661] dump_stack+0xc7/0x13b [ 61.190427] ubsan_epilogue+0x9/0x49 [ 61.190899] __ubsan_handle_shift_out_of_bounds+0x1ea/0x22f [ 61.197040] mlx5_ib_create_wq+0x1c99/0x1d50 [ 61.206632] ib_uverbs_ex_create_wq+0x499/0x820 [ 61.213892] ib_uverbs_write+0x77e/0xae0 [ 61.248018] vfs_write+0x121/0x3b0 [ 61.249831] ksys_write+0xa1/0x120 [ 61.254024] do_syscall_64+0x7c/0x2a0 [ 61.256178] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 61.259211] RIP: 0033:0x7f54bab70e99 [ 61.262125] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 [ 61.268678] RSP: 002b:00007ffe1541c318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 61.271076] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54bab70e99 [ 61.273795] RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003 [ 61.276982] RBP: 00007ffe1541c330 R08: 00000000200078e0 R09: 0000000000000002 [ 61.280035] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004005c0 [ 61.283279] R13: 00007ffe1541c420 R14: 0000000000000000 R15: 0000000000000000 Cc: <stable@vger.kernel.org> # 4.7 Fixes: 79b20a6c3014 ("IB/mlx5: Add receive Work Queue verbs") Cc: syzkaller <syzkaller@googlegroups.com> Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/netdev: Use priv_destructor for netdev cleanupJason Gunthorpe2018-08-021-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the unregister_netdev flow for IPoIB no longer relies on external code we can now introduce the use of priv_destructor and needs_free_netdev. The rdma_netdev flow is switched to use the netdev common priv_destructor instead of the special free_rdma_netdev and the IPOIB ULP adjusted: - priv_destructor needs to switch to point to the ULP's destructor which will then call the rdma_ndev's in the right order - We need to be careful around the error unwind of register_netdev as it sometimes calls priv_destructor on failure - ULPs need to use ndo_init/uninit to ensure proper ordering of failures around register_netdev Switching to priv_destructor is a necessary pre-requisite to using the rtnl new_link mechanism. The VNIC user for rdma_netdev should also be revised, but that is left for another patch. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
* | | | iw_cxgb4: Support FW write completion WRPotnuri Bharat Teja2018-08-024-2/+183
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To optimize NVME-oF READ IOPs, use a specialized WQE that combines the RDMA WRITE and SEND_INV WR chain submitted by the NVME-oF target driver. This reduces uP overhead per NVME-oF IO, and results in over 10% improvement in NVME-oF 4K READ IOPs. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | iw_cxgb4: RDMA write with immediate supportPotnuri Bharat Teja2018-08-024-15/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds iw_cxgb4 functionality to support RDMA_WRITE_WITH_IMMEDATE opcode. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | rdma/cxgb4: fix some info leaksDan Carpenter2018-08-021-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In c4iw_create_qp() there are several struct members which potentially aren't inintialized like uresp.rq_key. I've fixed this code before in in commit ae1fe07f3f42 ("RDMA/cxgb4: Fix stack info leak in c4iw_create_qp()") so this time I'm just going to take a big hammer approach and memset the whole struct to zero. Hopefully, it will stay fixed this time. In c4iw_create_srq() we don't clear uresp.reserved. Fixes: 6a0b6174d35a ("rdma/cxgb4: Add support for kernel mode SRQ's") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/hns: Support flush cqe for hip08 in kernel spaceYixian Liu2018-08-024-20/+240
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to IB protocol, there are some cases that work requests must return the flush error completion status through the completion queue. Due to hardware limitation, the driver needs to assist the flush process. This patch adds the support of flush cqe for hip08 in the cases that needed, such as poll cqe, post send, post recv and aeqe handle. The patch also considered the compatibility between kernel and user space. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | IB/uverbs: Do not pass struct ib_device to the ioctl methodsJason Gunthorpe2018-08-012-27/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This does the same as the patch before, except for ioctl. The rules are the same, but for the ioctl methods the core code handles setting up the uobject. - Retrieve the ib_dev from the uobject->context->device. This is safe under ioctl as the core has already done rdma_alloc_begin_uobject and so CREATE calls are entirely protected by the rwsem. - Retrieve the ib_dev from uobject->object - Call ib_uverbs_get_ucontext() Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA: Fix return code check in rdma_set_cq_moderationKamal Heib2018-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is not supported, all drivers should generate this and all users should check for it when detecting not supported functionality. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5) Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | rdma/cxgb4: Simplify a structure initializationBart Van Assche2018-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch avoids that sparse reports the following warning: drivers/infiniband/hw/cxgb4/qp.c:2269:34: warning: Using plain integer as NULL pointer Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | rdma/cxgb4: Fix SRQ endianness annotationsBart Van Assche2018-07-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch avoids that sparse complains about casts to restricted __be32. Fixes: a3cdaa69e4ae ("cxgb4: Adds CPL support for Shared Receive Queues") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | rdma/cxgb4: Remove a set-but-not-used variableBart Van Assche2018-07-311-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch avoids that the following warning is reported when building with W=1: drivers/infiniband/hw/cxgb4/cm.c:1860:5: warning: variable 'status' set but not used [-Wunused-but-set-variable] u8 status; ^~~~~~ Fixes: 6a0b6174d35a ("rdma/cxgb4: Add support for kernel mode SRQ's") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/hns: Program the tclass and flow label into the hardwareLijun Ou2018-07-304-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was missed in a few places, and was just using 0. Also correct the spelling of HNS_ROCE_FLOW_LABEL_MASK Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/hns: Use macro instead of magic numberLijun Ou2018-07-302-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch mainly uses CMD_CSQ_DESC_NUM instead of magic number in order to improve readability. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/hns: Modify qp will return errno when qp type is illegalLijun Ou2018-07-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set for ret was missing in the error path here, resulting in incorrect error code for modify_qp. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/hns: Assign the value for vlan field of qp contextLijun Ou2018-07-302-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch mainly fills the correct value into the vlan id field of qp context as well as update the vlan field name according to the latest hardware user manual. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/hns: Only assgin the fields of the av if IB_QP_AV bit is setLijun Ou2018-07-301-31/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only when the IB_QP_AV flag of attr_mask is set is it valid to assign the related fields of the av into the qp context. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/providers: Remove pointless functionsKamal Heib2018-07-3013-239/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rdma core is taking care of return the right error code when the rdma device callbacks aren't supported. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | RDMA/providers: Fix return value from create_srq callbacksKamal Heib2018-07-302-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The proper return code is "-EOPNOTSUPP" when the create_srq() callback is not supported. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | | | IB/mlx4: Use 4K pages for kernel QP's WQE bufferJack Morgenstein2018-07-302-176/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current implementation, the driver tries to allocate contiguous memory, and if it fails, it falls back to 4K fragmented allocation. Once the memory is fragmented, the first allocation might take a lot of time, and even fail, which can cause connection failures. This patch changes the logic to always allocate with 4K granularity, since it's more robust and more likely to succeed. This patch was tested with Lustre and no performance degradation was observed. Note: This commit eliminates the "shrinking WQE" feature. This feature depended on using vmap to create a virtually contiguous send WQ. vmap use was abandoned due to problems with several processors (see the commit cited below). As a result, shrinking WQE was available only with physically contiguous send WQs. Allocating such send WQs caused the problems described above. Therefore, as a side effect of eliminating the use of large physically contiguous send WQs, the shrinking WQE feature became unavailable. Warning example: worker/20:1: page allocation failure: order:8, mode:0x80d0 CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------ Workqueue: ib_cm cm_work_handler [ib_cm] Call Trace: [<ffffffff81686d81>] dump_stack+0x19/0x1b [<ffffffff81186160>] warn_alloc_failed+0x110/0x180 [<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0 [<ffffffff811ce868>] alloc_pages_current+0x98/0x110 [<ffffffff81184fae>] __get_free_pages+0xe/0x50 [<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150 [<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50 [<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core] [<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core] [<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib] [<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0 [<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib] [<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib] [<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib] [<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core] [<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm] [<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd] [<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs] [<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd] Fixes: 73898db04301 ("net/mlx4: Avoid wrong virtual mappings") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
OpenPOWER on IntegriCloud