summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
Commit message (Collapse)AuthorAgeFilesLines
*-------. Merge branches 'cxgb4-2', 'i40iw-2', 'ipoib', 'misc-4.7' and 'mlx5-fcs' into ↵Doug Ledford2016-05-1320-232/+483
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | k.o/for-4.7
| | | | | * IB/mlx5: Report Scatter FCS device capability when supportedMajd Dibbiny2016-05-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Report Scatter FCS support when the Firmware supports as well. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | | * IB/mlx5: Add Scatter FCS support for Raw Packet QPMajd Dibbiny2016-05-132-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable Scatter FCS in the RQ context when the user passes Scatter FCS create flag. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | | * IB/core: Add Scatter FCS create flagMajd Dibbiny2016-05-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Raw Packet QPs that were created with Scatter FCS flag, will scatter the FCS into the receive buffers. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | | * IB/core: Add extended device capability flagsMajd Dibbiny2016-05-131-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since all the uverbs device_cap_flags are occupied, we need a place to expose more device capabilities. This patch adds a new 64 bit device_cap_flags_ex to expose new device capabilities. The lower 32 bits will be identical to the original device_cap_flags, The upper 32 bits will be new capabilities. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | | * i40iw: pass hw_stats by reference rather than by valueColin Ian King2016-05-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | | * i40iw: Remove unnecessary synchronize_irq() before free_irq()Lars-Peter Clausen2016-05-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | | * i40iw: constify i40iw_vf_cqp_ops structureJulia Lawall2016-05-133-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/mlx5: Add UARs write-combining and non-cached mappingGuy Levi2016-05-132-21/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By this patch, the user space library will be able to improve performance using appropriate ringing DoorBell method according to the memory type it asked for. Currently only one mapping command is allowed for UARs: MLX5_IB_MMAP_REGULAR_PAGE. Using this mapping, the kernel maps the UARs to write-combining (WC) if the system supports it. If the system is not supporting WC the UARs are mapped to non-cached(NC). In this case the user space library can't tell which mapping is applied. This patch adds 2 new mapping commands: MLX5_IB_MMAP_WC_PAGE and MLX5_IB_MMAP_NC_PAGE. For these commands the kernel maps exactly as requested and fails if it can't. Since there is no generic way to check if the requested memory region can be mapped as WC, driver enables conclusive WC mapping only for x86, PowerPC and ARM which support WC for the device's memory region. Signed-off-by: Guy Levy <guyle@mellanox.com> Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/mlx5: Allow mapping the free running counter on PROT_EXECMatan Barak2016-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current mlx5 code disallows mapping the free running counter of mlx5 based hardwares when PROT_EXEC is set. Although this behaviour is correct, Linux does add an implicit VM_EXEC to the vm_flags if the READ_IMPLIES_EXEC bit is set in the process personality. This happens for example if the process stack is executable. This causes libmlx5 to output a warning and prevents the user from reading the free running clock. Executing the init segment of the hardware isn't a security risk (at least no more than executing a process own stack), so we just prevent writes to there. Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to user-space') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/mlx4: Use list_for_each_entry_safeGeliang Tang2016-05-131-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simplify the code in search_relocate_mgid0_group with by using list_for_each_entry_safe(). Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/SA: Use correct free functionMark Bloch2016-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes a direct call to kfree_skb when nlmsg_free should be used. Fixes: 2ca546b92a02 ('IB/sa: Route SA pathrecord query through netlink') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/core: Fix a potential array overrun in CMA and SA agentMark Bloch2016-05-133-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix array overrun when going over callback table. In declaration of callback table, the max size isn't provided and in registration phase, it is provided. There is potential scenario where a new operation is added and it is not supported by current client. The acceptance of such operation by ib_netlink will cause to array overrun. Fixes: 809d5fc9bf65 ("infiniband: pass rdma_cm module to netlink_dump_start") Fixes: b493d91d333e ("iwcm: common code for port mapper") Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/core: Remove unnecessary check in ibnl_rcv_msgMark Bloch2016-05-131-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RDMA_NL_GET_OP is defined like this: (type & ((1 << 10) - 1)) which means op (defined as an int) can never be a negative number. Fixes: b2cbae2c2487 ('RDMA: Add netlink infrastructure') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/IWPM: Fix a potential skb leakMark Bloch2016-05-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case ibnl_put_msg fails in send_nlmsg_done, the function returns with -ENOMEM without freeing. This patch fixes this behavior. Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space service") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | RDMA/nes: replace custom print_hex_dump()Andy Shevchenko2016-05-131-57/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to duplicate a lot of code that is in the kernel library for ages. Replace duplicating code by calling to print_hex_dump() directly. Note that output is slightly changed: - hex and ascii parts have just two spaces delimeter - there is no delimeter for ascii portions - file and line removed from prefix (they were redundant anyway since previous output shows same closer enough) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/mlx4: trivial fix of spelling mistake on "argument"Colin Ian King2016-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fix spelling mistake, argumant -> argument Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/nes: Deinline nes_free_qp_mem, save 1072 bytesDenys Vlasenko2016-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function compiles to 550 bytes of machine code. Three callsites, all in nes_create_qp. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Faisal Latif <faisal.latif@intel.com> CC: Doug Ledford <dledford@redhat.com> CC: linux-rdma@vger.kernel.org CC: linux-kernel@vger.kernel.org Reviewed-By: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | RDMA/nes: Adding queue drain functionsTatyana Nikolova2016-05-132-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding sq and rq drain functions, which block until all previously posted wr-s in the specified queue have completed. A completion object is signaled to unblock the thread, when the last cqe for the corresponding queue is processed. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | iwcm: Fix a sparse warningBart Van Assche2016-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid that sparse complains about the comparison of s_addr with INADDR_ANY. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | | * | IB/srp: Fix a debug kernel crashBart Van Assche2016-05-131-1/+1
| | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid that the following BUG() is triggered against a debug kernel: kernel BUG at include/linux/scatterlist.h:92! RIP: 0010:[<ffffffffa0467199>] [<ffffffffa0467199>] srp_map_idb+0x199/0x1a0 [ib_srp] Call Trace: [<ffffffffa04685fa>] srp_map_data+0x84a/0x890 [ib_srp] [<ffffffffa0469674>] srp_queuecommand+0x1e4/0x610 [ib_srp] [<ffffffff813f5a5e>] scsi_dispatch_cmd+0x9e/0x180 [<ffffffff813f8b07>] scsi_request_fn+0x477/0x610 [<ffffffff81298ffe>] __blk_run_queue+0x2e/0x40 [<ffffffff81299070>] blk_delay_work+0x20/0x30 [<ffffffff81071f07>] process_one_work+0x197/0x480 [<ffffffff81072239>] worker_thread+0x49/0x490 [<ffffffff810787ea>] kthread+0xea/0x100 [<ffffffff8159b632>] ret_from_fork+0x22/0x40 Fixes: f7f7aab1a5c0 ("IB/srp: Convert to new registration API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v4.4+ Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | * | IB/ipoib: Add readout of statistics using ethtoolHans Westgaard Ry2016-05-131-0/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPoIB collects statistics of traffic including number of packets sent/received, number of bytes transferred, and certain errors. This patch makes these statistics available to be queried by ethtool. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | | * | infiniband/ulp/ipoib: remove pkey_mutexSebastian Andrzej Siewior2016-05-131-2/+0
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last user of pkey_mutex was removed in db84f8803759 ("IB/ipoib: Use P_Key change event instead of P_Key polling mechanism") but the lock remained. This patch removes it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | * | i40iw: pass hw_stats by reference rather than by valueColin Ian King2016-05-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | * | i40iw: Remove unnecessary synchronize_irq() before free_irq()Lars-Peter Clausen2016-05-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| | * | i40iw: constify i40iw_vf_cqp_ops structureJulia Lawall2016-05-133-3/+3
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | iw_cxgb4: Convert a __force castBart Van Assche2016-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __force casts should be avoided if there is a better alternative. Hence modify the comparison of s_addr with INADDR_ANY such that the __force cast is no longer necessary. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Vipul Pandya <vipul@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Add arp failure handlers to send_mpa_reply/reject()Hariprasad S2016-05-131-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | These handlers when called print error message to the kernel log, but the actual handling is done by _c4iw_free_ep() and process_timeout(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Always wake up waiter in c4iw_peer_abort_intr()Hariprasad S2016-05-131-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently c4iw_peer_abort_intr() does not wake up the waiter if the endpoint state indicates we're using MPAv2 and we're currently trying to connect. This was introduced with commit 7c0a33d61187a ("RDMA/cxgb4: Don't wakeup threads for MPAv2") However, this original fix is flawed because it introduces a race that can cause a deadlock of the iwarp stack. Here is the race: ->local side sets up an active offload connection. ->local side sends MPA_START request. ->peer sends MPA_START response. ->local side ingress cpl thread begins processing the MPA_START response, but before it changes the state from MPA_REQ_SENT to FPDU_MODE: ->peer sends a RST which results in a ABORT_REQ_RSS. This triggers peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the stack will re-attempt the connection using MPAv1. ->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR to firmware. But since HW sent an abort, FW correctly drops the RI_WR/INIT WR. ->So the cpl thread is stuck waiting for a reply and cannot process the ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a halt because no more ingress cpls are processed by the stack... The correct fix for the issue is to always do the wake up in c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect(). Fixes: 7c0a33d61187a ("RDMA/cxgb4: Don't wakeup threads for MPAv2") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Handle ret value of process_mpa_reply() in rx_dataHariprasad S2016-05-131-2/+2
| | | | | | | | | | | | | | | | | | Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: atomic find and reference for listening endpointsHariprasad S2016-05-131-8/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add get_ep_from_stid() which will atomically find and reference the endpoint struct if found. This avoids touch-after-free races between threads destroying listening endpoints and the CPL processing thread processing an incoming PASS_ACCEPT_REQ CPL. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Handle ULP accept/reject during ABORTINGHariprasad S2016-05-131-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | c4iw_reject() and c4iw_accept() need to handle the case where the endpoint has timed out and is in the middle of ABORTING the connection. Here is the flow that causes the BUG_ON() to fire on the server side: 1) offload connection setup and endpoint timer started 2) MPA_START request received from peer, CONNECT_REQUEST passed to ULP 3) endpoint timer fires, and process_timeout() aborts the connection, this moves the endpoint state to ABORTING until HW sends up the ABORT_RPL_RSS. 4) application exits closing the CONNECT_REQUEST cm_id. The IWCM calls c4iw_reject_cr() to destroy this connection request. 5) WHAMO: BUG_ON() because the state is ABORTING. The fix is to change c4iw_reject_cr() and c4iw_accept_cr() to fail the operation if the state is not in MPA_REQ_RCVD vs in DEAD. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Release ep for for FPDU_MODE and MPA_REQ_RCVD in process_timeoutHariprasad S2016-05-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARP failure may also happen when ep in FPDU_MODE and these failures need to be handled by process_timeout(). process_timeout() also has to handle case MPA_REQ_RCVD, setting abort to 1, leading to ep resource release. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Free skb in case of arp failure in _c4iw_free_ep()Hariprasad S2016-05-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Arp failure for send_mpa_reply/reject() is handled by freeing the mpa_skb in c4iw_free_ep() before releasing ep. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: atomically lookup ep and get a referenceHariprasad S2016-05-131-28/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race between ULP threads calling c4iw_ep_disconnect() via c4iw_modify_rc_qp() and the ingress CPL thread where the ULP thread can free the endpoint just after the ingress CPL thread finds the ep pointer in the tid table. To avoid this, we now use the hwtid_idr table for lookups instead of the LLD tid table so we can lock around insert, remove, and lookup+get_ep to avoid the race. The CPL handlers now will either find the ep ptr and have a ref on it, or not find it and they can discard the CPL. Callers of get_ep_from_tid() will have a ref on the ep if found, and thus must deref when they are done. Negative advice in peer_abort_intr() need to dereference the ep. therefore peer_abort() is scheduled to dereference the ep later. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Handle return value of c4iw_ofld_send() in abort_arp_failure()Hariprasad S2016-05-131-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In abort_arp_failure(), the return value from c4iw_ofld_send() is ignored and thus if the CPL isn't sent, the endpoint is stuck and never gets aborted. Failure of c4iw_ofld_send() is treated as fatal error, and the ep resources are released in a safer context through process_work(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: in process_timeout() don't move ep state to ABORTINGHariprasad S2016-05-131-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Moving the state to ABORTING causes the ep to get stuck because c4iw_ep_timeout() thinks the ABORT has already been done. So leave the state alone and let c4iw_ep_disconnect() do the right thing given the ep state. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: handle return value of c4iw_l2t_send() and send_mpa_req()Hariprasad S2016-05-131-13/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->In act_open_rpl(), CPL_ERR_TCAM_FULL error handling branch, there is no handling of the return value of send_fw_act_open_req(). ->In send_fw_act_open_req(), there is no handling of return value of c4iw_l2t_send(), which may cause a ep leak and won't notify upper layers on connection establish failure. ->send_mpa_req() should act on the return from c4iw_l2t_send() and return the error to the caller. ->In case of c4iw_l2t_send() failure in send_mpa_req(), returns without starting the timer and not changing the ep state, which is further handled by act_establish() -> In act_establish()?if send_mpa_request's get_skb returns an error, may cause an ep leak. So handle return value of send_mpa_req() Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiationHariprasad S2016-05-132-20/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->Stop the ep timer after MPA negotiation so that the arp failures during send_mpa_reply/reject will be handled by process_timeout() after the ep timer expires. ->Added case MPA_REP_SENT in process_timeout(). ->For MPA reject, c4iw_ep_disconnect tries to start an already started timer, which leads to warning message "timer already started". -> In case of mpa reject stop the timer and call send_mpa_reject(). -> Added new ep flag STOP_MPA_TIMER to tell fw4_ack() to stop the timer only for send_mpa_reply(), which is set in c4iw_accept_cr(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Do not stop timer in case of incomplete messagesHariprasad S2016-05-131-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | In case of incomplete mpa messages we should not stop timer as it results in return with timeout for the next mpa message Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: parent_ep has to be dereferenced in case of passive accept ↵Hariprasad S2016-05-131-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | failure -> On passive side of connection parent_ep referenced during connection request has to be dereferenced during the passive accept failure. -> As passive accept failure error handlinglogic runs in atomic context, the parent ep is dereferenced by scheduling work request. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: set the correct FID value in DSGL commandsHariprasad S2016-05-131-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The FID value in a ULP_MEMIO command needs to be set to an IQ ID of a queue configured for our PF. The FID/IQ id is used to index into the PCIE FID table, to find out on which function the DMA needs to be issued. Essentially, every DMA needs to have the ingress queue. The exact ingress queue doesn't matter, but it needs to be an ingress queue associated with the function you want to see the DMA on. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Correct RFC number of MPAHariprasad S2016-05-131-1/+1
| | | | | | | | | | | | | | | | | | Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | RDMA/iw_cxgb4: Add few history bits for epHariprasad S2016-05-132-18/+36
| |/ | | | | | | | | | | | | | | | | | | | | - add EP_DISC_FAIL history bit - add QP_REFED/DEREFED history bits - Add functions to ref/deref the cm_id and add history bit for the same - add CLOSE_CON_RPL history Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | i40iw: pass hw_stats by reference rather than by valueColin Ian King2016-05-131-3/+3
| | | | | | | | | | | | | | | | | | passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | i40iw: Remove unnecessary synchronize_irq() before free_irq()Lars-Peter Clausen2016-05-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | i40iw: constify i40iw_vf_cqp_ops structureJulia Lawall2016-05-133-3/+3
| | | | | | | | | | | | | | | | | | | | The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | i40e: constify i40e_client_ops structureJulia Lawall2016-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | The i40e_client_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | IB/srp: Do not register memory if never_register has been setBart Van Assche2016-05-131-3/+9
| | | | | | | | | | | | | | | | | | | | | | This makes it easier to test the code path that does not use memory registration (srp_map_sg_dma()). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | IB/srp: Prevent mapping failuresBart Van Assche2016-05-132-19/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If both max_sectors and the queue_depth are high enough it can happen that the MR pool is depleted temporarily. This causes the SRP initiator to report mapping failures. Although the SRP initiator recovers from such mapping failures, prevent that this can happen by allocating more memory regions. Additionally, only enable memory registration if at least two pages can be registered per memory region. Reported-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
OpenPOWER on IntegriCloud