summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mthca
Commit message (Collapse)AuthorAgeFilesLines
* IB/mthca: Handle -ENOMEM in forward_trap()Dan Carpenter2011-01-101-0/+2
| | | | | | | ib_create_send_mad() can return ERR_PTR(-ENOMEM) here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/core: Add VLAN support for IBoEEli Cohen2010-10-251-1/+1
| | | | | | | | | | | | | | | | | Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the GID derived from a link local address in the following way: GID[11] GID[12] contain the VLAN ID when the GID contains a VLAN. The 3 bits user priority field of the packets are identical to the 3 bits of the SL. In case of rdma_cm apps, the TOS field is used to generate the SL field by doing a shift right of 5 bits effectively taking to 3 MS bits of the TOS field. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/pack: IBoE UD packet packing supportEli Cohen2010-10-141-1/+1
| | | | | | | | | | Add support for packing IBoE packet headers. Signed-off-by: Eli Cohen <eli@mellanox.co.il> [ Clean up and fix ib_ud_header_init() a bit. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB: Rename RAW_ETY to RAW_ETHERTYPEAleksey Senin2010-08-041-1/+1
| | | | | | | | | | Change abbreviated IB_QPT_RAW_ETY to IB_QPT_RAW_ETHERTYPE to make the special QP type easier to understand. cf http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04530.html Signed-off-by: Aleksey Senin <alekseys@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/core: Allow device-specific per-port sysfs filesRalph Campbell2010-05-211-1/+1
| | | | | | | | | | | | | | | | | Add a new parameter to ib_register_device() so that low-level device drivers can pass in a pointer to a callback function that will be called for each port that is registered in sysfs. This allows low-level device drivers to create files in /sys/class/infiniband/<hca>/ports/<N>/ without having to poke through the internals of the RDMA sysfs handling. There is no need for an unregister function since the kobject reference will go to zero when ib_unregister_device() is called. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Use the dma state API instead of pci equivalentsFUJITA Tomonori2010-04-213-8/+8
| | | | | | | The DMA API is preferred; no functional change. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-307-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* IB/core: Fix and clean up ib_ud_header_init()Eli Cohen2010-02-241-1/+1
| | | | | | | | | | | | ib_ud_header_init() first clears header and then fills up the various fields. Later on, it tests header->immediate_present, which it has already cleared, so the condition is always false. Fix this by adding an immediate_present parameter and setting header->immediate_present as is done with grh_present. Also remove unused calculation of header_len. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Fix access to freed memory in catastrophic event handlingJack Morgenstein2009-09-241-3/+8
| | | | | | | | | | | catas_reset() uses a pointer to mthca_dev, but mthca_dev is not valid after the call to __mthca_restart_one(). Based on a similar patch for mlx4 (634354d7, "mlx4: Fix access to freed memory") by Vitaliy Gusev <vgusev@openvz.org> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*-. Merge branches 'cxgb3', 'ehca', 'ipath', 'ipoib', 'misc', 'mlx4', 'mthca' ↵Roland Dreier2009-09-109-17/+29
|\ \ | | | | | | | | | and 'nes' into for-linus
| | * IB/mthca: Don't allow userspace open while recovering from catastrophic errorJack Morgenstein2009-09-054-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace apps are supposed to release all ib device resources if they receive a fatal async event (IBV_EVENT_DEVICE_FATAL). However, the app has no way of knowing when the device has come back up, except to repeatedly attempt ibv_open_device() until it succeeds. However, currently there is no protection against the open succeeding while the device is in being removed following the fatal event. In this case, the open will succeed, but as a result the device waits in the middle of its removal until the new app releases its resources -- and the new app will not do so, since the open succeeded at a point following the fatal event generation. This patch adds an "active" flag to the device. The active flag is set to false (in the fatal event flow) before the "fatal" event is generated, so any subsequent ibv_dev_open() call to the device will fail until the device comes back up, thus preventing the above deadlock. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * IB/mthca: Distinguish multiple devices in /proc/interruptsArputham Benjamin2009-09-052-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the mthca driver uses the same name for interrupts for every device in the system. This can make it very confusing trying to work out exactly which device MSI-X interrupts are for. Change the driver to add the PCI name of the device to the interrupt name. Signed-off-by: Arputham Benjamin <abenjamin@sgi.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * IB/mthca: Annotate CQ lockingRoland Dreier2009-09-051-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | mthca_ib_lock_cqs()/mthca_ib_unlock_cqs() are helper functions that lock/unlock both CQs attached to a QP in the proper order to avoid AB-BA deadlocks. Annotate this so sparse can understand what's going on (and warn us if we misuse these functions). Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * IB/mthca: Remove unnecessary include of <linux/init.h>Roland Dreier2009-09-051-1/+0
| | | | | | | | | | | | | | | | | | | | | mthca_reset.c doesn't have any function annotations, so there's no reason to include <linux/init.h>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * IB/mthca: Remove unnecessary include of <asm/page.h>Roland Dreier2009-09-051-2/+0
| |/ |/| | | | | | | | | | | mthca_config_reg.h was including <asm/page.h> for no reason -- the whole file is just defines of constants, so it's entirely self-contained. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB: Use printk_once() for driver versionsMarcin Slusarz2009-09-051-5/+1
|/ | | | | | | Replace open-coded reimplementations with printk_once(). Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Replace dma_sync_single() use with proper functionsRoland Dreier2009-06-221-3/+10
| | | | | | | | | | dma_sync_single() is deprecated now, and the use in mthca is wrong: there should be a dma_sync_single_for_cpu() before touching the memory from the CPU, and a dma_sync_single_for_device() afterwards. Fix this, prompted by a kick in the pants from a patch from FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTxRoland Dreier2009-06-131-1/+3
| | | | | | | | | | | When both MSI-X and legacy INTx fail to generate an interrupt, the driver frees the MSI-X interrupts twice. Fix this by clearing the have_irq flag for the MSI-X interrupts when they are freed the first time. Reported-by: Yinghai Lu <yhlu.kernel@gmail.com> Tested-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Add module parameter for number of MTTs per segmentEli Cohen2009-05-275-14/+26
| | | | | | | | | | | | | The current MTT allocator uses kmalloc() to allocate a buffer for its buddy allocator, and thus is limited in the amount of MTT segments that it can control. As a result, the size of memory that can be registered is limited too. This patch uses a module parameter to control the number of MTT entries that each segment represents, allowing more memory to be registered with the same number of segments. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Fix timeout for INIT_HCA and a few other commandsJack Morgenstein2009-04-201-7/+9
| | | | | | | | | | | | | | | | | Commands INIT_HCA, CLOSE_HCA, SYS_EN, SYS_DIS, and CLOSE_IB all have 1 second timeouts. For INIT_HCA this causes problems when had more than 2^18 are QPs configured, since the command takes more than 1 second to complete. All other commands have 60-second timeouts. This patch makes the above commands consistent with the rest of the commands (and with the chip documentation). This patch is an expansion of a patch from Arthur Kepner <akepner@sgi.com> fixing just the INIT_HCA timeout. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* dma-mapping: replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)Yang Hongyang2009-04-071-2/+2
| | | | | | | | Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32) Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* dma-mapping: replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)Yang Hongyang2009-04-071-2/+2
| | | | | | | | Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64) Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* IB/mthca: Fix dispatch of IB_EVENT_LID_CHANGE eventMoni Shoua2009-01-281-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | When snooping a PortInfo MAD, its client_reregister bit is checked. If the bit is ON then a CLIENT_REREGISTER event is dispatched, otherwise a LID_CHANGE event is dispatched. This way of decision ignores the cases where the MAD changes the LID along with an instruction to reregister (so a necessary LID_CHANGE event won't be dispatched) or the MAD is neither of these (and an unnecessary LID_CHANGE event will be dispatched). This causes problems at least with IPoIB, which will do a "light" flush on reregister, rather than the "heavy" flush required due to a LID change. Fix this by dispatching a CLIENT_REREGISTER event if the client_reregister bit is set, but also compare the LID in the MAD to the current LID. If and only if they are not identical then a LID_CHANGE event is dispatched. Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* net: replace %p6 with %pI6Harvey Harrison2008-10-291-2/+2
| | | | | Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* infiniband: use %p6 for printing message idsHarvey Harrison2008-10-281-21/+2
| | | | | Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IB/mthca: Use pci_request_regions()Roland Dreier2008-09-293-111/+14
| | | | | | | | | | | | | Back in prehistoric (pre-git!) days, the kernel's MSI-X support did request_mem_region() on a device's MSI-X tables, which meant that a driver that enabled MSI-X couldn't use pci_request_regions() (since that would clash with the PCI layer's MSI-X request). However, that was removed (by me!) years ago, so mthca can just use pci_request_regions() and pci_release_regions() instead of its own much more complicated code that avoids requesting the MSI-X tables. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* dma-mapping: add the device argument to dma_mapping_error()FUJITA Tomonori2008-07-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. [akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* IB/mthca: Keep free count for MTT buddy allocatorRoland Dreier2008-07-222-8/+19
| | | | | | | | | | | | | | | | MTT entries are allocated with a buddy allocator, which just keeps bitmaps for each level of the buddy table. However, all free space starts out at the highest order, and small allocations start scanning from the lowest order. When the lowest order tables have no free space, this can lead to scanning potentially millions of bits before finding a free entry at a higher order. We can avoid this by just keeping a count of how many free entries each order has, and skipping the bitmap scan when an order is completely empty. This provides a nice performance boost for a negligible increase in memory usage. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Fix check of max_send_sge for special QPsRoland Dreier2008-07-141-2/+2
| | | | | | | | | | The MLX transport requires two extra gather entries for sends (one for the header and one for the checksum at the end, as the comment says). However the code checked that max_recv_sge was not too big, instead of checking max_send_sge as it should have. Fix the code to check the correct condition. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Use round_jiffies() for catastrophic error polling timerRoland Dreier2008-07-141-1/+1
| | | | | | | Exactly when the catastrophic error polling timer function runs is not important, so use round_jiffies() to save unnecessary wakeups. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Remove "stop" flag for catastrophic error polling timerRoland Dreier2008-07-142-14/+2
| | | | | | | Since we use del_timer_sync() anyway, there's no need for an additional flag to tell the timer not to rearm. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Remove extra code for RESET->ERR QP state transitionRoland Dreier2008-07-141-26/+0
| | | | | | | | | | Commit b18aad71 ("IB/mthca: Fix RESET to ERROR transition") added some extra code to handle a QP state transition from RESET to ERROR. However, the latest 1.2.1 version of the IB spec has clarified that this transition is actually not allowed, so we can remove this extra code again. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA/core: Add memory management extensions supportSteve Wise2008-07-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the IB "base memory management extension" (BMME) and the equivalent iWARP operations (which the iWARP verbs mandates all devices must implement). The new operations are: - Allocate an ib_mr for use in fast register work requests. - Allocate/free a physical buffer lists for use in fast register work requests. This allows device drivers to allocate this memory as needed for use in posting send requests (eg via dma_alloc_coherent). - New send queue work requests: * send with remote invalidate * fast register memory region * local invalidate memory region * RDMA read with invalidate local memory region (iWARP only) Consumer interface details: - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added to indicate device support for these features. - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV, IB_WR_RDMA_READ_WITH_INV are added. - A new consumer API function, ib_alloc_mr() is added to allocate fast register memory regions. - New consumer API functions, ib_alloc_fast_reg_page_list() and ib_free_fast_reg_page_list() are added to allocate and free device-specific memory for fast registration page lists. - A new consumer API function, ib_update_fast_reg_key(), is added to allow the key portion of the R_Key and L_Key of a fast registration MR to be updated. Consumers call this if desired before posting a IB_WR_FAST_REG_MR work request. Consumers can use this as follows: - MR is allocated with ib_alloc_mr(). - Page list memory is allocated with ib_alloc_fast_reg_page_list(). - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key(). - MR made VALID and bound to a specific page list via ib_post_send(IB_WR_FAST_REG_MR) - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV), ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with invalidate operation. - MR is deallocated with ib_dereg_mr() - page lists dealloced via ib_free_fast_reg_page_list(). Applications can allocate a fast register MR once, and then can repeatedly bind the MR to different physical block lists (PBLs) via posting work requests to a send queue (SQ). For each outstanding MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be allocated (the fast_reg_page_list is owned by the low-level driver from the consumer posting a work request until the request completes). Thus pipelining can be achieved while still allowing device-specific page_list processing. The 32-bit fast register memory key/STag is composed of a 24-bit index and an 8-bit key. The application can change the key each time it fast registers thus allowing more control over the peer's use of the key/STag (ie it can effectively be changed each time the rkey is rebound to a page list). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA: Remove subversion $Id tagsRoland Dreier2008-07-1427-53/+0
| | | | | | They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Clear ICM pages before handing to FWEli Cohen2008-06-231-1/+5
| | | | | | | | | | | | | | | | | | Current memfree FW has a bug which in some cases, assumes that ICM pages passed to it are cleared. This patch uses __GFP_ZERO to allocate all ICM pages passed to the FW. Once firmware with a fix is released, we can make the workaround conditional on firmware version. This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here: http://lists.openfabrics.org/pipermail/general/2008-May/050026.html Cc: <stable@kernel.org> Signed-off-by: Eli Cohen <eli@mellanox.co.il> [ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing ICM memory and memset()ing it to 0. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Fix max_sge value returned by query_deviceRoland Dreier2008-05-161-1/+13
| | | | | | | | | | | | | | | | | | The mthca driver returns the maximum number of scatter/gather entries returned by the firmware as the max_sge value when device properties are queried. However, the firmware also reports a limit on the maximum descriptor size allowed, and because mthca takes into account the worst case send request overhead when checking whether to allow a QP to be created, the largest number of scatter/gather entries that can be used with mthca may be limited by the maximum descriptor size rather than just by the actual s/g entry limit. This means that applications cannot actually create QPs with max_send_sge equal to the limit returned by ib_query_device(). Fix this by checking if the maximum descriptor size imposes a lower limit and if so returning that lower limit. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attributeRoland Dreier2008-04-293-5/+20
| | | | | | | | | | | | | | | | | | Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the mthca userspace ABI to provide a way for userspace to indicate which memory regions need the DMA write barrier attribute. However, it is possible to handle this without breaking existing userspace, by having the mthca kernel driver recognize whether it is talking to old or new userspace, depending on the size of the register MR structure passed in. The only potential drawback of this is that is allows old userspace (which has a bug with DMA ordering on large SGI Altix systems) to continue to run on new kernels, but the advantage of allowing old userspace to continue to work on unaffected systems seems to outweigh this, and we can print a warning to push people to upgrade their userspace. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Avoid recycling old FMR R_Keys too soonOlaf Kirch2008-04-291-13/+0
| | | | | | | | | | | | | | | | | | | | | | | When a FMR is unmapped, mthca resets the map count to 0, and clears the upper part of the R_Key which is used as the sequence counter. This poses a problem for RDS, which uses ib_fmr_unmap as a fence operation. RDS assumes that after issuing an unmap, the old R_Keys will be invalid for a "reasonable" period of time. For instance, Oracle processes uses shared memory buffers allocated from a pool of buffers. When a process dies, we want to reclaim these buffers -- but we must make sure there are no pending RDMA operations to/from those buffers. The only way to achieve that is by using unmap and sync the TPT. However, when the sequence count is reset on unmap, there is a high likelihood that a new mapping will be given the same R_Key that was issued a few milliseconds ago. To prevent this, don't reset the sequence count when unmapping a FMR. Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB: expand ib_umem_get() prototypeArthur Kepner2008-04-292-2/+16
| | | | | | | | | | | | | | | | | | | Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1 when mapping user-allocated CQs with ib_umem_get(). Signed-off-by: Arthur Kepner <akepner@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jes Sorensen <jes@sgi.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds2008-04-211-20/+28
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits) SCSI: convert struct class_device to struct device DRM: remove unused dev_class IB: rename "dev" to "srp_dev" in srp_host structure IB: convert struct class_device to struct device memstick: convert struct class_device to struct device driver core: replace remaining __FUNCTION__ occurrences sysfs: refill attribute buffer when reading from offset 0 PM: Remove destroy_suspended_device() Firmware: add iSCSI iBFT Support PM: Remove legacy PM (fix) Kobject: Replace list_for_each() with list_for_each_entry(). SYSFS: Explicitly include required header file slab.h. Driver core: make device_is_registered() work for class devices PM: Convert wakeup flag accessors to inline functions PM: Make wakeup flags available whenever CONFIG_PM is set PM: Fix misuse of wakeup flag accessors in serial core Driver core: Call device_pm_add() after bus_add_device() in device_add() PM: Handle device registrations during suspend/resume block: send disk "change" event for rescan_partitions() sysdev: detect multiple driver registrations ... Fixed trivial conflict in include/linux/memory.h due to semaphore header file change (made irrelevant by the change to mutex).
| * IB: convert struct class_device to struct deviceTony Jones2008-04-191-20/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This converts the main ib_device to use struct device instead of struct class_device as class_device is going away. Signed-off-by: Tony Jones <tonyj@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | Convert asm/semaphore.h users to linux/semaphore.hMatthew Wilcox2008-04-181-2/+1
|/ | | | Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* IB/mthca: Update module version and release dateJack Morgenstein2008-04-161-2/+2
| | | | | | | | The ib_mthca driver has been stable for a while, so bump the version number to 1.0 to indicate this. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Update QP state if query QP succeedsDotan Barak2008-04-161-6/+14
| | | | | | | | | If the QP was moved to another state (such as SQE) by the hardware, then after this change the user won't have to set the IBV_QP_CUR_STATE mask in order to execute modify QP in order to recover from this state. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/core: Add support for "send with invalidate" work requestsRoland Dreier2008-04-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a "send with invalidate" work request as defined in the iWARP verbs and the InfiniBand base memory management extensions. Also put "imm_data" and a new "invalidate_rkey" member in a new "ex" union in struct ib_send_wr. The invalidate_rkey member can be used to pass in an R_Key/STag to be invalidated. Add this new union to struct ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in ib_uverbs_post_send(). Fix up low-level drivers to deal with the change to struct ib_send_wr, and just remove the imm_data initialization from net/sunrpc/xprtrdma/, since that code never does any send with immediate operations. Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since the iWARP drivers currently in the tree set the bit. The amso1100 driver at least will silently fail to honor the IB_SEND_INVALIDATE bit if passed in as part of userspace send requests (since it does not implement kernel bypass work request queueing). Remove the flag from all existing drivers that set it until we know which ones are OK. The values chosen for the new flag is not consecutive to avoid clashing with flags defined in the XRC patches, which are not merged yet but which are already in use and are likely to be merged soon. This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/core: Add creation flags to struct ib_qp_init_attrEli Cohen2008-04-161-0/+3
| | | | | | | | | | | | | | Add a create_flags member to struct ib_qp_init_attr that will allow a kernel verbs consumer to create a pass special flags when creating a QP. Add a flag value for telling low-level drivers that a QP will be used for IPoIB UD LSO. The create_flags member will also be useful for XRC and ehca low-latency QP support. Since no create_flags handling is implemented yet, add code to all low-level drivers to return -EINVAL if create_flags is non-zero. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Avoid integer overflow when allocating huge ICM tableRoland Dreier2008-04-161-1/+3
| | | | | | | | | | | | | | In mthca_alloc_icm_table(), the number of entries to allocate for the table->icm array is computed by calculating obj_size * nobj and then dividing by MTHCA_TABLE_CHUNK_SIZE. If nobj is really large, then obj_size * nobj may overflow and the division may get the wrong value (even a negative value). Fix this by calculating the number of objects per chunk and then dividing nobj by this value instead. This patch allows crazy configurations such as loading ib_mthca with the module parameter num_mtt=33554432 to work properly. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Avoid integer overflow when dealing with profile sizeRoland Dreier2008-04-163-7/+10
| | | | | | | | | | | | | | mthca_make_profile() returns the size in bytes of the HCA context layout it creates, or a negative value if an error occurs. However, the return value is declared as u64 and the memfree initialization path casts this value to int to test if it is negative. This makes it think incorrectly than an error has occurred if the context size happens to be bigger than 2GB, since this turns into a negative int. Fix this by having mthca_make_profile() return an s64 and testing for an error by checking whether this 64-bit value itself is negative. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Add IPoIB checksum offload supportEli Cohen2008-04-166-13/+30
| | | | | | | | | | | Arbel and Sinai devices support checksum generation and verification of TCP and UDP packets for UD IPoIB messages. This patch checks if the HCA supports this and sets the IB_DEVICE_UD_IP_CSUM capability flag if it does. It implements support for handling the IB_SEND_IP_CSUM send flag and setting the csum_ok field in receive work completions. Signed-off-by: Eli Cohen <eli@mellnaox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/mthca: Formatting cleanupsRoland Dreier2008-04-166-11/+11
| | | | | | Fix a few whitespace and other coding style problems. Signed-off-by: Roland Dreier <rolandd@cisco.com>
OpenPOWER on IntegriCloud