summaryrefslogtreecommitdiffstats
path: root/drivers
Commit message (Collapse)AuthorAgeFilesLines
* md: Move check for bitmap presence to personality code.Andre Noll2009-06-186-23/+31
| | | | | | | | | | | | | | | | | | | | | | | If the superblock of a component device indicates the presence of a bitmap but the corresponding raid personality does not support bitmaps (raid0, linear, multipath, faulty), then something is seriously wrong and we'd better refuse to run such an array. Currently, this check is performed while the superblocks are examined, i.e. before entering personality code. Therefore the generic md layer must know which raid levels support bitmaps and which do not. This patch avoids this layer violation without adding identical code to various personalities. This is accomplished by introducing a new public function to md.c, md_check_no_bitmap(), which replaces the hard-coded checks in the superblock loading functions. A call to md_check_no_bitmap() is added to the ->run method of each personality which does not support bitmaps and assembly is aborted if at least one component device contains a bitmap. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: remove chunksize rounding from common code.NeilBrown2009-06-182-56/+3
| | | | | | | | | | | | | It is easiest to round sizes to multiples of chunk size in the personality code for those personalities which care. Those personalities now do the rounding, so we can remove that function from common code. Also remove the upper bound on the size of a chunk, and the lower bound on the size of a device (1 chunk), neither of which really buy us anything. Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0/linear: ensure device sizes are rounded to chunk size.NeilBrown2009-06-182-0/+12
| | | | | | | | | This is currently ensured by common code, but it is more reliable to ensure it where it is needed in personality code. All the other personalities that care already round the size to the chunk_size. raid0 and linear are the only hold-outs. Signed-off-by: NeilBrown <neilb@suse.de>
* md: move assignment of ->utime so that it never gets skipped.NeilBrown2009-06-181-1/+1
| | | | | | | | | | | | Currently the assignment to utime gets skipped for 'external' metadata. So move it to the top of the function so that it always gets effected. This is of largely cosmetic interest. Nothing actually depends on ->utime being right for external arrays. "mdadm --monitor" does use it for 0.90 and 1.x arrays, but with mdadm-3.0, this is not important for external metadata. Signed-off-by: NeilBrown <neilb@suse.de>
* md: Push down reconstruction log message to personality code.Andre Noll2009-06-184-9/+12
| | | | | | | | | | | | Currently, the md layer checks in analyze_sbs() if the raid level supports reconstruction (mddev->level >= 1) and if reconstruction is in progress (mddev->recovery_cp != MaxSector). Move that printk into the personality code of those raid levels that care (levels 1, 4, 5, 6, 10). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: merge reconfig and check_reshape methods.NeilBrown2009-06-184-20/+19
| | | | | | | | | | | | The difference between these two methods is artificial. Both check that a pending reshape is valid, and perform any aspect of it that can be done immediately. 'reconfig' handles chunk size and layout. 'check_reshape' handles raid_disks. So make them just one method. Signed-off-by: NeilBrown <neilb@suse.de>
* md: remove unnecessary arguments from ->reconfig method.NeilBrown2009-06-184-39/+41
| | | | | | | | | | Passing the new layout and chunksize as args is not necessary as the mddev has fields for new_check and new_layout. This is preparation for combining the check_reshape and reconfig methods Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid5: check stripe cache is large enough in start_reshapeNeilBrown2009-06-181-17/+27
| | | | | | | | | | In reshape cases that do not change the number of devices, start_reshape is called without first calling check_reshape. Currently, the check that the stripe_cache is large enough is only done in check_reshape. It should be in start_reshape too. Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: chunk_sectors cleanups.NeilBrown2009-06-181-4/+4
| | | | | | | following the conversion to chunk_sectors, there is room for cleaning up a little. Signed-off-by: NeilBrown <neilb@suse.de>
* md: fix some comments.Andre Noll2009-06-181-2/+0
| | | | | | | 1/ Raid5 has learned to take over also raid4 and raid6 arrays. 2/ new_chunk in mdp_superblock_1 is in sectors, not bytes. Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid5: Use is_power_of_2() in raid5_reconfig()/raid6_reconfig().Andre Noll2009-06-181-4/+2
| | | | | Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: convert conf->chunk_size and conf->prev_chunk to sectors.Andre Noll2009-06-182-16/+17
| | | | | | | This kills some more shifts. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Convert mddev->new_chunk to sectors.Andre Noll2009-06-184-38/+43
| | | | | | | | | | A straight-forward conversion which gets rid of some multiplications/divisions/shifts. The patch also introduces a couple of new ones, most of which are due to conf->chunk_size still being represented in bytes. This will be cleaned up in subsequent patches. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Make mddev->chunk_size sector-based.Andre Noll2009-06-187-68/+74
| | | | | | | | | | | | | This patch renames the chunk_size field to chunk_sectors with the implied change of semantics. Since is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9) = is_power_of_2(chunk_sectors) these bits don't need an adjustment for the shift. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0 :Enables chunk size other than powers of 2.raz ben yehuda2009-06-161-30/+77
| | | | | | | | | | | | | Maintain two flows, one for pow2 chunk sizes (which uses masks and shift), and a flow for the general case (which uses sector_div). This is for the sake of performance. - introduce map_sector and is_io_in_chunk_boundary to encapsulate those two flows better for raid0_make_request - fix blk_mergeable to support the two flows. Signed-off-by: raziebe@gmail.com Signed-off-by: NeilBrown <neilb@suse.de>
* md: prepare for non-power-of-two chunk sizesraz ben yehuda2009-06-161-14/+15
| | | | | | | | | | Remove chunk size check from md as this is now performed in the run function in each personality. Replace chunk size power 2 code calculations by a regular division. Signed-off-by: raziebe@gmail.com Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid5: chunk size check in setup_confraz ben yehuda2009-06-161-1/+2
| | | | | | | have raid5 check chunk size in run/reshape method instead of in md Signed-off-by: raziebe@gmail.com Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid10: chunk size check in runraz ben yehuda2009-06-161-2/+3
| | | | | | | have raid10 check chunk size in run method instead of in md Signed-off-by: raziebe@gmail.com Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: chunk size check in raid0_runraz ben yehuda2009-06-161-2/+13
| | | | | | | | | | | have raid0 check chunk size in run method instead of in md. This is part of a series moving the checks from common code to the personalities where they belong. hardsect is short and chunksize is an int, so it is safe to use %. Signed-off-by: raziebe@gmail.com Signed-off-by: NeilBrown <neilb@suse.de>
* md: have raid0 report its formationraz ben yehuda2009-06-161-0/+33
| | | | | | | Report to the user what are the raid zones Signed-off-by: raziebe@gmail.com Signed-off-by: NeilBrown <neilb@suse.de>
* md: have raid0 compile with MD_DEBUG onraz ben yehuda2009-06-161-7/+13
| | | | | | | Because of the removal of the device list from the strips raid0 did not compile with MD_DEBUG flag on Signed-off-by: NeilBrown <neilb@suse.de>
* md: Binary search in linear raidSandeep K Sinha2009-06-161-5/+17
| | | | | | | Replace the linear search with binary search in which_dev. Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Removing num_sector and replacing start_sector with end_sectorSandeep K Sinha2009-06-162-21/+19
| | | | | | | | Remove num_sectors from dev_info and replace start_sector with end_sector. This makes a lot of comparisons much simpler. Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Removal of hash table in linear raidSandeep K Sinha2009-06-162-95/+3
| | | | | | | | | | | | Get rid of sector_div and hash table for linear raid and replace with a linear search in which_dev. The hash table adds a lot of complexity for little if any gain. Ultimately a binary search will be used which will have smaller cache foot print, a similar number of memory access, and no divisions. Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: remove mddev_to_conf "helper" macroNeilBrown2009-06-1612-103/+79
| | | | | | | | | | Having a macro just to cast a void* isn't really helpful. I would must rather see that we are simply de-referencing ->private, than have to know what the macro does. So open code the macro everywhere and remove the pointless cast. Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: remove setting of segment boundary.NeilBrown2009-06-161-5/+0
| | | | | | | | | | This setting doesn't seem to make sense (half the chunk size??) and shouldn't be needed. The segment boundary exported by raid0 should simply be the minimum of the segment boundary of all component devices. And we already get that right. Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: remove ->dev pointer from strip_zone structureNeilBrown2009-06-162-11/+11
| | | | | | | | | | If we treat conf->devlist more like a 2 dimensional array, we can get the devlist for a particular zone simply by indexing that array, so we don't need to store the pointers to subarrays in strip_zone. This makes strip_zone smaller and so (hopefully) searches faster. Signed-of-by: NeilBrown <neilb@suse.de>
* md: raid0: remove ->sectors from the strip_zone structure.NeilBrown2009-06-162-15/+19
| | | | | | | | | | | | | | | storing ->sectors is redundant as is can be computed from the difference z->zone_end - (z-1)->zone_end The one place where it is used, it is just as efficient to use a zone_end value instead. And removing it makes strip_zone smaller, so they array of these that is searched on every request has a better chance to say in cache. So discard the field and get the value from elsewhere. Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: Fix a memory leak when stopping a raid0 array.Andre Noll2009-06-161-3/+2
| | | | | | | | | | | raid0_stop() removes all references to the raid0 configuration but misses to free the ->devlist buffer. This patch closes this leak, removes a pointless initialization and fixes a coding style issue in raid0_stop(). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: Allocate all buffers for the raid0 configuration in one function.Andre Noll2009-06-161-30/+17
| | | | | | | | | | | | | Currently the raid0 configuration is allocated in raid0_run() while the buffers for the strip_zone and the dev_list arrays are allocated in create_strip_zones(). On errors, all three buffers are freed in raid0_run(). It's easier and more readable to do the allocation and cleanup within a single function. So move that code into create_strip_zones(). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: Make raid0_run() return a proper error code.Andre Noll2009-06-161-8/+9
| | | | | | | | | | | | | | Currently raid0_run() always returns -ENOMEM on errors. This is incorrect as running the array might fail for other reasons, for example because not all component devices were available. This patch changes create_strip_zones() so that it returns a proper error code (either -ENOMEM or -EINVAL) rather than 1 on errors and makes raid0_run(), its single caller, return that value instead of -ENOMEM. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: Remove hash spacing and sector shift.Andre Noll2009-06-162-65/+1
| | | | | | | | | | The "sector_shift" and "spacing" fields of struct raid0_private_data were only used for the hash table lookups. So the removal of the hash table allows get rid of these fields as well which simplifies create_strip_zones() and raid0_run() quite a bit. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: Remove hash table.Andre Noll2009-06-162-13/+0
| | | | | | | | | The raid0 hash table has become unused due to the changes in the previous patch. This patch removes the hash table allocation and setup code and kills the hash_table field of struct raid0_private_data. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid0: two cleanups in create_stripe_zones.NeilBrown2009-06-161-12/+10
| | | | | | | | | | | 1/ remove current_start. The same value is available in zone->dev_start and storing it separately doesn't gain anything. 2/ rename curr_zone_start to curr_zone_end as we are now more focused on the 'end' of each zone. We end up storing the same number though - the old name was a little confusing (and what does 'current' mean in this context anyway). Signed-off-by: NeilBrown <neilb@suse.de>
* md: raid0: Replace hash table lookup by looping over all strip_zones.Andre Noll2009-06-162-21/+21
| | | | | | | | | | | | | | | | | | | | | | | | | The number of strip_zones of a raid0 array is bounded by the number of drives in the array and is in fact much smaller for typical setups. For example, any raid0 array containing identical disks will have only a single strip_zone. Therefore, the hash tables which are used for quickly finding the strip_zone that holds a particular sector are of questionable value and add quite a bit of unnecessary complexity. This patch replaces the hash table lookup by equivalent code which simply loops over all strip zones to find the zone that holds the given sector. In order to make this loop as fast as possible, the zone->start field of struct strip_zone has been renamed to zone_end, and it now stores the beginning of the next zone in sectors. This allows to save one addition in the loop. Subsequent cleanup patches will remove the hash table structure. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* Merge branch 'for-linus' of ↵Linus Torvalds2009-06-1424-136/+181
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4_core: Don't double-free IRQs when falling back from MSI-X to INTx IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx IB/mlx4: Add strong ordering to local inval and fast reg work requests IB/ehca: Remove superfluous bitmasks from QP control block RDMA/cxgb3: Limit fast register size based on T3 limitations RDMA/cxgb3: Report correct port state and MTU mlx4_core: Add module parameter for number of MTTs per segment IB/mthca: Add module parameter for number of MTTs per segment RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes() infiniband: Remove void casts IB/ehca: Increment version number IB/ehca: Remove unnecessary memory operations for userspace queue pairs IB/ehca: Fall back to vmalloc() for big allocations IB/ehca: Replace vmalloc() with kmalloc() for queue allocation
| *-------. Merge branches 'cxgb3', 'ehca', 'misc', 'mlx4', 'mthca' and 'nes' into for-linusRoland Dreier2009-06-1422-133/+150
| |\ \ \ \ \
| | | | | | * RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes()Roel Kluin2009-05-151-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With a postfix increment, i is incremented one past 10K/5K before the loop ends, so the error messages will be displayed too soon if the test succeeds on the last iteration. Fix the comparisons to be > instead of >=. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | * | mlx4_core: Don't double-free IRQs when falling back from MSI-X to INTxRoland Dreier2009-06-141-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When both MSI-X and legacy INTx fail to generate an interrupt, the driver frees the MSI-X interrupts twice. Fix this by clearing the have_irq flag for the MSI-X interrupts when they are freed the first time. This is the same bug that was reported in ib_mthca by Yinghai Lu <yhlu.kernel@gmail.com>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | * | IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTxRoland Dreier2009-06-131-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When both MSI-X and legacy INTx fail to generate an interrupt, the driver frees the MSI-X interrupts twice. Fix this by clearing the have_irq flag for the MSI-X interrupts when they are freed the first time. Reported-by: Yinghai Lu <yhlu.kernel@gmail.com> Tested-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | * | IB/mthca: Add module parameter for number of MTTs per segmentEli Cohen2009-05-275-14/+26
| | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current MTT allocator uses kmalloc() to allocate a buffer for its buddy allocator, and thus is limited in the amount of MTT segments that it can control. As a result, the size of memory that can be registered is limited too. This patch uses a module parameter to control the number of MTT entries that each segment represents, allowing more memory to be registered with the same number of segments. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | * | IB/mlx4: Add strong ordering to local inval and fast reg work requestsJack Morgenstein2009-06-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ConnectX Programmer's Reference Manual states that the "SO" bit must be set when posting Fast Register and Local Invalidate send work requests. When this bit is set, the work request will be executed only after all previous work requests on the send queue have been executed. (If the bit is not set, Fast Register and Local Invalidate WQEs may begin execution too early, which violates the defined semantics for these operations) This fixes the issue with NFS/RDMA reported in <http://lists.openfabrics.org/pipermail/general/2009-April/059253.html> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | * | mlx4_core: Add module parameter for number of MTTs per segmentEli Cohen2009-05-273-6/+16
| | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current MTT allocator uses kmalloc() to allocate a buffer for its buddy allocator, and thus is limited in the amount of MTT segments that it can control. As a result, the size of memory that can be registered is limited too. This patch uses a module parameter to control the number of MTT entries that each segment represents, allowing more memory to be registered with the same number of segments. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | * | infiniband: Remove void castsJack Stone2009-05-132-7/+6
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | Remove uneeded casts of void *. Signed-off-by: Jack Stone <jwjstone@fastmail.fm> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * | IB/ehca: Remove superfluous bitmasks from QP control blockJoachim Fenkes2009-06-032-41/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the fields in the control block are nicely right-aligned, so no masking is necessary. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * | IB/ehca: Increment version numberStefan Roscher2009-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * | IB/ehca: Remove unnecessary memory operations for userspace queue pairsStefan Roscher2009-05-135-50/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The queue map for flush completion circumvention is only used for kernel space queue pairs. This patch skips the allocation of the queue maps in case the QP is created for userspace. In addition, this patch does not iomap the galpas for kernel usage if the queue pair is only used in userspace. These changes will improve the performance of creation of userspace queue pairs. Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * | IB/ehca: Fall back to vmalloc() for big allocationsStefan Roscher2009-05-131-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of large queue pairs there is the possibillity of allocation failures due to memory fragmentation when using kmalloc(). To ensure the memory is allocated even if kmalloc() can not find chunks which are big enough, we fall back to allocating the memory with vmalloc(). Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * | IB/ehca: Replace vmalloc() with kmalloc() for queue allocationAnton Blanchard2009-05-131-3/+3
| | |/ | | | | | | | | | | | | | | | | | | | | | To improve performance of driver resource allocation, replace vmalloc() calls with kmalloc(). Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * | RDMA/cxgb3: Limit fast register size based on T3 limitationsSteve Wise2009-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | T3 firmware only supports one WRs worth of page list for fast register work requests. The driver currently allows 2 WRs worth, which doesn't work for T3, so reduce the limit in the driver. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
OpenPOWER on IntegriCloud