| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Remove comment above parse_raid_params() that claims
"devices_handle_discard_safely" is a table line argument when it is
actually is a module parameter.
Also, backfill dm-raid target version 1.6.0 documentation.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
| |
Leverage the block manager's read_only flag instead of duplicating it;
access with new dm_bm_is_read_only() method.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
| |
The overwrite has only ever about optimizing away the need to zero a
block if the entire block was being overwritten. As such it is only
relevant when zeroing is enabled.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
|
|
|
|
| |
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
| |
Introduce a single common method for cleaning up a DM device's
mapped_device. No functional change, just eliminates duplication of
delicate mapped_device cleanup code.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
More often than not a request that is requeued _is_ mapped (meaning the
clone request is allocated and clone->q is initialized). Rename
dm_requeue_unmapped_original_request() to avoid potential confusion due
to function name containing "unmapped".
Also, remove dm_requeue_unmapped_request() since callers can easily call
the dm_requeue_original_request() directly.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Do not allocate the io_pool mempool for blk-mq request-based DM
(DM_TYPE_MQ_REQUEST_BASED) in dm_alloc_rq_mempools().
Also refine __bind_mempools() to have more precise awareness of which
mempools each type of DM device uses -- avoids mempool churn when
reloading DM tables (particularly for DM_TYPE_REQUEST_BASED).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Remove unneeded variable used to store return value.
Generated by: scripts/coccinelle/misc/returnvar.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently dm-multipath has to clone the bios for every request sent
to the lower devices, which wastes cpu cycles and ties down memory.
This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio
to not complete bios attached to a request, which we set on clone
requests similar to bios in a flush sequence. With this change I/O
errors on a path failure only get propagated to dm-multipath, which
can then either resubmit the I/O or complete the bios on the original
request.
I've done some basic testing of this on a Linux target with ALUA support,
and it survives path failures during I/O nicely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
non-chains") regressed all existing callers that followed this pattern:
1) saving a bio's original bi_end_io
2) wiring up an intermediate bi_end_io
3) restoring the original bi_end_io from intermediate bi_end_io
4) calling bio_endio() to execute the restored original bi_end_io
The regression was due to BIO_CHAIN only ever getting set if
bio_inc_remaining() is called. For the above pattern it isn't set until
step 3 above (step 2 would've needed to establish BIO_CHAIN). As such
the first bio_endio(), in step 2 above, never decremented __bi_remaining
before calling the intermediate bi_end_io -- leaving __bi_remaining with
the value 1 instead of 0. When bio_inc_remaining() occurred during step
3 it brought it to a value of 2. When the second bio_endio() was
called, in step 4 above, it should've called the original bi_end_io but
it didn't because there was an extra reference that wasn't dropped (due
to atomic operations being optimized away since BIO_CHAIN wasn't set
upfront).
Fix this issue by removing the __bi_remaining management complexity for
all callers that use the above pattern -- bio_chain() is the only
interface that _needs_ to be concerned with __bi_remaining. For the
above pattern callers just expect the bi_end_io they set to get called!
Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
that aren't associated with the bio_chain() interface.
Also, the bio_inc_remaining() interface has been moved local to bio.c.
Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The only possible problem of using mutex_lock() instead of trylock
is about deadlock.
If there aren't any locks held before calling blkdev_reread_part(),
deadlock can't be caused by this conversion.
If there are locks held before calling blkdev_reread_part(),
and if these locks arn't required in open, close handler and I/O
path, deadlock shouldn't be caused too.
Both user space's ioctl(BLKRRPART) and md_setup_drive() from
init/do_mounts_md.c belongs to the 1st case, so the conversion is safe
for the two cases.
For loop, the previous patches in this pathset has fixed the ABBA lock
dependency, so the conversion is OK.
For nbd, tx_lock is held when calling the function:
- both open and release won't hold the lock
- when blkdev_reread_part() is run, I/O thread has been stopped
already, so tx_lock won't be acquired in I/O path at that time.
- so the conversion won't cause deadlock for nbd
For dasd, both dasd_open(), dasd_release() and request function don't
acquire any mutex/semphone, so the conversion should be safe.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch exports blkdev_reread_part() for block drivers, also
introduce __blkdev_reread_part().
For some drivers, such as loop, reread of partitions can be run
from the release path, and bd_mutex may already be held prior to
calling ioctl_by_bdev(bdev, BLKRRPART, 0), so introduce
__blkdev_reread_part for use in such cases.
CC: Christoph Hellwig <hch@lst.de>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Stop abusing struct page functionality and the swap end_io handler, and
instead add a modified version of the blk-lib.c bio_batch helpers.
Also move the block I/O code into swap.c as they are directly tied into
each other.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Ming Lin <mlin@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Various previous patches removed bits and left holes, collapse them
all. Leave the reset start bit where it is, we don't need to change
that.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| | |
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since the big barrier rewrite/removal in 2007 we never fail FLUSH or
FUA requests, which means we can remove the magic BIO_EOPNOTSUPP flag
to help propagating those to the buffer_head layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
lockdep gets unhappy about the not disabling irqs when using the queue_lock
around it. Instead of trying to fix that up just switch to an atomic_t
and get rid of the lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Last patch makes plug work for multiple queue case. However it only
works for single disk case, because it assumes only one request in the
plug list. If a task is accessing multiple disks, eg MD/DM, the
assumption is wrong. Let blk_attempt_plug_merge() record request from
the same queue.
V2: use NULL parameter in !mq case. Fix a bug. Add comments in
blk_attempt_plug_merge to make it less (hopefully) confusion.
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
plug is still helpful for workload with IO merge, but it can be harmful
otherwise especially with multiple hardware queues, as there is
(supposed) no lock contention in this case and plug can introduce
latency. For multiple queues, we do limited plug, eg plug only if there
is request merge. If a request doesn't have merge with following
request, the requet will be dispatched immediately.
V2: check blk_queue_nomerges() as suggested by Jeff.
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If we directly issue a request and it fails, we use
blk_mq_merge_queue_io(). But we already assigned bio to a request in
blk_mq_bio_to_request. blk_mq_merge_queue_io shouldn't run
blk_mq_bio_to_request again.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The following appears in blk_sq_make_request:
/*
* If we have multiple hardware queues, just go directly to
* one of those for sync IO.
*/
We clearly don't have multiple hardware queues, here! This comment was
introduced with this commit 07068d5b8e (blk-mq: split make request
handler for multi and single queue):
We want slightly different behavior from them:
- On single queue devices, we currently use the per-process plug
for deferred IO and for merging.
- On multi queue devices, we don't use the per-process plug, but
we want to go straight to hardware for SYNC IO.
The old code had this:
use_plug = !is_flush_fua && ((q->nr_hw_queues == 1) || !is_sync);
and that was converted to:
use_plug = !is_flush_fua && !is_sync;
which is not equivalent. For the single queue case, that second half of
the && expression is always true. So, what I think was actually inteded
follows (and this more closely matches what is done in blk_queue_bio).
V2: delete the 'likely', which should not be a big deal
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
block plug callback could sleep, so we introduce a parameter
'from_schedule' and corresponding drivers can use it to destinguish a
schedule plug flush or a plug finish. Unfortunately io_schedule_out
still uses blk_flush_plug(). This causes below output (Note, I added a
might_sleep() in raid1_unplug to make it trigger faster, but the whole
thing doesn't matter if I add might_sleep). In raid1/10, this can cause
deadlock.
This patch makes io_schedule_out always uses blk_schedule_flush_plug.
This should only impact drivers (as far as I know, raid 1/10) which are
sensitive to the 'from_schedule' parameter.
[ 370.817949] ------------[ cut here ]------------
[ 370.817960] WARNING: CPU: 7 PID: 145 at ../kernel/sched/core.c:7306 __might_sleep+0x7f/0x90()
[ 370.817969] do not call blocking ops when !TASK_RUNNING; state=2 set at [<ffffffff81092fcf>] prepare_to_wait+0x2f/0x90
[ 370.817971] Modules linked in: raid1
[ 370.817976] CPU: 7 PID: 145 Comm: kworker/u16:9 Tainted: G W 4.0.0+ #361
[ 370.817977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[ 370.817983] Workqueue: writeback bdi_writeback_workfn (flush-9:1)
[ 370.817985] ffffffff81cd83be ffff8800ba8cb298 ffffffff819dd7af 0000000000000001
[ 370.817988] ffff8800ba8cb2e8 ffff8800ba8cb2d8 ffffffff81051afc ffff8800ba8cb2c8
[ 370.817990] ffffffffa00061a8 000000000000041e 0000000000000000 ffff8800ba8cba28
[ 370.817993] Call Trace:
[ 370.817999] [<ffffffff819dd7af>] dump_stack+0x4f/0x7b
[ 370.818002] [<ffffffff81051afc>] warn_slowpath_common+0x8c/0xd0
[ 370.818004] [<ffffffff81051b86>] warn_slowpath_fmt+0x46/0x50
[ 370.818006] [<ffffffff81092fcf>] ? prepare_to_wait+0x2f/0x90
[ 370.818008] [<ffffffff81092fcf>] ? prepare_to_wait+0x2f/0x90
[ 370.818010] [<ffffffff810776ef>] __might_sleep+0x7f/0x90
[ 370.818014] [<ffffffffa0000c03>] raid1_unplug+0xd3/0x170 [raid1]
[ 370.818024] [<ffffffff81421d9a>] blk_flush_plug_list+0x8a/0x1e0
[ 370.818028] [<ffffffff819e3550>] ? bit_wait+0x50/0x50
[ 370.818031] [<ffffffff819e21b0>] io_schedule_timeout+0x130/0x140
[ 370.818033] [<ffffffff819e3586>] bit_wait_io+0x36/0x50
[ 370.818034] [<ffffffff819e31b5>] __wait_on_bit+0x65/0x90
[ 370.818041] [<ffffffff8125b67c>] ? ext4_read_block_bitmap_nowait+0xbc/0x630
[ 370.818043] [<ffffffff819e3550>] ? bit_wait+0x50/0x50
[ 370.818045] [<ffffffff819e3302>] out_of_line_wait_on_bit+0x72/0x80
[ 370.818047] [<ffffffff810935e0>] ? autoremove_wake_function+0x40/0x40
[ 370.818050] [<ffffffff811de744>] __wait_on_buffer+0x44/0x50
[ 370.818053] [<ffffffff8125ae80>] ext4_wait_block_bitmap+0xe0/0xf0
[ 370.818058] [<ffffffff812975d6>] ext4_mb_init_cache+0x206/0x790
[ 370.818062] [<ffffffff8114bc6c>] ? lru_cache_add+0x1c/0x50
[ 370.818064] [<ffffffff81297c7e>] ext4_mb_init_group+0x11e/0x200
[ 370.818066] [<ffffffff81298231>] ext4_mb_load_buddy+0x341/0x360
[ 370.818068] [<ffffffff8129a1a3>] ext4_mb_find_by_goal+0x93/0x2f0
[ 370.818070] [<ffffffff81295b54>] ? ext4_mb_normalize_request+0x1e4/0x5b0
[ 370.818072] [<ffffffff8129ab67>] ext4_mb_regular_allocator+0x67/0x460
[ 370.818074] [<ffffffff81295b54>] ? ext4_mb_normalize_request+0x1e4/0x5b0
[ 370.818076] [<ffffffff8129ca4b>] ext4_mb_new_blocks+0x4cb/0x620
[ 370.818079] [<ffffffff81290956>] ext4_ext_map_blocks+0x4c6/0x14d0
[ 370.818081] [<ffffffff812a4d4e>] ? ext4_es_lookup_extent+0x4e/0x290
[ 370.818085] [<ffffffff8126399d>] ext4_map_blocks+0x14d/0x4f0
[ 370.818088] [<ffffffff81266fbd>] ext4_writepages+0x76d/0xe50
[ 370.818094] [<ffffffff81149691>] do_writepages+0x21/0x50
[ 370.818097] [<ffffffff811d5c00>] __writeback_single_inode+0x60/0x490
[ 370.818099] [<ffffffff811d630a>] writeback_sb_inodes+0x2da/0x590
[ 370.818103] [<ffffffff811abf4b>] ? trylock_super+0x1b/0x50
[ 370.818105] [<ffffffff811abf4b>] ? trylock_super+0x1b/0x50
[ 370.818107] [<ffffffff811d665f>] __writeback_inodes_wb+0x9f/0xd0
[ 370.818109] [<ffffffff811d69db>] wb_writeback+0x34b/0x3c0
[ 370.818111] [<ffffffff811d70df>] bdi_writeback_workfn+0x23f/0x550
[ 370.818116] [<ffffffff8106bbd8>] process_one_work+0x1c8/0x570
[ 370.818117] [<ffffffff8106bb5b>] ? process_one_work+0x14b/0x570
[ 370.818119] [<ffffffff8106c09b>] worker_thread+0x11b/0x470
[ 370.818121] [<ffffffff8106bf80>] ? process_one_work+0x570/0x570
[ 370.818124] [<ffffffff81071868>] kthread+0xf8/0x110
[ 370.818126] [<ffffffff81071770>] ? kthread_create_on_node+0x210/0x210
[ 370.818129] [<ffffffff819e9322>] ret_from_fork+0x42/0x70
[ 370.818131] [<ffffffff81071770>] ? kthread_create_on_node+0x210/0x210
[ 370.818132] ---[ end trace 7b4deb71e68b6605 ]---
V2: don't change ->in_iowait
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Current code looks like inner plug gets flushed with a
blk_finish_plug(). Actually it's a nop. All requests/callbacks are added
to current->plug, while only outmost plug is assigned to current->plug.
So inner plug always has empty request/callback list, which makes
blk_flush_plug_list() a nop. This tries to make the code more clear.
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This removes the request types and hacks from the block code and into the
old IDE driver. There is a small amunt of code duplication due to this,
but it's not too bad.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
These values are only used by the IDE driver, so move them into it
by allowing drivers to take cmd_type values after the first private
one. Note that we have to turn cmd_type into a plain unsigned integer
so that gcc doesn't complain about mismatching enum types.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Struct bio has a reference count that controls when it can be freed.
Most uses cases is allocating the bio, which then returns with a
single reference to it, doing IO, and then dropping that single
reference. We can remove this atomic_dec_and_test() in the completion
path, if nobody else is holding a reference to the bio.
If someone does call bio_get() on the bio, then we flag the bio as
now having valid count and that we must properly honor the reference
count when it's being put.
Tested-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Struct bio has an atomic ref count for chained bio's, and we use this
to know when to end IO on the bio. However, most bio's are not chained,
so we don't need to always introduce this atomic operation as part of
ending IO.
Add a helper to elevate the bi_remaining count, and flag the bio as
now actually needing the decrement at end_io time. Rename the field
to __bi_remaining to catch any current users of this doing the
incrementing manually.
For high IOPS workloads, this reduces the overhead of bio_endio()
substantially.
Tested-by: Robert Elliott <elliott@hp.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
dm_merge_bvec() was originally added in f6fccb ("dm: introduce
merge_bvec_fn"). In that commit a value in sectors is converted to
bytes using << 9, and then assigned to an int. This code made
assumptions about the value of BIO_MAX_SECTORS.
A later commit 148e51 ("dm: improve documentation and code clarity in
dm_merge_bvec") was meant to have no functional change but it removed
the use of BIO_MAX_SECTORS in favor of using queue_max_sectors(). At
this point the cast from sector_t to int resulted in a zero value. The
fallout being dm_merge_bvec() would only allow a single page to be added
to a bio.
This interim fix is minimal for the benefit of stable@ because the more
comprehensive cleanup of passing a sector_t to all DM targets' merge
function will impact quite a few DM targets.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.19+
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
dm-multipath accepts 0 path mapping.
# echo '0 2097152 multipath 0 0 0 0' | dmsetup create newdev
Such a mapping can be used to release underlying devices while still
holding requests in its queue until working paths come back.
However, once the multipath device is created over blk-mq devices,
it rejects reloading of 0 path mapping:
# echo '0 2097152 multipath 0 0 1 1 queue-length 0 1 1 /dev/sda 1' \
| dmsetup create mpath1
# echo '0 2097152 multipath 0 0 0 0' | dmsetup load mpath1
device-mapper: reload ioctl on mpath1 failed: Invalid argument
Command failed
With following kernel message:
device-mapper: ioctl: can't change device type after initial table load.
DM tries to inherit the current table type using dm_table_set_type()
but it doesn't work as expected because of unnecessary check about
whether the target type is hybrid or not.
Hybrid type is for targets that work as either request-based or bio-based
and not required for blk-mq or non blk-mq checking.
Fixes: 65803c205983 ("dm table: train hybrid target type detection to select blk-mq if appropriate")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When stacking request-based dm device on non blk-mq device and
device-mapper target could not map the request (error target is used,
multipath target with all paths down, etc), the WARN_ON_ONCE() in
free_rq_clone() will trigger when it shouldn't.
The warning was added by commit aa6df8d ("dm: fix free_rq_clone() NULL
pointer when requeueing unmapped request"). But free_rq_clone() with
clone->q == NULL is valid usage for the case where
dm_kill_unmapped_request() initiates request cleanup.
Fix this false warning by just removing the WARN_ON -- it only generated
false positives and was never useful in catching the intended case
(completing clone request not being mapped e.g. clone->q being NULL).
Fixes: aa6df8d ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use BLK_MQ_RQ_QUEUE_BUSY to requeue a blk-mq request directly from the
DM blk-mq device's .queue_rq. This cleans up the previous convoluted
handling of request requeueing that would return BLK_MQ_RQ_QUEUE_OK
(even though it wasn't) and then run blk_mq_requeue_request() followed
by blk_mq_kick_requeue_list().
Also, document that DM blk-mq ontop of old request_fn devices cannot
fail in clone_rq() since the clone request is preallocated as part of
the pdu.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Otherwise kmemleak reported:
unreferenced object 0xffff88009b14e2b0 (size 16):
comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
hex dump (first 16 bytes):
40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00 @...............
backtrace:
[<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
[<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
[<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
[<ffffffff8111cb37>] mempool_alloc+0x57/0x150
[<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
[<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
[<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
[<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
[<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
[<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
[<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
[<ffffffff812c0f30>] generic_make_request+0xc0/0x110
[<ffffffff812c0ff2>] submit_bio+0x72/0x150
[<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
[<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
[<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390
Fixes: e5863d9ad ("dm: allocate requests in target when stacking on blk-mq devices")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.0+
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When stacking request-based DM on blk_mq device, request cloning and
remapping are done in a single call to target's clone_and_map_rq().
The clone is allocated and valid only if clone_and_map_rq() returns
DM_MAPIO_REMAPPED.
The "IS_ERR(clone)" check in map_request() does not cover all the
!DM_MAPIO_REMAPPED cases that are possible (E.g. if underlying devices
are not ready or unavailable, clone_and_map_rq() may return
DM_MAPIO_REQUEUE without ever having established an ERR_PTR). Fix this
by explicitly checking for a return that is not DM_MAPIO_REMAPPED in
map_request().
Without this fix, DM core may call setup_clone() for a NULL clone
and oops like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
IP: [<ffffffff81227525>] blk_rq_prep_clone+0x7d/0x137
...
CPU: 2 PID: 5793 Comm: kdmwork-253:3 Not tainted 4.0.0-nm #1
...
Call Trace:
[<ffffffffa01d1c09>] map_tio_request+0xa9/0x258 [dm_mod]
[<ffffffff81071de9>] kthread_worker_fn+0xfd/0x150
[<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
[<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
[<ffffffff81071fdd>] kthread+0xe6/0xee
[<ffffffff81093a59>] ? put_lock_stats+0xe/0x20
[<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
[<ffffffff814c2d98>] ret_from_fork+0x58/0x90
[<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
Fixes: e5863d9ad ("dm: allocate requests in target when stacking on blk-mq devices")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.0+
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Without kicking queue, requeued request may stay forever in
the queue if there are no other I/O activities to the device.
The original error had been in v2.6.39 with commit 7eaceaccab5f
("block: remove per-queue plugging"), which replaced conditional
plugging by periodic runqueue.
Commit 9d1deb83d489 in v4.1-rc1 removed the periodic runqueue
and the problem started to manifest.
Fixes: 9d1deb83d489 ("dm: don't schedule delayed run of the queue if nothing to do")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of five fixes: Two MAINTAINER email updates (urgent
because the non-avagotech emails will start bouncing) an lpfc big
endian oops fix, a 256 byte sector hang fix (to eliminate 256 byte
sectors) and a storvsc fix which could cause test unit ready failures
on bringup"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
MAINTAINERS: Revise lpfc maintainers for Avago Technologies ownership of Emulex
MAINTAINERS, be2iscsi: change email domain
sd: Disable support for 256 byte/sector disks
lpfc: Fix breakage on big endian kernels
storvsc: Set the SRB flags correctly when no data transfer is needed
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The old email addresses will go away very soon. Revising with new addresses.
Signed-off-by: Dick Kennedy <dick.kennedy@avagotech.com>
Signed-off-by: James Smart <james.smart@avagotech.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
be2iscsi change of ownership from Emulex to Avago Technologies recently. We
like to get the following updates in: changed "Emulex" to "Avago
Technologies", changed email addresses from "emulex.com" to "avagotech.com",
updated MAINTAINER list for be2iscsi driver.
Signed-off-by: Minh Tran <minh.tran@avagotech.com>
Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@avagotech.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
256 bytes per sector support has been broken since 2.6.X,
and no-one stepped up to fix this.
So disable support for it.
Signed-off-by: Mark Hounschell <dmarkh@cfl.rr.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This reverts 4fbdf9cb it breaks LPFC on POWER7 machine, big endian kernel.
Without this, the kernel enters an infinite oops loop.
This is the hardware used for verification:
0005:01:00.0 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter [10df:f100] (rev 03)
0005:01:00.1 Fibre Channel [0c04]: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter [10df:f100] (rev 03)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Set the SRB flags correctly when there is no data transfer. Without this
change some IHV drivers will fail valid commands such as TEST_UNIT_READY.
Cc: <stable@vger.kernel.org>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"One more fix from the timer departement:
- Handle division of negative nanosecond values proper on 32bit.
A recent cleanup wrecked the sign handling of the dividend and
dropped the check for negative divisors"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ktime: Fix ktime_divns to do signed division
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It was noted that the 32bit implementation of ktime_divns()
was doing unsigned division and didn't properly handle
negative values.
And when a ktime helper was changed to utilize
ktime_divns, it caused a regression on some IR blasters.
See the following bugzilla for details:
https://bugzilla.redhat.com/show_bug.cgi?id=1200353
This patch fixes the problem in ktime_divns by checking
and preserving the sign bit, and then reapplying it if
appropriate after the division, it also changes the return
type to a s64 to make it more obvious this is expected.
Nicolas also pointed out that negative dividers would
cause infinite loops on 32bit systems, negative dividers
is unlikely for users of this function, but out of caution
this patch adds checks for negative dividers for both
32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure
no such use cases creep in.
[ tglx: Hand an u64 to do_div() to avoid the compiler warning ]
Fixes: 166afb64511e 'ktime: Sanitize ktime_to_us/ms conversion'
Reported-and-tested-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irqchip fix from Thomas Gleixner:
"A fix for a GIC-V3 irqchip regression which prevents some systems from
booting"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gicv3-its: ITS table size should not be smaller than PSZ
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When allocating a device table, if the requested allocation is smaller
than the default granule size of the ITS then, we need to round up to
the default size.
Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
[ stuart: Added comments and massaged changelog ]
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Reviewed-by: Marc Zygnier <marc.zyngier@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1432134795-661-1-git-send-email-stuart.yoder@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|