path: root/drivers/md
* md: allow a resync that is waiting for other resync to complete, to be aborted. (NeilBrown, 2009-12-30; 1 file, -2/+3)
  If two arrays share a device, they will not both resync at the same time; one will wait for the other to complete. While waiting, the MD_RECOVERY_INTR flag is not checked, so a device failure, which would make the resync pointless, does not cause the resync to abort. The failed device therefore cannot be removed (devices cannot be removed while a resync is happening). So add a test for MD_RECOVERY_INTR.
  Reported-by: Brett Russ <bruss@netezza.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
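  A minimal sketch of the kind of test this adds to the resync wait loop (illustrative only, not the literal patch; flag and field names follow drivers/md/md.c):

      /* Sketch: while waiting for the other array that shares our devices
       * to finish its resync, also give up if our own recovery has been
       * interrupted (e.g. a device failed, making this resync pointless). */
      if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
              break;  /* stop waiting so the failed device can be removed */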
* md: remove unnecessary code from do_md_run (NeilBrown, 2009-12-30; 1 file, -28/+0)
  Since commit dfc7064500061677720fa26352963c772d3ebe6b, ->hot_remove_disks has not removed non-failed devices from an array until recovery is no longer possible. So the code in do_md_run to get around the fact that md_check_recovery (which calls ->hot_remove_disks) would remove partially-in-sync devices is no longer needed. So remove it.
  Signed-off-by: NeilBrown <neilb@suse.de>
* md: make recovery started by do_md_run() visible via sync_action (Dan Williams, 2009-12-30; 1 file, -0/+1)
  By default md_do_sync() will perform recovery if no other actions are specified. However, action_show() relies on MD_RECOVERY_RECOVER being set, otherwise it returns 'idle'. So set the missing MD_RECOVERY_RECOVER flag when starting recovery.
  Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
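  The fix amounts to setting one extra recovery flag where do_md_run() kicks off recovery; roughly (sketch, not the literal diff):

      /* Sketch: alongside the existing recovery setup in do_md_run(), also
       * mark the action as a recovery so action_show() reports it instead
       * of returning 'idle'. */
      set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);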
* md: fix small irregularity with start_ro module parameter (NeilBrown, 2009-12-30; 1 file, -1/+1)
  The start_ro module parameter can be used to force arrays to be started in 'auto-readonly', in which they are read-only until the first write. This ensures that no resync/recovery happens until something else writes to the device. This is important for resume-from-disk off an md array.
  However, if an array is started 'readonly' (by writing 'readonly' to the 'array_state' sysfs attribute) we want it to be really 'readonly', not 'auto-readonly'. So strengthen the condition to only set auto-readonly if the array is not already read-only.
  Signed-off-by: NeilBrown <neilb@suse.de>
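  In md, mddev->ro == 2 denotes the 'auto-readonly' state; the strengthened condition looks roughly like this (sketch; the internal variable name for the module parameter is assumed):

      /* Sketch: only switch to auto-readonly if the array was not already
       * started read-only via the 'array_state' attribute. */
      if (start_readonly && mddev->ro == 0)
              mddev->ro = 2;  /* auto-readonly: read-only until first write */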
* md: Fix unfortunate interaction with evms (NeilBrown, 2009-12-30; 1 file, -1/+7)
  evms configures md arrays by doing, for each different ioctl needed:
    open device
    send ioctl
    close device
  Since 2.6.29, the device can disappear after the 'close' unless a significant configuration has happened to the device. The change made by "SET_ARRAY_INFO" can be too minor to stop the device from disappearing, but important enough that losing the change is bad.
  So: make sure SET_ARRAY_INFO sets mddev->ctime, and keep the device active as long as ctime is non-zero (it gets zeroed with lots of other things when the array is stopped).
  This is suitable for -stable kernels since 2.6.29.
  Signed-off-by: NeilBrown <neilb@suse.de>
  Cc: stable@kernel.org
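  The two halves of the fix, sketched (simplified; the helper name below is hypothetical):

      /* Sketch 1: SET_ARRAY_INFO records a configuration time ... */
      mddev->ctime = get_seconds();

      /* Sketch 2: ... and the device is only allowed to disappear on close
       * while ctime is still zero, i.e. nothing significant happened yet. */
      if (mddev->ctime == 0)
              release_inactive_mddev(mddev);  /* hypothetical helper */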
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm (Linus Torvalds, 2009-12-15; 18 files, -874/+2274)
  * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (80 commits)
    dm snapshot: use merge origin if snapshot invalid
    dm snapshot: report merge failure in status
    dm snapshot: merge consecutive chunks together
    dm snapshot: trigger exceptions in remaining snapshots during merge
    dm snapshot: delay merging a chunk until writes to it complete
    dm snapshot: queue writes to chunks being merged
    dm snapshot: add merging
    dm snapshot: permit only one merge at once
    dm snapshot: support barriers in snapshot merge target
    dm snapshot: avoid allocating exceptions in merge
    dm snapshot: rework writing to origin
    dm snapshot: add merge target
    dm exception store: add merge specific methods
    dm snapshot: create function for chunk_is_tracked wait
    dm snapshot: make bio optional in __origin_write
    dm mpath: reject messages when device is suspended
    dm: export suspended state to targets
    dm: rename dm_suspended to dm_suspended_md
    dm: swap target postsuspend call and setting suspended flag
    dm crypt: add plain64 iv
    ...
* dm snapshot: use merge origin if snapshot invalid (Mikulas Patocka, 2009-12-10; 1 file, -5/+4)
  If the snapshot we are merging became invalid (e.g. it ran out of space), redirect all I/O directly to the origin device.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Reviewed-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: report merge failure in status (Mike Snitzer, 2009-12-10; 1 file, -2/+28)
  Set the 'merge_failed' flag if a snapshot fails to merge. Update snapshot_status() to report "Merge failed" if 'merge_failed' is set.
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: merge consecutive chunks together (Mike Snitzer, 2009-12-10; 1 file, -10/+21)
  s->store->type->prepare_merge returns the number of chunks that can be copied linearly, working backwards from the returned chunk number. For example, if it returns 3 chunks with old_chunk == 10 and new_chunk == 20, then chunk 20 can be copied to 10, chunk 19 to 9 and 18 to 8.
  Until now kcopyd only copied one chunk at a time. This patch now copies the full set at once. Consequently, snapshot_merge_process() needs to delay the merging of all chunks if any have writes in progress, not just the first chunk in the region that is to be merged.
  snapshot-merge's performance is now comparable to the original snapshot-origin target.
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
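  Using the example above (3 chunks, old_chunk == 10, new_chunk == 20), the whole run is handed to kcopyd as one linear region; roughly (sketch, assuming the exception store's chunk-to-sector helpers):

      /* Sketch: copy 'linear_chunks' consecutive chunks in a single kcopyd
       * job, working backwards from the chunk prepare_merge() returned. */
      src.sector  = chunk_to_sector(s->store, new_chunk - linear_chunks + 1);
      dest.sector = chunk_to_sector(s->store, old_chunk - linear_chunks + 1);
      src.count   = dest.count = linear_chunks * s->store->chunk_size;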
* dm snapshot: trigger exceptions in remaining snapshots during merge (Mikulas Patocka, 2009-12-10; 1 file, -0/+83)
  When there is one merging snapshot and other non-merging snapshots, snapshot_merge_process() must make exceptions in the non-merging snapshots.
  Use a sequence count to resolve the race between I/O to chunks that are about to be merged. The count increases each time an exception reallocation finishes. Use wait_event() to wait until the count changes.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
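  Schematically, the sequence count works like this (sketch; the field names below are hypothetical):

      /* Sketch, completion side: bump the count whenever a pending exception
       * finishes reallocating, then wake any waiting merge work. */
      s->pending_exceptions_done++;            /* hypothetical field */
      wake_up(&s->pending_exceptions_wait);    /* hypothetical wait queue */

      /* Sketch, merge side: wait until at least one more reallocation has
       * completed before re-checking the chunks it wants to merge. */
      wait_event(s->pending_exceptions_wait,
                 s->pending_exceptions_done != previous_count);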
* dm snapshot: delay merging a chunk until writes to it complete (Mikulas Patocka, 2009-12-10; 1 file, -1/+5)
  Track writes to chunks that are currently being merged and delay merging a chunk until all writes to that chunk finish.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Reviewed-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: queue writes to chunks being merged (Mikulas Patocka, 2009-12-10; 1 file, -13/+78)
  While a set of chunks is being merged, any overlapping writes need to be queued.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: add merging (Mikulas Patocka, 2009-12-10; 2 files, -6/+244)
  Merging is started when origin is resumed and it is stopped when origin is suspended or when the merging snapshot is destroyed or errors are detected.
  Merging is not yet interlocked with writes: this will be handled in subsequent patches.
  The code relies on callbacks from a private kcopyd thread.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: permit only one merge at once (Mikulas Patocka, 2009-12-10; 1 file, -6/+27)
  Merging more than one snapshot is not supported, so prevent this happening.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: support barriers in snapshot merge target (Mike Snitzer, 2009-12-10; 1 file, -3/+18)
  Sets num_flush_requests=2 to support flushing both the origin and cow devices used by the snapshot-merge target.
  Also, snapshot_ctr() now gets the origin device using FMODE_WRITE if the target is snapshot-merge (which writes to the origin device).
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: avoid allocating exceptions in merge (Mikulas Patocka, 2009-12-10; 1 file, -1/+56)
  The snapshot-merge target should not allocate new exceptions because the intent is to merge all of its exceptions as quickly and safely as possible.
  This patch introduces the snapshot-merge mapping function and updates __origin_write() so that it doesn't allocate exceptions on any snapshots that are being merged.
  If a write request to a merging snapshot device is to be dispatched directly to the origin (because the chunk is not remapped or was already merged), snapshot_merge_map() must make exceptions in other snapshots, so it calls do_origin().
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: rework writing to origin (Mikulas Patocka, 2009-12-10; 1 file, -106/+49)
  To track the completion of exceptions relating to the same location on the device, the current code selects one exception as primary_pe, links the other exceptions to it and uses reference counting to wait until all the reallocations are complete.
  It is considered too complicated to extend this code to handle the new snapshot-merge target, where sets of non-overlapping chunks would also need to become linked.
  Instead, a simpler (but less efficient) approach is taken. Bios are linked to one exception. When it completes, bios are simply retried, and if other related exceptions are still outstanding, they'll get queued again to wait for another one.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: add merge target (Mikulas Patocka, 2009-12-10; 1 file, -12/+41)
  The snapshot-merge target allows a snapshot to be merged back into the snapshot's origin device. One anticipated use of snapshot merging is the rollback of filesystems to back out problematic system upgrades.
  This patch adds snapshot-merge target management to both dm_snapshot_init() and dm_snapshot_exit(). As an initial place-holder, snapshot-merge is identical to the snapshot target. Documentation is provided.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm exception store: add merge specific methods (Mikulas Patocka, 2009-12-10; 2 files, -2/+129)
  Add functions that decide how many consecutive chunks of snapshot to merge back into the origin next and to update the metadata afterwards.
  prepare_merge provides a pointer to the most recent still-to-be-merged chunk and returns how many previous ones are consecutive and can be processed together.
  commit_merge removes the nr_merged most-recent chunks permanently from the exception store. The number must not exceed that returned by prepare_merge.
  Introduce NUM_SNAPSHOT_HDR_CHUNKS to show where the snapshot header chunk is accounted for.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Reviewed-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
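  The two new exception-store methods have roughly this shape (sketch based on the description above; the exact prototypes in dm-exception-store.h may differ):

      /* Sketch: additions to struct dm_exception_store_type. */

      /* Report the most recent still-to-be-merged chunk via the two
       * out-parameters and return how many previous chunks are consecutive
       * with it and can be merged in one go. */
      int (*prepare_merge)(struct dm_exception_store *store,
                           chunk_t *last_old_chunk, chunk_t *last_new_chunk);

      /* Permanently drop the nr_merged most recent chunks from the store;
       * nr_merged must not exceed the value prepare_merge returned. */
      int (*commit_merge)(struct dm_exception_store *store, int nr_merged);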
* dm snapshot: create function for chunk_is_tracked wait (Mike Snitzer, 2009-12-10; 1 file, -6/+12)
  Move the __chunk_is_tracked() loop into a separate function as we will also need to call it from the write path in the rare case of conflicting writes to the same chunk.
  Originally introduced in commit a8d41b59f3f5a7ac19452ef442a7fc1b5fa17366 ("dm snapshot: fix race during exception creation").
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: make bio optional in __origin_write (Mikulas Patocka, 2009-12-10; 1 file, -5/+18)
  To support the merging of snapshots back into their origin we need to trigger exceptions in other snapshots not being merged without any incoming bio on the origin device. The bio parameter to __origin_write() becomes optional and the sector needs supplying separately.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm mpath: reject messages when device is suspended (Kiyoshi Ueda, 2009-12-10; 1 file, -0/+5)
  This patch rejects messages that can generate I/O while the device itself is suspended.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: export suspended state to targets (Kiyoshi Ueda, 2009-12-10; 1 file, -0/+11)
  This patch adds the exported dm_suspended() function so that targets can check whether or not they are suspended.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
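  The exported helper is essentially a thin wrapper that maps a target back to its mapped_device; a plausible sketch (paired with dm_suspended_md() from the rename patch listed below):

      /* Sketch: let a target ask whether its own device is suspended. */
      int dm_suspended(struct dm_target *ti)
      {
              return dm_suspended_md(dm_table_get_md(ti->table));
      }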
* dm: rename dm_suspended to dm_suspended_md (Kiyoshi Ueda, 2009-12-10; 4 files, -11/+16)
  This patch renames dm_suspended() to dm_suspended_md() and keeps it internal to dm. No functional change.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: swap target postsuspend call and setting suspended flag (Kiyoshi Ueda, 2009-12-10; 1 file, -2/+2)
  This patch moves DMF_SUSPENDED flag set before postsuspend. No one should care about the ordering, because the flag set and the postsuspend are protected by a single lock, md->suspend_lock, and all strict flag-checkers take the lock.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm crypt: add plain64 iv (Milan Broz, 2009-12-10; 1 file, -0/+18)
  The default plain IV is 32-bit only. This plain64 IV provides a compatible mode for encrypted devices bigger than 4TB.
  Signed-off-by: Milan Broz <mbroz@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
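  The generator simply stores the full 64-bit little-endian sector number in the IV instead of truncating it to 32 bits; a sketch (the exact prototype in dm-crypt.c may differ):

      /* Sketch of a plain64 IV generator: zero the IV, then write the whole
       * 64-bit sector number little-endian (plain keeps only the low 32 bits). */
      static int crypt_iv_plain64_gen(struct crypt_config *cc, u8 *iv,
                                      sector_t sector)
      {
              memset(iv, 0, cc->iv_size);
              *(__le64 *)iv = cpu_to_le64(sector);
              return 0;
      }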
* dm: trace request based remapping (Jun'ichi Nomura, 2009-12-10; 1 file, -0/+2)
  This patch adds a remapping trace to request-based dm. BIO-based dm already has the equivalent tracepoint.
  For example, under this dm stack (linear LV on multipath):
    # dmsetup ls --tree -o ascii
    vg-lv0 (253:1)
     `-mpath0 (253:0)
        |- (8:160)
        |- (66:80)
        |- (65:176)
        `- (65:160)
  Trace of 'dd of=/dev/vg/lv0 bs=128k count=1 oflag=direct' looks like this:
  without the patch:
    dd-6674 [000] 539.727384: block_bio_queue: 253,1 WS 0 + 256 [dd]
    dd-6674 [000] 539.727392: block_remap: 253,0 WS 384 + 256 <- (253,1) 0
    dd-6674 [000] 539.727394: block_bio_queue: 253,0 WS 384 + 256 [dd]
    dd-6674 [000] 539.727405: block_getrq: 253,0 WS 384 + 256 [dd]
    dd-6674 [000] 539.727409: block_plug: [dd]
    dd-6674 [000] 539.727410: block_rq_insert: 253,0 W 0 () 384 + 256 [dd]
    dd-6674 [000] 539.727416: block_rq_issue: 253,0 W 0 () 384 + 256 [dd]
    dd-6674 [000] 539.727426: block_rq_insert: 65,176 W 0 () 384 + 256 [dd]
    dd-6674 [000] 539.727427: block_rq_issue: 65,176 W 0 () 384 + 256 [dd]
    ...
  and with the patch (the line with '**' is the trace added by this patch):
    dd-6617 [002] 162.914301: block_bio_queue: 253,1 WS 0 + 256 [dd]
    dd-6617 [002] 162.914314: block_remap: 253,0 WS 384 + 256 <- (253,1) 0
    dd-6617 [002] 162.914316: block_bio_queue: 253,0 WS 384 + 256 [dd]
    dd-6617 [002] 162.914331: block_getrq: 253,0 WS 384 + 256 [dd]
    dd-6617 [002] 162.914335: block_plug: [dd]
    dd-6617 [002] 162.914337: block_rq_insert: 253,0 W 0 () 384 + 256 [dd]
    dd-6617 [002] 162.914347: block_rq_issue: 253,0 W 0 () 384 + 256 [dd]
    **dd-6617 [002] 162.914356: block_rq_remap: 65,176 W 384 + 256 <- (253,0) 384
    dd-6617 [002] 162.914358: block_rq_insert: 65,176 W 0 () 384 + 256 [dd]
    dd-6617 [002] 162.914359: block_rq_issue: 65,176 W 0 () 384 + 256 [dd]
    ...
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Cc: Jens Axboe <jens.axboe@oracle.com>
  Cc: Li Zefan <lizf@cn.fujitsu.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
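  The added call in dm's request mapping path looks approximately like this (sketch; argument order follows the block_rq_remap tracepoint, local names are illustrative):

      /* Sketch: once the clone has been remapped onto the underlying queue,
       * emit a block_rq_remap event recording the old device and sector. */
      trace_block_rq_remap(clone->q, clone, disk_devt(dm_disk(md)),
                           blk_rq_pos(orig_rq));  /* orig_rq: original request */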
* dm snapshot: allow live exception store handover between tables (Mike Snitzer, 2009-12-10; 1 file, -27/+236)
  Permit in-use snapshot exception data to be 'handed over' from one snapshot instance to another. This is a pre-requisite for patches that allow the changes made in a snapshot device to be merged back into its origin device and also allows device resizing.
  The basic call sequence is:
    dmsetup load new_snapshot (referencing the existing in-use cow device)
      - the ctr code detects that the cow is already in use and allows the two snapshot target instances to be linked together
    dmsetup suspend original_snapshot
    dmsetup resume new_snapshot
      - the new_snapshot becomes live, and if anything now tries to access the original one it will receive -EIO
    dmsetup remove original_snapshot
  (There can only be two snapshot targets referencing the same cow device simultaneously.)
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: keep old table until after resume succeeded (Alasdair G Kergon, 2009-12-10; 2 files, -21/+23)
  When swapping a new table into place, retain the old table until its replacement is in place.
  An old check for an empty table is removed because this is enforced in populate_table().
  __unbind() becomes redundant when followed by __bind().
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: bind new table before destroying old (Alasdair G Kergon, 2009-12-10; 2 files, -5/+14)
  When replacing a mapped device's table during a 'resume', delay the destruction of the old table until the new one is successfully in place.
  This will make it easier for a later patch to transfer internal state information from the old table to the new one (something we do not currently support) while giving us more options for reversion if a later part of the operation fails.
  Devices are always in the suspended state during dm_swap_table(). This patch reinforces the requirement that all I/O must have been flushed from the table targets while in this state (including any in workqueues). In the case of 'noflush' suspending, unprocessed I/O should have been 'pushed back' to the dm core prior to this point, for resubmission after the new table is in place.
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm ioctl: retrieve status from inactive table (Mike Snitzer, 2009-12-10; 1 file, -13/+57)
  Add the flag DM_QUERY_INACTIVE_TABLE_FLAG to the ioctls to return information about the loaded-but-not-yet-active table instead of the live table. Prior to this patch it was impossible to obtain this information until the device had been 'resumed'.
  Userspace dmsetup and libdevmapper support the flag as of version 1.02.40.
  e.g. dmsetup info --inactive vg1-lv1
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm io: handle empty barriers (Mikulas Patocka, 2009-12-10; 1 file, -3/+7)
  Accept empty barriers in dm-io. dm-io will process empty write barrier requests just like the other read/write requests.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm mpath: prevent io from work queue while suspended (Mike Anderson, 2009-12-10; 1 file, -0/+12)
  Reject messages that can generate I/O while the device itself is suspended.
  Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
  Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm mpath: add mutex to synchronize adding and flushing work (Mike Anderson, 2009-12-10; 1 file, -21/+38)
  Add a mutex to allow possible creators of new work to synchronize with flushing work queues.
  Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
  Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm ioctl: forbid messages to devices being deleted (Mike Anderson, 2009-12-10; 1 file, -0/+6)
  Once we begin deleting a device, prevent any further messages being sent to targets of its table (to avoid races).
  Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: add dm_deleting_md function (Mike Anderson, 2009-12-10; 2 files, -2/+12)
  Add dm_deleting_md to check whether or not a given mapped device is currently being deleted.
  Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm mpath: flush workqueues before suspend completes (Kiyoshi Ueda, 2009-12-10; 1 file, -4/+15)
  This patch stops the remaining dm-mpath activity during the suspend sequence by flushing workqueues in the postsuspend function.
  The current dm-mpath target may not be quiet even after suspend completes because some workqueues (e.g. device_handler's work, event handling) are not flushed during the suspend sequence, even though suspended devices/targets are supposed to be quiet in this state.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: rename dm_get_table to dm_get_live_table (Alasdair G Kergon, 2009-12-10; 2 files, -19/+19)
  Rename dm_get_table to dm_get_live_table.
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: add request based barrier support (Kiyoshi Ueda, 2009-12-10; 1 file, -18/+196)
  This patch adds barrier support for request-based dm.
  CORE DESIGN
  The design is basically the same as bio-based dm, which emulates barrier by mapping empty barrier bios before/after a barrier I/O. But request-based dm has been using struct request_queue for I/O queueing, so the block-layer's barrier mechanism can be used.
  o Summary of the block-layer's behavior (which dm-core depends on)
    Request-based dm uses the QUEUE_ORDERED_DRAIN_FLUSH ordered mode for I/O barriers. It means that when an I/O requiring a barrier is found in the request_queue, the block-layer makes a pre-flush request and a post-flush request just before and just after the I/O respectively.
    After the ordered sequence starts, the block-layer waits for all in-flight I/Os to complete, then gives drivers the pre-flush request, the barrier I/O and the post-flush request one by one. It means that the request_queue is stopped automatically by the block-layer until drivers complete each sequence.
  o dm-core
    The barrier I/O itself is treated as a normal I/O, so no additional code is needed.
    For the pre/post-flush request, caches are flushed as follows:
    1. Make the number of empty barrier requests required by the target's num_flush_requests, and map them (dm_rq_barrier()).
    2. Wait for the mapped barriers to complete (dm_rq_barrier()). If an error has occurred, save the error value to md->barrier_error (dm_end_request()).
       (*) Basically, the first reported error is taken. But -EOPNOTSUPP supersedes any error and DM_ENDIO_REQUEUE follows.
    3. Requeue the pre/post-flush request if the error value is DM_ENDIO_REQUEUE. Otherwise, complete with the error value (dm_rq_barrier_work()).
    The pre/post-flush work above is done in the kernel thread (kdmflush) context, since memory allocation which might sleep is needed in dm_rq_barrier() but sleep is not allowed in dm_request_fn(), which is an irq-disabled context.
    Also, clones of the pre/post-flush request share an original, so such clones can't be completed using the softirq context. Instead, complete them in the context of underlying device drivers. It should be safe since there is no I/O dispatching during the completion of such clones.
    For suspend, the workqueue of kdmflush needs to be flushed after the request_queue has been stopped. Otherwise, the next flush work can be kicked even after the suspend completes.
  TARGET INTERFACE
  No new interface is added. Just use the existing num_flush_requests in struct target_type, the same as bio-based dm.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: move dm_end_request (Kiyoshi Ueda, 2009-12-10; 1 file, -31/+31)
  This patch moves dm_end_request() to make the next patch more readable. No functional change.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: refactor request based completion functions (Kiyoshi Ueda, 2009-12-10; 1 file, -13/+24)
  This patch factors out the clone completion code, dm_done(), from dm_softirq_done() in preparation for a subsequent patch. No functional change.
  dm_done() will be used in barrier completion, which can't use and doesn't need softirq. The softirq_done callback needs to get a clone from an original request but it can't in the case of barrier, where an original request is shared by multiple clones. On the other hand, the completion of barrier clones doesn't involve re-submitting requests, which was the primary reason of the need for softirq.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: use md pending for in flight IO counting (Kiyoshi Ueda, 2009-12-10; 1 file, -28/+18)
  This patch changes the counter for the number of in_flight I/Os to md->pending from q->in_flight in preparation for a later patch. No functional change.
  Request-based dm used q->in_flight to count the number of in-flight clones, assuming the counter is always incremented for an in-flight original request and original:clone is a 1:1 relationship. However, this is no longer true for barrier requests. So use md->pending to count the number of in-flight clones.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: simplify request based suspend (Kiyoshi Ueda, 2009-12-10; 1 file, -144/+14)
  The semantics of bio-based dm were changed recently in the case of suspend with "--nolockfs" but without "--noflush". Before 2.6.30, I/Os submitted before the suspend invocation were always flushed. From 2.6.30 onwards, I/Os submitted before the suspend invocation might not be flushed. (For details, see http://marc.info/?t=123994433400003&r=1&w=2)
  This patch brings the behaviour of request-based dm into line with bio-based dm, simplifying the code and preparing for a subsequent patch that will wait for all in_flight I/Os to complete without stopping request_queue and use dm_wait_for_completion() for it.
  This change in semantics simplifies the suspend code as follows:
  o Suspend is implemented as stopping request_queue in request-based dm, and all I/Os are queued in the request_queue even after suspend is invoked.
  o In the old semantics, we had to track whether I/Os were queued before or after the suspend invocation, so a special barrier-like request called 'suspend marker' was introduced.
  o With the new semantics, we don't need to flush any I/O so we can remove the marker and the code related to the marker handling and I/O flushing.
  After removing this code, the suspend sequence is now:
  1. Flush all I/Os by lock_fs() if needed.
  2. Stop dispatching any I/O by stopping the request_queue.
  3. Wait for all in-flight I/Os to be completed or requeued.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: abstract clone_rq (Kiyoshi Ueda, 2009-12-10; 1 file, -17/+28)
  This patch factors out the request cloning code in dm_prep_fn() as clone_rq(). No functional change.
  This patch is a preparation for a later patch in this series which needs to make clones from an original barrier request.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: pass gfp_mask to alloc_rq_tio (Kiyoshi Ueda, 2009-12-10; 1 file, -3/+4)
  This patch adds the gfp_mask argument to alloc_rq_tio(). No functional change.
  This patch is a preparation for a later patch in this series which needs to allocate a tio (for barrier I/O) with a different allocation flag (GFP_NOIO) from the one used in the normal I/O code path.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: use clone in map_request function (Kiyoshi Ueda, 2009-12-10; 1 file, -3/+2)
  This patch changes the argument of map_request() from the original request to the clone request. No functional change.
  This patch is a preparation for PATCH 9, which needs to use map_request() for clones sharing an original barrier request.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: abstract dm_in_flight function (Kiyoshi Ueda, 2009-12-10; 1 file, -2/+7)
  This patch adds md_in_flight() to get the number of in_flight I/Os. No functional change.
  This patch is a preparation for a later patch in this series, which changes the I/O counter to md->pending from q->in_flight in request-based dm.
  Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
  Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
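  The helper just sums the per-direction pending counters on the mapped_device; a sketch (assuming md->pending is the existing pair of read/write atomic counters in dm.c):

      /* Sketch: number of I/Os currently in flight for this mapped_device. */
      static int md_in_flight(struct mapped_device *md)
      {
              return atomic_read(&md->pending[READ]) +
                     atomic_read(&md->pending[WRITE]);
      }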
* dm kcopyd: accept zero size jobs (Mikulas Patocka, 2009-12-10; 1 file, -1/+4)
  This patch changes dm-kcopyd so that it accepts zero-size jobs and completes them immediately via its completion thread.
  It is needed for multisnapshot snapshot resizing. When we are writing to a chunk beyond the origin end, no copying is done. To simplify the code, we submit an empty request to kcopyd and let kcopyd complete it. If we didn't submit a request to kcopyd and called the completion routine immediately, it would violate the principle that completion is called only from one thread and it would need additional locking.
  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
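  Conceptually the dispatcher just short-circuits empty jobs onto the completion path; a sketch (placement and helper names assumed from dm-kcopyd's internal job lists):

      /* Sketch: a zero-length job has nothing to copy, so hand it straight
       * to kcopyd's completion thread instead of issuing any I/O. */
      if (job->source.count == 0) {
              push(&kc->complete_jobs, job);
              wake(kc);
              return 0;
      }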
* dm snapshot: track suspended state in target (Mike Snitzer, 2009-12-10; 1 file, -1/+16)
  Keep track of whether or not the device is suspended within the snapshot target module, the same as we do in dm-raid1.
  We will use this later to enforce the correct sequence of ioctls to transfer the in-core exceptions from a snapshot target instance in one table to a replacement one capable of merging them back into the origin.
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm snapshot: move cow ref from exception store to snap core (Mike Snitzer, 2009-12-10; 5 files, -57/+78)
  Store the reference to the snapshot cow device in the core snapshot code instead of each exception store. It can be accessed through the new function dm_snap_cow(). Exception stores should each now maintain a reference to their parent snapshot struct.
  This is cleaner and makes part of the forthcoming snapshot merge code simpler.
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
  Cc: Mikulas Patocka <mpatocka@redhat.com>
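  The new accessor is essentially trivial; a plausible sketch:

      /* Sketch: exception stores reach the cow device through their parent
       * snapshot instead of holding their own reference to it. */
      struct dm_dev *dm_snap_cow(struct dm_snapshot *s)
      {
              return s->cow;
      }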