op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	dm cache: set minimum_io_size to cache's data block size	Mike Snitzer	2014-08-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before, if the block layer's limit stacking didn't establish an optimal_io_size that was compatible with the cache's data block size we'd set optimal_io_size to the data block size and minimum_io_size to 0 (which the block layer adjusts to be physical_block_size). Update cache_io_hints() to set both minimum_io_size and optimal_io_size to the cache's data block size. This fixes an issue where mkfs.xfs would create more XFS Allocation Groups on cache volumes than on a normal linear LV of comparable size. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache metadata: use dm-space-map-metadata.h defined size limits	Mike Snitzer	2014-08-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 7d48935e cleaned up the persistent-data's space-map-metadata limits by elevating them to dm-space-map-metadata.h. Update dm-cache-metadata to use these same limits. The calculation for DM_CACHE_METADATA_MAX_SECTORS didn't account for the sizeof the disk_bitmap_header. So the supported maximum metadata size is a bit smaller (reduced from 33423360 to 33292800 sectors). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
*	dm cache: fail migrations in the do_worker error path	Joe Thornber	2014-08-01	1	-0/+1
\| \| \| \| \|	Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: simplify deferred set reference count increments	Joe Thornber	2014-08-01	1	-46/+77
\| \| \| \| \| \| \| \| \| \| \|	Factor out inc_and_issue and inc_ds helpers to simplify deferred set reference count increments. Also cleanup cache_map to consistently call cell_defer and inc_ds when the bio is DM_MAPIO_REMAPPED. No functional change. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: fix race affecting dirty block count	Anssi Hannula	2014-08-01	1	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nr_dirty is updated without locking, causing it to drift so that it is non-zero (either a small positive integer, or a very large one when an underflow occurs) even when there are no actual dirty blocks. This was due to a race between the workqueue and map function accessing nr_dirty in parallel without proper protection. People were seeing under runs due to a race on increment/decrement of nr_dirty, see: https://lkml.org/lkml/2014/6/3/648 Fix this by using an atomic_t for nr_dirty. Reported-by: roma1390@gmail.com Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
*	dm cache: always split discards on cache block boundaries	Heinz Mauelshagen	2014-05-27	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	The DM cache target cannot cope with discards that span multiple cache blocks, so each discard bio that spans more than one cache block must get split by the DM core. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.9+
*	dm cache: fix writethrough mode quiescing in cache_map	Mike Snitzer	2014-05-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Commit 2ee57d58735 ("dm cache: add passthrough mode") inadvertently removed the deferred set reference that was taken in cache_map()'s writethrough mode support. Restore taking this reference. This issue was found with code inspection. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org # 3.13+
*	dm cache: fix a lock-inversion	Joe Thornber	2014-04-04	1	-26/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When suspending a cache the policy is walked and the individual policy hints written to the metadata via sync_metadata(). This led to this lock order: policy->lock cache_metadata->root_lock When loading the cache target the policy is populated while the metadata lock is held: cache_metadata->root_lock policy->lock Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by ensuring the cache_metadata root_lock is held whilst all the hints are written, rather than being repeatedly locked while policy->lock is held (as was the case with each callout that policy_walk_mappings() made to the old save_hint() method). Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove locking correctness") build option. However, it is not clear how the LOCKDEP reported paths can lead to a deadlock since the two paths, suspending a target and loading a target, never occur at the same time. But that doesn't mean the same lock-inversion couldn't have occurred elsewhere. Reported-by: Marian Csontos <mcsontos@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
*	dm cache: remove remainder of distinct discard block size	Heinz Mauelshagen	2014-03-27	1	-46/+26
\| \| \| \| \| \| \| \| \| \| \| \| \|	Discard block size not being equal to cache block size causes data corruption by erroneously avoiding migrations in issue_copy() because the discard state is being cleared for a group of cache blocks when it should not. Completely remove all code that enabled a distinction between the cache block size and discard block size. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: prevent corruption caused by discard_block_size > cache_block_size	Mike Snitzer	2014-03-27	1	-34/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the discard block size is larger than the cache block size we will not properly quiesce IO to a region that is about to be discarded. This results in a race between a cache migration where no copy is needed, and a write to an adjacent cache block that's within the same large discard block. Workaround this by limiting the discard_block_size to cache_block_size. Also limit the max_discard_sectors to cache_block_size. A more comprehensive fix that introduces range locking support in the bio_prison and proper quiescing of a discard range that spans multiple cache blocks is already in development. Reported-by: Morgan Mears <Morgan.Mears@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Acked-by: Heinz Mauelshagen <heinzm@redhat.com> Cc: stable@vger.kernel.org
*	dm cache: fix access beyond end of origin device	Heinz Mauelshagen	2014-03-12	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to avoid wasting cache space a partial block at the end of the origin device is not cached. Unfortunately, the check for such a partial block at the end of the origin device was flawed. Fix accesses beyond the end of the origin device that occured due to attempted promotion of an undetected partial block by: - initializing the per bio data struct to allow cache_end_io to work properly - recognizing access to the partial block at the end of the origin device - avoiding out of bounds access to the discard bitset Otherwise, users can experience errors like the following: attempt to access beyond end of device dm-5: rw=0, want=20971520, limit=20971456 ... device-mapper: cache: promotion failed; couldn't copy block Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
*	dm cache: fix truncation bug when copying a block to/from >2TB fast device	Heinz Mauelshagen	2014-03-12	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During demotion or promotion to a cache's >2TB fast device we must not truncate the cache block's associated sector to 32bits. The 32bit temporary result of from_cblock() caused a 32bit multiplication when calculating the sector of the fast device in issue_copy_real(). Use an intermediate 64bit type to store the 32bit from_cblock() to allow for proper 64bit multiplication. Here is an example of how this bug manifests on an ext4 filesystem: EXT4-fs error (device dm-0): ext4_mb_generate_buddy:756: group 17136, 32768 clusters in bitmap, 30688 in gd; block bitmap corrupt. JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
*	dm cache: fix truncation bug when mapping I/O to >2TB fast device	Heinz Mauelshagen	2014-02-28	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When remapping a block to the cache's fast device that is larger than 2TB we must not truncate the destination sector to 32bits. The 32bit temporary result of from_cblock() was being overflowed in remap_to_cache() due to the logical left shift. Use an intermediate 64bit type to store the 32bit from_cblock() result to fix the overflow. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
*	dm cache: do not add migration to completed list before unhooking bio	Mike Snitzer	2014-02-17	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When completing an overwrite bio, in overwrite_endio(), the associated migration should not be added to the 'completed_migrations' until the bio's fields are restored with dm_unhook_bio(). Otherwise, do_worker() can race to process 'completed_migrations' before dm_unhook_bio() -- so the bio's bi_end_io is incorrect. This is unlikely to cause any problems given the current code but should be fixed on the basis of correctness. Also, the cache's spinlock only needs to be held when manipulating the 'completed_migrations' list -- other changes don't need protection. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
*	dm cache: move hook_info into common portion of per_bio_data structure	Mike Snitzer	2014-02-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit c9d28d5d ("dm cache: promotion optimisation for writes") incorrectly placed the 'hook_info' member in the writethrough-only portion of the per_bio_data structure. Given that the overwrite optimization may be used for writeback the 'hook_info' member must be placed above the 'cache' member of the per_bio_data structure. Any members above 'cache' are available from both writeback and writethrough modes' per_bio_data structure. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org # 3.13+
*	Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block	Linus Torvalds	2014-01-30	1	-10/+18
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_ve series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...
\| *	Merge tag 'v3.13-rc6' into for-3.14/core	Jens Axboe	2013-12-31	1	-1/+1
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c
\| * \|	dm cache: increment bi_remaining when bi_end_io is restored	Mike Snitzer	2013-12-03	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the bio->bi_remaining increment into dm_unhook_bio() so the overwrite_endio() handler works as expected. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \|	block: Generic bio chaining	Kent Overstreet	2013-11-23	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a generic mechanism for chaining bio completions. This is going to be used for a bio_split() replacement, and it turns out to be very useful in a fair amount of driver code - a fair number of drivers were implementing this in their own roundabout ways, often painfully. Note that this means it's no longer to call bio_endio() more than once on the same bio! This can cause problems for drivers that save/restore bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all - in all but the simplest cases they'd be better off just cloning the bio, and immutable biovecs is making bio cloning cheaper. But for now, we add a bio_endio_nodec() for these cases. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk>
\| * \|	block: Abstract out bvec iterator	Kent Overstreet	2013-11-23	1	-10/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
* \| \|	dm cache: add policy name to status output	Mike Snitzer	2014-01-16	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cache's policy may have been established using the "default" alias, which is currently the "mq" policy but the default policy may change in the future. It is useful to know exactly which policy is being used. Add a 'real' member to the dm_cache_policy_type structure and have the "default" dm_cache_policy_type point to the real "mq" dm_cache_policy_type. Update dm_cache_policy_get_name() to check if real is set, if so report the name of the real policy (not the alias). Requested-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* \| \|	dm cache: add block sizes and total cache blocks to status output	Mike Snitzer	2014-01-10	1	-6/+10
\| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improve cache_status to emit: <metadata block size> <#used metadata blocks>/<#total metadata blocks> <cache block size> <#used cache blocks>/<#total cache blocks> ... Adding the block sizes allows for easier calculation of the overall size of both the metadata and cache devices. Adding <#total cache blocks> provides useful context for how much of the cache is used. Unfortunately these additions to the status will require updates to users' scripts that monitor the cache status. But these changes help provide more comprehensive information about the cache device and will simplify tools that are being developed to manage dm-cache devices -- because they won't need to issue 3 operations to cobble together the information that we can easily provide via a single status ioctl. While updating the status documentation in cache.txt spaces were tabify'd. Requested-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
* \|	dm cache: actually resize cache	Vincent Pelletier	2013-12-10	1	-1/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit f494a9c6b1b6dd9a9f21bbb75d9210d478eeb498 ("dm cache: cache shrinking support") broke cache resizing support. dm_cache_resize() is called with cache->cache_size before it gets updated to new_size, so it is a no-op. But the dm-cache superblock is updated with the new_size even though the backing dm-array is not resized. Fix this by passing the new_size to dm_cache_resize(). Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: resolve small nits and improve Documentation	Mike Snitzer	2013-11-12	1	-1/+1
\| \| \| \| \| \| \| \|	Document passthrough mode, cache shrinking, and cache invalidation. Also, use strcasecmp() and hlist_unhashed(). Reported-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: add cache block invalidation support	Joe Thornber	2013-11-11	1	-3/+222
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cache block invalidation is removing an entry from the cache without writing it back. Cache blocks can be invalidated via the 'invalidate_cblocks' message, which takes an arbitrary number of cblock ranges: invalidate_cblocks [<cblock>\|<cblock begin>-<cblock end>]* E.g. dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: add passthrough mode	Joe Thornber	2013-11-11	1	-35/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"Passthrough" is a dm-cache operating mode (like writethrough or writeback) which is intended to be used when the cache contents are not known to be coherent with the origin device. It behaves as follows: * All reads are served from the origin device (all reads miss the cache) * All writes are forwarded to the origin device; additionally, write hits cause cache block invalidates This mode decouples cache coherency checks from cache device creation, largely to avoid having to perform coherency checks while booting. Boot scripts can create cache devices in passthrough mode and put them into service (mount cached filesystems, for example) without having to worry about coherency. Coherency that exists is maintained, although the cache will gradually cool as writes take place. Later, applications can perform coherency checks, the nature of which will depend on the type of the underlying storage. If coherency can be verified, the cache device can be transitioned to writethrough or writeback mode while still warm; otherwise, the cache contents can be discarded prior to transitioning to the desired operating mode. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: cache shrinking support	Joe Thornber	2013-11-11	1	-9/+54
\| \| \| \| \| \| \| \|	Allow a cache to shrink if the blocks being removed from the cache are not dirty. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: promotion optimisation for writes	Joe Thornber	2013-11-09	1	-6/+87
\| \| \| \| \| \| \| \| \| \| \| \|	If a write block triggers promotion and covers a whole block we can avoid a copy. Introduce dm_{hook,unhook}_bio to simplify saving and restoring bio fields (bi_private is now used by overwrite). Switch writethrough support over to using these helpers too. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: optimize commit_if_needed	Heinz Mauelshagen	2013-11-09	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Check commit_requested flag _before_ calling dm_cache_changed_this_transaction() superfluously. Also, be sure to set last_commit_jiffies _after_ dm_cache_commit() completes. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: log error message if dm_kcopyd_copy() fails	Heinz Mauelshagen	2013-11-09	1	-1/+3
\| \| \| \| \| \| \| \|	A migration failure should be logged (albeit limited). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: use cell_defer() boolean argument consistently	Heinz Mauelshagen	2013-11-09	1	-4/+4
\| \| \| \| \| \| \| \|	Fix a few cell_defer() calls that weren't passing a bool. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: return -EINVAL if the user specifies unknown cache policy	Mikulas Patocka	2013-11-09	1	-6/+7
\| \| \| \| \| \| \| \|	Return -EINVAL when the specified cache policy is unknown rather than returning -ENOMEM. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: improve efficiency of quiescing flag management	Joe Thornber	2013-11-09	1	-22/+5
\| \| \| \| \| \| \| \|	Make the quiescing flag an atomic_t and stop protecting it with a spin lock. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: fix a race condition between queuing new migrations and quiescing ↵	Joe Thornber	2013-11-09	1	-14/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	for a shutdown The code that was trying to do this was inadequate. The postsuspend method (in ioctl context), needs to wait for the worker thread to acknowledge the request to quiesce. Otherwise the migration count may drop to zero temporarily before the worker thread realises we're quiescing. In this case the target will be taken down, but the worker thread may have issued a new migration, which will cause an oops when it completes. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.9+
*	dm cache: io destined for the cache device can now serve as tick bios	Joe Thornber	2013-11-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Previously only origin bios could trigger ticks, which meant if all the io was destined for the cache no ticks were generated. If no ticks are generated then multiple hits, and movements in general, are attributed to the same tick. Only a stop gap fix, we need a better solution. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*	dm cache: eliminate holes in cache structure	Mike Snitzer	2013-08-23	1	-15/+15
\| \| \| \| \| \| \| \| \| \|	Reorder members in the cache structure to eliminate 6 out of 7 holes (reclaiming 24 bytes). Also, the 'worker' and 'waker' members no longer straddle cachelines. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
*	dm cache: fix stacking of geometry limits	Mike Snitzer	2013-08-23	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Do not blindly override the queue limits (specifically io_min and io_opt). Allow traditional stacking of these limits if io_opt is a factor of the cache's data block size. Without this patch mkfs.xfs does not recognize the cache device's provided limits as a useful geometry (e.g. raid) so these hints are ignored. This was due to setting io_min to a useless value. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
*	dm cache: add data block size limits to code and Documentation	Mike Snitzer	2013-08-23	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \|	Place upper bound on the cache's data block size (1GB). Inform users that the data block size can't be any arbitrary number, i.e. its value must be between 32KB and 1GB. Also, it should be a multiple of 32KB. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
*	dm cache: fix arm link errors with inline	Mikulas Patocka	2013-07-10	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	Use __always_inline to avoid a link failure with gcc 4.6 on ARM. gcc 4.7 is OK. It creates a function block_div.part.8, it references __udivdi3 and __umoddi3 and it is never called. The references to __udivdi3 and __umoddi3 cause a link failure. Reported-by: Rob Herring <robherring2@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: set config value	Joe Thornber	2013-05-10	1	-28/+31
\| \| \| \| \| \| \| \|	Share configuration option processing code between the dm cache ctr and message functions. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: move config fns	Alasdair G Kergon	2013-05-10	1	-17/+17
\| \| \| \| \| \| \|	Move process_config_option() in dm-cache-target.c to make the next patch more readable. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: replace memcpy with struct assignment	Joe Thornber	2013-05-10	1	-1/+1
\| \| \| \| \| \| \|	Use struct assignment rather than memcpy in dm cache. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: fix typos in comments	Joe Thornber	2013-05-10	1	-3/+4
\| \| \| \| \| \| \|	Fix up some typos in dm-cache comments. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: tune migration throttling	Joe Thornber	2013-05-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Tune the dm cache migration throttling. i) Issue a tick every second, just in case there's no i/o going through. ii) Drop the migration threshold right down to something suitable for background work. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: fix error return code in cache_create	Wei Yongjun	2013-05-10	1	-0/+1
\| \| \| \| \| \| \| \| \|	Return -ENOMEM if memory allocation fails in cache_create instead of 0 (to avoid NULL pointer dereference). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: reduce bio front_pad size in writeback mode	Mike Snitzer	2013-04-05	1	-13/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A recent patch to fix the dm cache target's writethrough mode extended the bio's front_pad to include a 1056-byte struct dm_bio_details. Writeback mode doesn't need this, so this patch reduces the per_bio_data_size to 16 bytes in this case instead of 1096. The dm_bio_details structure was added in "dm cache: fix writes to cache device in writethrough mode" which fixed commit e2e74d617e ("dm cache: fix race in writethrough implementation"). In writeback mode we avoid allocating the writethrough-specific members of the per_bio_data structure (the dm_bio_details structure included). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: fix writes to cache device in writethrough mode	Darrick J. Wong	2013-04-05	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The dm-cache writethrough strategy introduced by commit e2e74d617eadc15 ("dm cache: fix race in writethrough implementation") issues a bio to the origin device, remaps and then issues the bio to the cache device. This more conservative in-series approach was selected to favor correctness over performance (of the previous parallel writethrough). However, this in-series implementation that reuses the same bio to write both the origin and cache device didn't take into account that the block layer's req_bio_endio() modifies a completing bio's bi_sector and bi_size. So the new writethrough strategy needs to preserve these bio fields, and restore them before submission to the cache device, otherwise nothing gets written to the cache (because bi_size is 0). This patch adds a struct dm_bio_details field to struct per_bio_data, and uses dm_bio_record() and dm_bio_restore() to ensure the bio is restored before reissuing to the cache device. Adding such a large structure to the per_bio_data is not ideal but we can improve this later, for now correctness is the important thing. This problem initially went unnoticed because the dm-cache test-suite uses a linear DM device for the dm-cache device's origin device. Writethrough worked as expected because DM submits a clone of the original bio, so the original bio which was reused for the cache was never touched. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: policy ignore hints if generated by different version	Mike Snitzer	2013-03-20	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When reading the dm cache metadata from disk, ignore the policy hints unless they were generated by the same major version number of the same policy module. The hints are considered to be private data belonging to the specific module that generated them and there is no requirement for them to make sense to different versions of the policy that generated them. Policy modules are all required to work fine if no previous hints are supplied (or if existing hints are lost). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: fix race in writethrough implementation	Joe Thornber	2013-03-20	1	-50/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have found a race in the optimisation used in the dm cache writethrough implementation. Currently, dm core sends the cache target two bios, one for the origin device and one for the cache device and these are processed in parallel. This patch avoids the race by changing the code back to a simpler (slower) implementation which processes the two writes in series, one after the other, until we can develop a complete fix for the problem. When the cache is in writethrough mode it needs to send WRITE bios to both the origin and cache devices. Previously we've been implementing this by having dm core query the cache target on every write to find out how many copies of the bio it wants. The cache will ask for two bios if the block is in the cache, and one otherwise. Then main problem with this is it's racey. At the time this check is made the bio hasn't yet been submitted and so isn't being taken into account when quiescing a block for migration (promotion or demotion). This means a single bio may be submitted when two were needed because the block has since been promoted to the cache (catastrophic), or two bios where only one is needed (harmless). I really don't want to start entering bios into the quiescing system (deferred_set) in the get_num_write_bios callback. Instead this patch simplifies things; only one bio is submitted by the core, this is first written to the origin and then the cache device in series. Obviously this will have a latency impact. deferred_writethrough_bios is introduced to record bios that must be later issued to the cache device from the worker thread. This deferred submission, after the origin bio completes, is required given that we're in interrupt context (writethrough_endio). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
*	dm cache: avoid calling policy destructor twice on error	Heinz Mauelshagen	2013-03-20	1	-1/+4
\| \| \| \| \| \| \| \| \| \|	If the cache policy's config values are not able to be set we must set the policy to NULL after destroying it in create_cache_policy() so we don't attempt to destroy it a second time later. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>