op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge tag 'md/3.14-fixes' of git://neil.brown.name/md	Linus Torvalds	2014-02-14	2	-49/+54
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull md fixes from Neil Brown: "Two bugfixes for md both tagged for -stable" * tag 'md/3.14-fixes' of git://neil.brown.name/md: md/raid5: Fix CPU hotplug callback registration md/raid1: restore ability for check and repair to fix read errors.
\| *	md/raid5: Fix CPU hotplug callback registration	Oleg Nesterov	2014-02-13	1	-46/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Interestingly, the raid5 code can actually prevent double initialization and hence can use the following simplified form of callback registration: register_cpu_notifier(&foobar_cpu_notifier); get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); put_online_cpus(); A hotplug operation that occurs between registering the notifier and calling get_online_cpus(), won't disrupt anything, because the code takes care to perform the memory allocations only once. So reorganize the code in raid5 this way to fix the deadlock with callback registration. Cc: linux-raid@vger.kernel.org Cc: stable@vger.kernel.org (v2.6.32+) Fixes: 36d1c6476be51101778882897b315bd928c8c7b5 Signed-off-by: Oleg Nesterov <oleg@redhat.com> [Srivatsa: Fixed the unregister_cpu_notifier() deadlock, added the free_scratch_buffer() helper to condense code further and wrote the changelog.] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NeilBrown <neilb@suse.de>
\| *	md/raid1: restore ability for check and repair to fix read errors.	NeilBrown	2014-02-05	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 30bc9b53878a9921b02e3b5bc4283ac1c6de102a md/raid1: fix bio handling problems in process_checks() Move the bio_reset() to a point before where BIO_UPTODATE is checked, so that check now always report that the bio is uptodate, even if it is not. This causes process_check() to sometimes treat read-errors as successful matches so the good data isn't written out. This patch preserves the flag until it is needed. Bug was introduced in 3.11, but backported to 3.10-stable (as it fixed an even worse bug). So suitable for any -stable since 3.10. Reported-and-tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: stable@vger.kernel.org (3.10+) Fixed: 30bc9b53878a9921b02e3b5bc4283ac1c6de102a Signed-off-by: NeilBrown <neilb@suse.de>
* \|	Merge branch 'bcache-for-3.14' of git://evilpiepirate.org/~kent/linux-bcache ↵	Jens Axboe	2014-01-30	6	-10/+15
\|\ \ \| \|/ \|/\| \| \|	into for-linus
\| *	bcache: bugfix - gc thread now gets woken when cache is full	Nicholas Swenson	2014-01-29	1	-3/+3
\| \| \| \| \| \| \| \|	Signed-off-by: Nicholas Swenson <nks@daterainc.com>
\| *	bcache: Minor fixes from kbuild robot	Kent Overstreet	2014-01-29	4	-5/+8
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: fix BUG_ON due to integer overflow with GC_SECTORS_USED	Darrick J. Wong	2014-01-29	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The BUG_ON at the end of __bch_btree_mark_key can be triggered due to an integer overflow error: BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13); ... SET_GC_SECTORS_USED(g, min_t(unsigned, GC_SECTORS_USED(g) + KEY_SIZE(k), (1 << 14) - 1)); BUG_ON(!GC_SECTORS_USED(g)); In bcache.h, the SECTORS_USED bitfield is defined to be 13 bits wide. While the SET_ code tries to ensure that the field doesn't overflow by clamping it to (1<<14)-1 == 16383, this is incorrect because 16383 requires 14 bits. Therefore, if GC_SECTORS_USED() + KEY_SIZE() = 8192, the SET_ statement tries to store 8192 into a 13-bit field. In a 13-bit field, 8192 becomes zero, thus triggering the BUG_ON. Therefore, create a field width constant and a max value constant, and use those to create the bitfield and check the inputs to SET_GC_SECTORS_USED. Arguably the BITMASK() template ought to have BUG_ON checks for too-large values, but that's a separate patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* \|	Merge branch 'for-3.14/drivers' of git://git.kernel.dk/linux-block	Linus Torvalds	2014-01-30	22	-1755/+2213
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull block IO driver changes from Jens Axboe: - bcache update from Kent Overstreet. - two bcache fixes from Nicholas Swenson. - cciss pci init error fix from Andrew. - underflow fix in the parallel IDE pg_write code from Dan Carpenter. I'm sure the 1 (or 0) users of that are now happy. - two PCI related fixes for sx8 from Jingoo Han. - floppy init fix for first block read from Jiri Kosina. - pktcdvd error return miss fix from Julia Lawall. - removal of IRQF_SHARED from the SEGA Dreamcast CD-ROM code from Michael Opdenacker. - comment typo fix for the loop driver from Olaf Hering. - potential oops fix for null_blk from Raghavendra K T. - two fixes from Sam Bradshaw (Micron) for the mtip32xx driver, fixing an OOM problem and a problem with handling security locked conditions * 'for-3.14/drivers' of git://git.kernel.dk/linux-block: (47 commits) mg_disk: Spelling s/finised/finished/ null_blk: Null pointer deference problem in alloc_page_buffers mtip32xx: Correctly handle security locked condition mtip32xx: Make SGL container per-command to eliminate high order dma allocation drivers/block/loop.c: fix comment typo in loop_config_discard drivers/block/cciss.c:cciss_init_one(): use proper errnos drivers/block/paride/pg.c: underflow bug in pg_write() drivers/block/sx8.c: remove unnecessary pci_set_drvdata() drivers/block/sx8.c: use module_pci_driver() floppy: bail out in open() if drive is not responding to block0 read bcache: Fix auxiliary search trees for key size > cacheline size bcache: Don't return -EINTR when insert finished bcache: Improve bucket_prio() calculation bcache: Add bch_bkey_equal_header() bcache: update bch_bkey_try_merge bcache: Move insert_fixup() to btree_keys_ops bcache: Convert sorting to btree_keys bcache: Convert debug code to btree_keys bcache: Convert btree_iter to struct btree_keys bcache: Refactor bset_tree sysfs stats ...
\| *	bcache: Fix auxiliary search trees for key size > cacheline size	Kent Overstreet	2014-01-08	1	-14/+14
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Don't return -EINTR when insert finished	Kent Overstreet	2014-01-08	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to return -EINTR after a split because we invalidated iterators (and freed the btree node) - but if we were finished inserting, we don't want to redo the traversal. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Improve bucket_prio() calculation	Kent Overstreet	2014-01-08	2	-3/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When deciding what order to reuse buckets we take into account both the bucket's priority (which indicates lru order) and also the amount of live data in that bucket. The way they were scaled together wasn't as correct as it could be... this patch improves and documents it. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Add bch_bkey_equal_header()	Nicholas Swenson	2014-01-08	3	-8/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Checks if two keys have equivalent header fields. (good enough for replacement or merging) Used in bch_bkey_try_merge, and replacing a key in the btree. Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: update bch_bkey_try_merge	Nicholas Swenson	2014-01-08	3	-16/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added generic header checks to bch_bkey_try_merge, which then calls the bkey specific function Removed extraneous checks from bch_extent_merge Signed-off-by: Nicholas Swenson <nks@daterainc.com>
\| *	bcache: Move insert_fixup() to btree_keys_ops	Kent Overstreet	2014-01-08	4	-229/+257
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now handling overlapping extents/keys is a method that's specific to what the btree node contains. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Convert sorting to btree_keys	Kent Overstreet	2014-01-08	3	-36/+33
\| \| \| \| \| \| \| \| \| \| \| \|	More work to disentangle various code from struct btree Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Convert debug code to btree_keys	Kent Overstreet	2014-01-08	9	-217/+264
\| \| \| \| \| \| \| \| \| \| \| \|	More work to disentangle various code from struct btree Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Convert btree_iter to struct btree_keys	Kent Overstreet	2014-01-08	6	-38/+41
\| \| \| \| \| \| \| \| \| \| \| \|	More work to disentangle bset.c from struct btree Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Refactor bset_tree sysfs stats	Kent Overstreet	2014-01-08	3	-47/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're in the process of turning bset.c into library code, so none of the code in that file should know about struct cache_set or struct btree - so, move the btree traversal part of the stats code to sysfs.c. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Add bch_btree_keys_u64s_remaining()	Kent Overstreet	2014-01-08	2	-13/+30
\| \| \| \| \| \| \| \| \| \| \| \|	Helper function to explicitly check how much space is free in a btree node Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Add struct btree_keys	Kent Overstreet	2014-01-08	8	-263/+322
\| \| \| \| \| \| \| \| \| \| \| \|	Soon, bset.c won't need to depend on struct btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Abstract out stuff needed for sorting	Kent Overstreet	2014-01-08	9	-289/+423
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Rename/shuffle various code around	Kent Overstreet	2014-01-08	8	-276/+341
\| \| \| \| \| \| \| \| \| \| \| \|	More work to disentangle bset.c from the rest of the code: Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Add struct bset_sort_state	Kent Overstreet	2014-01-08	6	-49/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	More disentangling bset.c from the rest of the bcache code - soon, the sorting routines won't have any dependencies on any outside structs. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Split out sort_extent_cmp()	Kent Overstreet	2014-01-08	4	-32/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Only use extent comparison for comparing extents, so we're not using START_KEY() on other key types (i.e. btree pointers) Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Bkey indexing renaming	Kent Overstreet	2014-01-08	6	-52/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	More refactoring: node() -> bset_bkey_idx() end() -> bset_bkey_last() Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Make bch_keylist_realloc() take u64s, not nptrs	Kent Overstreet	2014-01-08	4	-16/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Getting away from KEY_PTRS and moving toward KEY_U64s - and getting rid of magic 2s Also - split out the part that checks against journal entry size so as to avoid a dependancy on struct cache_set in bset.c Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Remove/fix some header dependencies	Kent Overstreet	2014-01-08	3	-24/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the process of disentagling/libraryizing bset.c from the rest of the bcache code. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Use a mempool for mergesort temporary space	Kent Overstreet	2014-01-08	3	-16/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was a single element mempool before, it's slightly cleaner to just use a real mempool. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Btree verify code improvements	Kent Overstreet	2014-01-08	6	-40/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Used this fixed code to find and fix the bug fixed by a4d885097b0ac0cd1337f171f2d4b83e946094d4. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: kill index()	Kent Overstreet	2014-01-08	4	-8/+24
\| \| \| \| \| \| \| \| \| \| \| \|	That was a terrible name for a macro, add some better helpers to replace it. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Trivial error handling fix	Kent Overstreet	2014-01-08	1	-1/+2
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache/md: Use raid stripe size	Kent Overstreet	2014-01-08	2	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we've got code for raid5/6 stripe awareness, bcache just needs to know about the stripes and when writing partial stripes is expensive - we probably don't want to enable this optimization for raid1 or 10, even though they have stripes. So add a flag to queue_limits. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Do bkey_put() in btree_split() error path	Kent Overstreet	2014-01-08	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This error path shouldn't have been hit in practice.. and we've got reworked reserve code coming soon so that it shouldn't _ever_ be bit... but if we've got code for this error path it should be correct. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Rework allocator reserves	Kent Overstreet	2014-01-08	7	-79/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need a reserve for allocating buckets for new btree nodes - and now that we've got multiple btrees, it really needs to be per btree. This reworks the reserves so we've got separate freelists for each reserve instead of watermarks, which seems to make things a bit cleaner, and it adds some code so that btree_split() can make sure the reserve is available before it starts. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: kill closure locking code	Kent Overstreet	2014-01-08	2	-313/+123
\| \| \| \| \| \| \| \| \| \| \| \|	Also flesh out the documentation a bit Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: kill closure locking usage	Kent Overstreet	2014-01-08	7	-55/+98
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Zero less memory	Kent Overstreet	2014-01-08	3	-40/+41
\| \| \| \| \| \| \| \| \| \| \| \|	Another minor performance optimization Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Don't touch bucket gen for dirty ptrs	Kent Overstreet	2014-01-08	2	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unnecessary since a bucket that has dirty pointers pointing to it can never be invalidated - and skipping it is a measurable performance boost, since the bucket gen will usually be a cache miss. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Minor btree cache fix	Kent Overstreet	2014-01-08	1	-7/+3
\| \| \| \| \| \| \| \|	Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Performance fix for when journal entry is full	Kent Overstreet	2014-01-08	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were unnecessarily waiting on a journal write to complete when we just needed to start a journal write and start setting up the next one. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Minor journal fix	Kent Overstreet	2014-01-08	1	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The real fix is where we check the bytes we need against how much is remaining - we also need to check for a journal entry bigger than our buffer, we'll never write those and it would be bad if we tried to read one. Also improve the diagnostic messages. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| *	bcache: Data corruption fix	Kent Overstreet	2014-01-08	1	-4/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code that handles overlapping extents that we've just read back in from disk was depending on the behaviour of the code that handles overlapping extents as we're inserting into a btree node in the case of an insert that forced an existing extent to be split: on insert, if we had to split we'd also insert a new extent to represent the top part of the old extent - and then that new extent would get written out. The code that read the extents back in thus not bother with splitting extents - if it saw an extent that ovelapped in the middle of an older extent, it would trim the old extent to only represent the bottom part, assuming that the original insert would've inserted a new extent to represent the top part. I still haven't figured out _how_ it can happen, but I'm now pretty convinced (and testing has confirmed) that there's some kind of an obscure corner case (probably involving extent merging, and multiple overwrites in different sets) that breaks this. The fix is to change the mergesort fixup code to split extents itself when required. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
* \|	Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block	Linus Torvalds	2014-01-30	36	-936/+569
\|\ \ \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_ve series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...
\| *	Merge tag 'v3.13-rc6' into for-3.14/core	Jens Axboe	2013-12-31	26	-142/+308
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c
\| * \|	dm cache: increment bi_remaining when bi_end_io is restored	Mike Snitzer	2013-12-03	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the bio->bi_remaining increment into dm_unhook_bio() so the overwrite_endio() handler works as expected. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \|	block: Introduce new bio_split()	Kent Overstreet	2013-11-23	6	-250/+131
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new bio_split() can split arbitrary bios - it's not restricted to single page bios, like the old bio_split() (previously renamed to bio_pair_split()). It also has different semantics - it doesn't allocate a struct bio_pair, leaving it up to the caller to handle completions. Then convert the existing bio_pair_split() users to the new bio_split() - and also nvme, which was open coding bio splitting. (We have to take that BUG_ON() out of bio_integrity_trim() because this bio_split() needs to use it, and there's no reason it has to be used on bios marked as cloned; BIO_CLONED doesn't seem to have clearly documented semantics anyways.) Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Neil Brown <neilb@suse.de>
\| * \|	block: Rename bio_split() -> bio_pair_split()	Kent Overstreet	2013-11-23	3	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is prep work for introducing a more general bio_split(). Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Sage Weil <sage@inktank.com>
\| * \|	block: Generic bio chaining	Kent Overstreet	2013-11-23	5	-4/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a generic mechanism for chaining bio completions. This is going to be used for a bio_split() replacement, and it turns out to be very useful in a fair amount of driver code - a fair number of drivers were implementing this in their own roundabout ways, often painfully. Note that this means it's no longer to call bio_endio() more than once on the same bio! This can cause problems for drivers that save/restore bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all - in all but the simplest cases they'd be better off just cloning the bio, and immutable biovecs is making bio cloning cheaper. But for now, we add a bio_endio_nodec() for these cases. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk>
\| * \|	block: Remove bi_idx hacks	Kent Overstreet	2013-11-23	1	-45/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that drivers have been converted to the new bvec_iter primitives, there's no need to trim the bvec before we submit it; and we can't trim it once we start sharing bvecs. It used to be that passing a partially completed bio (i.e. one with nonzero bi_idx) to generic_make_request() was a dangerous thing - various drivers would choke on such things. But with immutable biovecs and our new bio splitting that shares the biovecs, submitting partially completed bios has to work (and should work, now that all the drivers have been completed to the new primitives) Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk>
\| * \|	dm: Refactor for new bio cloning/splitting	Kent Overstreet	2013-11-23	2	-179/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to convert the dm code to the new bvec_iter primitives which respect bi_bvec_done; they also allow us to drastically simplify dm's bio splitting code. Also, it's no longer necessary to save/restore the bvec array anymore - driver conversions for immutable bvecs are done, so drivers should never be modifying it. Also kill bio_sector_offset(), dm was the only user and it doesn't make much sense anymore. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alasdair Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Reviewed-by: Mike Snitzer <snitzer@redhat.com>