summaryrefslogtreecommitdiffstats
path: root/drivers/md/bcache/journal.c
Commit message (Collapse)AuthorAgeFilesLines
* bcache: remove driver private bio splitting codeKent Overstreet2015-08-131-2/+2
| | | | | | | | | | | | | The bcache driver has always accepted arbitrarily large bios and split them internally. Now that every driver must accept arbitrarily large bios this code isn't nessecary anymore. Cc: linux-bcache@vger.kernel.org Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: add more description in commit message] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* block: add a bi_error field to struct bioChristoph Hellwig2015-07-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* bcache: don't embed 'return' statements in closure macrosJens Axboe2015-07-111-0/+2
| | | | | | | | This is horribly confusing, it breaks the flow of the code without it being apparent in the caller. Signed-off-by: Jens Axboe <axboe@fb.com> Acked-by: Christoph Hellwig <hch@lst.de>
* MAINTAINERS: BCACHE: Kent Overstreet has changed email addressJoe Perches2015-06-301-1/+1
| | | | | | | | | | | | Kent's email address in MAINTAINERS seems to be invalid. This was his last sign-off address, so use that if appropriate. Fix the S: status entry while there. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bcache: Fix an infinite loop in journal replayKent Overstreet2014-08-041-1/+4
| | | | | | | | | When running with multiple cache devices, if one of the devices has a completely empty journal but we'd already found some journal entries on a previosu device we'd go into an infinite loop. Change-Id: I1dcdc0d738192746de28f40e8b08825b0dea5e2b Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix to remove the rcu_sched stalls.Surbhi Palande2014-08-041-1/+2
| | | | | | | | while loop was executing infinitely. This fix ends the while loop gracefully. Signed-off-by: Surbhi Palande <sap@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix a journal replay bugKent Overstreet2014-08-041-7/+9
| | | | | | | journal replay wansn't validating pointers with bch_extent_invalid() before derefing, fixed Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: btree locking reworkKent Overstreet2014-03-181-5/+4
| | | | | | | | | | | | | | | | | | Add a new lock, b->write_lock, which is required to actually modify - or write - a btree node; this lock is only held for short durations. This means we can write out a btree node without taking b->lock, which _is_ held for long durations - solving a deadlock when btree_flush_write() (from the journalling code) is called with a btree node locked. Right now just occurs in bch_btree_set_root(), but with an upcoming journalling rework is going to happen a lot more. This also turns b->lock is now more of a read/intent lock instead of a read/write lock - but not completely, since it still blocks readers. May turn it into a real intent lock at some point in the future. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add bch_keylist_init_single()Kent Overstreet2014-03-181-4/+1
| | | | | | | This will potentially save us an allocation when we've got inode/dirent bkeys that don't fit in the keylist's inline keys. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix another bug recovering from unclean shutdownKent Overstreet2014-03-181-13/+4
| | | | | | | | | | | The on disk bucket gens are allowed to be out of date, when we reuse buckets that didn't have any live data in them. To deal with this, the initial gc has to update the bucket gen when we find a pointer gen newer than the bucket's gen. Unfortunately we weren't doing this for pointers in the journal that we're about to replay. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix a journalling reclaim after recovery bugKent Overstreet2014-03-181-2/+8
| | | | | | | | On recovery we weren't correctly keeping track of what journal buckets had open journal entries, thus it was possible for them to be overwritten until we'd written all new journal entries. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix a null ptr deref in journal replayKent Overstreet2014-03-171-1/+5
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix a shutdown bugKent Overstreet2014-02-251-2/+7
| | | | | | Shutdown wasn't cancelling/waiting on journal_write_work() Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Rename/shuffle various code aroundKent Overstreet2014-01-081-4/+5
| | | | | | More work to disentangle bset.c from the rest of the code: Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Bkey indexing renamingKent Overstreet2014-01-081-3/+3
| | | | | | | | | More refactoring: node() -> bset_bkey_idx() end() -> bset_bkey_last() Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: kill closure locking usageKent Overstreet2014-01-081-13/+14
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Performance fix for when journal entry is fullKent Overstreet2014-01-081-5/+9
| | | | | | | We were unnecessarily waiting on a journal write to complete when we just needed to start a journal write and start setting up the next one. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Minor journal fixKent Overstreet2014-01-081-5/+14
| | | | | | | | | | | The real fix is where we check the bytes we need against how much is remaining - we also need to check for a journal entry bigger than our buffer, we'll never write those and it would be bad if we tried to read one. Also improve the diagnostic messages. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* block: Abstract out bvec iteratorKent Overstreet2013-11-231-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
* bcache: Pull on disk data structures out into a separate headerKent Overstreet2013-11-101-2/+2
| | | | | | | Now, the on disk data structures are in a header that can be exported to userspace - and having them all centralized is nice too. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Convert bch_btree_insert() to bch_btree_map_leaf_nodes()Kent Overstreet2013-11-101-3/+1
| | | | | | | | | Last of the btree_map() conversions. Main visible effect is bch_btree_insert() is no longer taking a struct btree_op as an argument anymore - there's no fancy state machine stuff going on, it's just a normal function. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Kill op->replaceKent Overstreet2013-11-101-1/+1
| | | | | | | | This is prep work for converting bch_btree_insert to bch_btree_map_leaf_nodes() - we have to convert all its arguments to actual arguments. Bunch of churn, but should be straightforward. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Kill op->clKent Overstreet2013-11-101-5/+3
| | | | | | | This isn't used for waiting asynchronously anymore - so this is a fairly trivial refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Prune struct btree_opKent Overstreet2013-11-101-15/+17
| | | | | | | Eventual goal is for struct btree_op to contain only what is necessary for traversing the btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Convert bch_btree_read_async() to bch_btree_map_keys()Kent Overstreet2013-11-101-1/+0
| | | | | | | | | This is a fairly straightforward conversion, mostly reshuffling - op->lookup_done goes away, replaced by MAP_DONE/MAP_CONTINUE. And the code for handling cache hits and misses wasn't really btree code, so it gets moved to request.c. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Move keylist out of btree_opKent Overstreet2013-11-101-4/+7
| | | | | | | Slowly working on pruning struct btree_op - the aim is for it to only contain things that are actually necessary for traversing the btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Refactor journalling flow controlKent Overstreet2013-11-101-113/+100
| | | | | | | | | | Making things less asynchronous that don't need to be - bch_journal() only has to block when the journal or journal entry is full, which is emphatically not a fast path. So make it a normal function that just returns when it finishes, to make the code and control flow easier to follow. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Clean up keylist codeKent Overstreet2013-11-101-6/+8
| | | | | | More random refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add explicit keylist arg to btree_insert()Kent Overstreet2013-11-101-1/+1
| | | | | | | | Some refactoring - better to explicitly pass stuff around instead of having it all in the "big bag of state", struct btree_op. Going to prune struct btree_op quite a bit over time. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add on error panic/unregister settingKent Overstreet2013-11-101-4/+3
| | | | | | | Works kind of like the ext4 setting, to panic or remount read only on errors. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix a journalling performance bugKent Overstreet2013-11-101-21/+26
|
* bcache: Fix a flush/fua performance bugKent Overstreet2013-09-241-0/+1
| | | | | | | | | | | bch_journal_meta() was missing the flush to make the journal write actually go down (instead of waiting up to journal_delay_ms)... Whoops Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bcache: Fix for when no journal entries are foundKent Overstreet2013-09-241-12/+18
| | | | | | | | | The journal replay code didn't handle this case, causing it to go into an infinite loop... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bcache: Fix a dumb journal discard bugKent Overstreet2013-09-241-1/+1
| | | | | | | | | That switch statement was obviously wrong, leading to some sort of weird spinning on rare occasion with discards enabled... Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bcache: Journal replay fixKent Overstreet2013-07-121-1/+6
| | | | | | | | | | | | | | | | | The journal replay code starts by finding something that looks like a valid journal entry, then it does a binary search over the unchecked region of the journal for the journal entries with the highest sequence numbers. Trouble is, the logic was wrong - journal_read_bucket() returns true if it found journal entries we need, but if the range of journal entries we're looking for loops around the end of the journal - in that case journal_read_bucket() could return true when it hadn't found the highest sequence number we'd seen yet, and in that case the binary search did the wrong thing. Whoops. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
* bcache: FUA fixesKent Overstreet2013-07-011-1/+1
| | | | | | | Journal writes need to be marked FUA, not just REQ_FLUSH. And btree node writes have... weird ordering requirements. Signed-off-by: Kent Overstreet <koverstreet@google.com>
* bcache: Fix/revamp tracepointsKent Overstreet2013-06-261-4/+10
| | | | | | | | | | | | | | | | The tracepoints were reworked to be more sensible, and fixed a null pointer deref in one of the tracepoints. Converted some of the pr_debug()s to tracepoints - this is partly a performance optimization; it used to be that with DEBUG or CONFIG_DYNAMIC_DEBUG pr_debug() was an empty macro; but at some point it was changed to an empty inline function. Some of the pr_debug() statements had rather expensive function calls as part of the arguments, so this code was getting run unnecessarily even on non debug kernels - in some fast paths, too. Signed-off-by: Kent Overstreet <koverstreet@google.com>
* bcache: Refactor btree ioKent Overstreet2013-06-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The most significant change is that btree reads are now done synchronously, instead of asynchronously and doing the post read stuff from a workqueue. This was originally done because we can't block on IO under generic_make_request(). But - we already have a mechanism to punt cache lookups to workqueue if needed, so if we just use that we don't have to deal with the complexity of doing things asynchronously. The main benefit is this makes the locking situation saner; we can hold our write lock on the btree node until we're finished reading it, and we don't need that btree_node_read_done() flag anymore. Also, for writes, btree_write() was broken out into btree_node_write() and btree_leaf_dirty() - the old code with the boolean argument was dumb and confusing. The prio_blocked mechanism was improved a bit too, now the only counter is in struct btree_write, we don't mess with transfering a count from struct btree anymore. This required changing garbage collection to block prios at the start and unblock when it finishes, which is cleaner than what it was doing anyways (the old code had mostly the same effect, but was doing it in a convoluted way) And the btree iter btree_node_read_done() uses was converted to a real mempool. Signed-off-by: Kent Overstreet <koverstreet@google.com>
* bcache: Sparse fixesKent Overstreet2013-04-081-0/+2
| | | | Signed-off-by: Kent Overstreet <koverstreet@google.com>
* bcache: Don't export utility code, prefix with bch_Kent Overstreet2013-03-281-2/+2
| | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bcache: Style/checkpatch fixesKent Overstreet2013-03-251-4/+4
| | | | | | | | | Took out some nested functions, and fixed some more checkpatch complaints. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bcache: A block layer cacheKent Overstreet2013-03-231-0/+785
Does writethrough and writeback caching, handles unclean shutdown, and has a bunch of other nifty features motivated by real world usage. See the wiki at http://bcache.evilpiepirate.org for more. Signed-off-by: Kent Overstreet <koverstreet@google.com>
OpenPOWER on IntegriCloud