summaryrefslogtreecommitdiffstats
path: root/drivers/md/bcache/bset.c
Commit message (Collapse)AuthorAgeFilesLines
* bcache: Minor fixes from kbuild robotKent Overstreet2014-01-291-2/+5
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix auxiliary search trees for key size > cacheline sizeKent Overstreet2014-01-081-14/+14
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add bch_bkey_equal_header()Nicholas Swenson2014-01-081-6/+2
| | | | | | | | | | | Checks if two keys have equivalent header fields. (good enough for replacement or merging) Used in bch_bkey_try_merge, and replacing a key in the btree. Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: update bch_bkey_try_mergeNicholas Swenson2014-01-081-0/+27
| | | | | | | | | Added generic header checks to bch_bkey_try_merge, which then calls the bkey specific function Removed extraneous checks from bch_extent_merge Signed-off-by: Nicholas Swenson <nks@daterainc.com>
* bcache: Move insert_fixup() to btree_keys_opsKent Overstreet2014-01-081-0/+50
| | | | | | | Now handling overlapping extents/keys is a method that's specific to what the btree node contains. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Convert sorting to btree_keysKent Overstreet2014-01-081-27/+23
| | | | | | More work to disentangle various code from struct btree Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Convert debug code to btree_keysKent Overstreet2014-01-081-3/+114
| | | | | | More work to disentangle various code from struct btree Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Convert btree_iter to struct btree_keysKent Overstreet2014-01-081-11/+11
| | | | | | More work to disentangle bset.c from struct btree Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Refactor bset_tree sysfs statsKent Overstreet2014-01-081-45/+4
| | | | | | | | We're in the process of turning bset.c into library code, so none of the code in that file should know about struct cache_set or struct btree - so, move the btree traversal part of the stats code to sysfs.c. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add struct btree_keysKent Overstreet2014-01-081-71/+108
| | | | | | Soon, bset.c won't need to depend on struct btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Abstract out stuff needed for sortingKent Overstreet2014-01-081-273/+6
| | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Rename/shuffle various code aroundKent Overstreet2014-01-081-23/+149
| | | | | | More work to disentangle bset.c from the rest of the code: Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add struct bset_sort_stateKent Overstreet2014-01-081-26/+42
| | | | | | | More disentangling bset.c from the rest of the bcache code - soon, the sorting routines won't have any dependencies on any outside structs. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Split out sort_extent_cmp()Kent Overstreet2014-01-081-21/+63
| | | | | | | Only use extent comparison for comparing extents, so we're not using START_KEY() on other key types (i.e. btree pointers) Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Bkey indexing renamingKent Overstreet2014-01-081-14/+14
| | | | | | | | | More refactoring: node() -> bset_bkey_idx() end() -> bset_bkey_last() Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Make bch_keylist_realloc() take u64s, not nptrsKent Overstreet2014-01-081-10/+2
| | | | | | | | | | Getting away from KEY_PTRS and moving toward KEY_U64s - and getting rid of magic 2s Also - split out the part that checks against journal entry size so as to avoid a dependancy on struct cache_set in bset.c Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Use a mempool for mergesort temporary spaceKent Overstreet2014-01-081-7/+5
| | | | | | | It was a single element mempool before, it's slightly cleaner to just use a real mempool. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Btree verify code improvementsKent Overstreet2014-01-081-3/+0
| | | | | | | Used this fixed code to find and fix the bug fixed by a4d885097b0ac0cd1337f171f2d4b83e946094d4. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Don't touch bucket gen for dirty ptrsKent Overstreet2014-01-081-1/+5
| | | | | | | | Unnecessary since a bucket that has dirty pointers pointing to it can never be invalidated - and skipping it is a measurable performance boost, since the bucket gen will usually be a cache miss. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Data corruption fixKent Overstreet2014-01-081-4/+22
| | | | | | | | | | | | | | | | | | | | | | | The code that handles overlapping extents that we've just read back in from disk was depending on the behaviour of the code that handles overlapping extents as we're inserting into a btree node in the case of an insert that forced an existing extent to be split: on insert, if we had to split we'd also insert a new extent to represent the top part of the old extent - and then that new extent would get written out. The code that read the extents back in thus not bother with splitting extents - if it saw an extent that ovelapped in the middle of an older extent, it would trim the old extent to only represent the bottom part, assuming that the original insert would've inserted a new extent to represent the top part. I still haven't figured out _how_ it can happen, but I'm now pretty convinced (and testing has confirmed) that there's some kind of an obscure corner case (probably involving extent merging, and multiple overwrites in different sets) that breaks this. The fix is to change the mergesort fixup code to split extents itself when required. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
* bcache: Delete some slower inline asmKent Overstreet2013-11-101-8/+0
| | | | | | | | | | | Never saw a profile of bset_search_tree() where it wasn't bottlenecked on memory until I got my new Haswell machine, but when I tried it there it was suddenly burning 20% of the cpu in the inner loop on shrd... Turns out, the version of shrd that takes 64 bit operands has a 9 cycle latency. hah. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Move spinlock into struct time_statsKent Overstreet2013-11-101-6/+1
| | | | | | Minor cleanup. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Kill bch_next_recurse_key()Kent Overstreet2013-11-101-8/+0
| | | | | | This dates from before the btree iterator, and now it's finally gone Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: bch_(btree|extent)_ptr_invalid()Kent Overstreet2013-11-101-13/+36
| | | | | | | | | Trying to treat btree pointers and leaf node pointers the same way was a mistake - going to start being more explicit about the type of key/pointer we're dealing with. This is the first part of that refactoring; this patch shouldn't change any actual behaviour. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Debug code improvementsKent Overstreet2013-11-101-52/+60
| | | | | | | | | | | | | | | Couple changes: * Consolidate bch_check_keys() and bch_check_key_order(), and move the checks that only check_key_order() could do to bch_btree_iter_next(). * Get rid of CONFIG_BCACHE_EDEBUG - now, all that code is compiled in when CONFIG_BCACHE_DEBUG is enabled, and there's now a sysfs file to flip on the EDEBUG checks at runtime. * Dropped an old not terribly useful check in rw_unlock(), and refactored/improved a some of the other debug code. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix bch_ptr_bad()Kent Overstreet2013-11-101-34/+33
| | | | | | | | | | | | | | | | Previously, bch_ptr_bad() could return false when there was a pointer to a nonexistant device... it only filtered out keys with PTR_CHECK_DEV pointers. This behaviour was intended for multiple cache device support; for that, just because the device for one of the pointers has gone away doesn't mean we want to filter out the rest of the pointers. But we don't yet explicitly filter/check individual pointers, so without that this behaviour was wrong - a corrupt bkey with a bad device pointer could cause us to deref a bad pointer. Doh. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Pull on disk data structures out into a separate headerKent Overstreet2013-11-101-2/+2
| | | | | | | Now, the on disk data structures are in a header that can be exported to userspace - and having them all centralized is nice too. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Kill op->clKent Overstreet2013-11-101-1/+1
| | | | | | | This isn't used for waiting asynchronously anymore - so this is a fairly trivial refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Prune struct btree_opKent Overstreet2013-11-101-1/+0
| | | | | | | Eventual goal is for struct btree_op to contain only what is necessary for traversing the btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add btree_map() functionsKent Overstreet2013-11-101-18/+15
| | | | | | | | | | | | | | Lots of stuff has been open coding its own btree traversal - which is generally pretty simple code, but there are a few subtleties. This adds new new functions, bch_btree_map_nodes() and bch_btree_map_keys(), which do the traversal for you. Everything that's open coding btree traversal now (with the exception of garbage collection) is slowly going to be converted to these two functions; being able to write other code at a higher level of abstraction is a big improvement w.r.t. overall code quality. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Clean up keylist codeKent Overstreet2013-11-101-29/+15
| | | | | | More random refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Add btree_insert_node()Kent Overstreet2013-11-101-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | The flow of control in the old btree insertion code was rather - backwards; we'd recurse down the btree (in btree_insert_recurse()), and then if we needed to split the keys to be inserted into the parent node would be effectively returned up to btree_insert_recurse(), which would notice there was more work to do and finish the insertion. The main problem with this was that the full logic for btree insertion could only be used by calling btree_insert_recurse; if you'd gotten to a btree leaf some other way and had a key to insert, if it turned out that node needed to be split you were SOL. This inverts the flow of control so btree_insert_node() does _full_ btree insertion, including splitting - and takes a (leaf) btree node to insert into as a parameter. This means we can now _correctly_ handle cache misses - for cache misses, we need to insert a fake "check" key into the btree when we discover we have a cache miss - while we still have the btree locked. Previously, if the btree node was full inserting a cache miss would just fail. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* bcache: Fix for handling overlapping extents when reading in a btree nodeKent Overstreet2013-09-241-11/+28
| | | | | | | | | | | | | btree_sort_fixup() was overly clever, because it was trying to avoid pulling a key off the btree iterator in more than one place. This led to a really obscure bug where we'd break early from the loop in btree_sort_fixup() if the current key overlapped with keys in more than one older set, and the next key it overlapped with was zero size. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-3.11/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2013-07-221-21/+35
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block IO driver bits from Jens Axboe: "As I mentioned in the core block pull request, due to real life circumstances the driver pull request would be late. Now it looks like -rc2 late... On the plus side, apart form the rsxx update, these are all things that I could argue could go in later in the cycle as they are fixes and not features. So even though things are late, it's not ALL bad. The pull request contains: - Updates to bcache, all bug fixes, from Kent. - A pile of drbd bug fixes (no big features this time!). - xen blk front/back fixes. - rsxx driver updates, some of them deferred form 3.10. So should be well cooked by now" * 'for-3.11/drivers' of git://git.kernel.dk/linux-block: (63 commits) bcache: Allocation kthread fixes bcache: Fix GC_SECTORS_USED() calculation bcache: Journal replay fix bcache: Shutdown fix bcache: Fix a sysfs splat on shutdown bcache: Advertise that flushes are supported bcache: check for allocation failures bcache: Fix a dumb race bcache: Use standard utility code bcache: Update email address bcache: Delete fuzz tester bcache: Document shrinker reserve better bcache: FUA fixes drbd: Allow online change of al-stripes and al-stripe-size drbd: Constants should be UPPERCASE drbd: Ignore the exit code of a fence-peer handler if it returns too late drbd: Fix rcu_read_lock balance on error path drbd: fix error return code in drbd_init() drbd: Do not sleep inside rcu bcache: Refresh usage docs ...
| * bcache: Improve lazy sortingKent Overstreet2013-06-261-17/+23
| | | | | | | | | | | | | | | | | | The old lazy sorting code was kind of hacky - rewrite in a way that mathematically makes more sense; the idea is that the size of the sets of keys in a btree node should increase by a more or less fixed ratio from smallest to biggest. Signed-off-by: Kent Overstreet <koverstreet@google.com>
| * bcache: Rip out pkey()/pbtree()Kent Overstreet2013-06-261-4/+12
| | | | | | | | | | | | | | | | Old gcc doesnt like the struct hack, and it is kind of ugly. So finish off the work to convert pr_debug() statements to tracepoints, and delete pkey()/pbtree(). Signed-off-by: Kent Overstreet <koverstreet@google.com>
* | md: bcache: Fixed a typo with the word 'arithmetic'Phil Viana2013-06-181-2/+2
|/ | | | | | | The word 'arithmetic' was typed as 'arithmatic' Signed-off-by: Phil Viana <phillip.l.viana@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* bcache: Use WARN_ONCE() instead of __WARN()Kent Overstreet2013-04-081-1/+1
| | | | Signed-off-by: Kent Overstreet <koverstreet@google.com>
* bcache: Add missing #include <linux/prefetch.h>Geert Uytterhoeven2013-04-081-0/+1
| | | | | | | | | | | | | m68k/allmodconfig: drivers/md/bcache/bset.c: In function ‘bset_search_tree’: drivers/md/bcache/bset.c:727: error: implicit declaration of function ‘prefetch’ drivers/md/bcache/btree.c: In function ‘bch_btree_node_get’: drivers/md/bcache/btree.c:933: error: implicit declaration of function ‘prefetch’ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Kent Overstreet <koverstreet@google.com>
* bcache: Don't export utility code, prefix with bch_Kent Overstreet2013-03-281-2/+2
| | | | | | Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bcache: Style/checkpatch fixesKent Overstreet2013-03-251-4/+5
| | | | | | | | | Took out some nested functions, and fixed some more checkpatch complaints. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bcache: Build fixes from test robotKent Overstreet2013-03-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | config: make ARCH=i386 allmodconfig All error/warnings: drivers/md/bcache/bset.c: In function 'bch_ptr_bad': >> drivers/md/bcache/bset.c:164:2: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/debug.c: In function 'bch_pbtree': >> drivers/md/bcache/debug.c:86:4: warning: format '%li' expects argument of type 'long int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/btree.c: In function 'bch_btree_read_done': >> drivers/md/bcache/btree.c:245:8: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' [-Wformat] -- drivers/md/bcache/closure.o: In function `closure_debug_init': >> (.init.text+0x0): multiple definition of `init_module' >> drivers/md/bcache/super.o:super.c:(.init.text+0x0): first defined here Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: linux-bcache@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bcache: A block layer cacheKent Overstreet2013-03-231-0/+1190
Does writethrough and writeback caching, handles unclean shutdown, and has a bunch of other nifty features motivated by real world usage. See the wiki at http://bcache.evilpiepirate.org for more. Signed-off-by: Kent Overstreet <koverstreet@google.com>
OpenPOWER on IntegriCloud