summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | Btrfs: do not use btrfs_end_transaction_throttle everywhereJosef Bacik2012-01-162-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A user reported a problem where things like open with O_CREAT would take up to 30 seconds when he had nfs activity on the same mount. This is because all of our quick metadata operations, like create, symlink etc all do btrfs_end_transaction_throttle, which if the transaction is blocked will wait for the commit to complete before it returns. This adds a ridiculous amount of latency and isn't really needed. The normal btrfs_end_transaction will mark the transaction as blocked and wake the transaction kthread up if it thinks the transaction needs to end (this being in the running out of global reserve space scenario), and this is all that is really needed since we've already done everything we're going to do, we just need to return. This should help people with the latency they were seeing when using synchronous heavy workloads. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | | | | | | Merge branch 'integrity-check-patch-v2' of ↵Chris Mason2012-01-1610-10/+3201
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://btrfs.giantdisaster.de/git/btrfs into integration Conflicts: fs/btrfs/ctree.h fs/btrfs/super.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | Btrfs: integrate integrity check module into btrfsStefan Behrens2011-12-216-11/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the last part of the patch series. It modifies the btrfs code to use the integrity check module if configured to do so with the define BTRFS_FS_CHECK_INTEGRITY. If this define is not set, the only effective change is that code is added that handles the mount option to activate the integrity check. If the mount option is set and the define BTRFS_FS_CHECK_INTEGRITY is not set, that code complains in the log and the mount fails with EINVAL. Add the mount option to activate the usage of the integrity check code. Add invocation of btrfs integrity check code init and cleanup function on mount and umount, respectively. Add hook to call btrfs integrity check code version of submit_bh/submit_bio. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
| | * | | | | | | Btrfs: Makefile changes to optionally include btrfs integrity checkStefan Behrens2011-12-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the btrfs integrity check is enabled, the files required to implement the checks are included in the build. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
| | * | | | | | | Btrfs: add config option to enable btrfs integrity checkStefan Behrens2011-12-211-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added the BTRFS_FS_CHECK_INTEGRITY option to Kconfig. It depends on BTRFS_FS. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
| | * | | | | | | Btrfs: add optional integrity check codeStefan Behrens2011-12-212-0/+3104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The two files added in this patch contain all the code that is required to implement the integrity checks. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
| * | | | | | | | Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into ↵Chris Mason2012-01-1621-419/+1639
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | integration
| | * | | | | | | | Btrfs: make sure we're not using obsolete code in btrfs_get_extentJan Schmidt2012-01-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's code in btrfs_get_extent that should never be used. This patch turns a WARN_ON(1) into a BUG(), hoping we can remove the transaction code from btrfs_get_extent soon. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: new backref walking codeJan Schmidt2012-01-053-262/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old backref iteration code could only safely be used on commit roots. Besides this limitation, it had bugs in finding the roots for these references. This commit replaces large parts of it by btrfs_find_all_roots() which a) really finds all roots and the correct roots, b) works correctly under heavy file system load, c) considers delayed refs. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: added btrfs_find_all_roots()Jan Schmidt2012-01-042-0/+788
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function gets a byte number (a data extent), collects all the leafs pointing to it and walks up the trees to find all fs roots pointing to those leafs. It also returns the list of all leafs pointing to that extent. It does proper locking for the involved trees, can be used on busy file systems and honors delayed refs. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: add waitqueue instead of doing busy waiting for more delayed refsJan Schmidt2012-01-044-1/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we may be holding back delayed refs for a limited period, we might end up having no runnable delayed refs. Without this commit, we'd do busy waiting in that thread until another (runnable) ref arives. Instead, we're detecting this situation and use a waitqueue, such that we only try to run more refs after a) another runnable ref was added or b) delayed refs are no longer held back Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: put back delayed refs that are too newArne Jansen2012-01-042-23/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When processing a delayed ref, first check if there are still old refs in the process of being added. If so, put this ref back to the tree. To avoid looping on this ref, choose a newer one in the next loop. btrfs_find_ref_cluster has to take care of that. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: add sequence numbers to delayed refsArne Jansen2012-01-043-0/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sequence numbers are needed to reconstruct the backrefs of a given extent to a certain point in time. The total set of backrefs consist of the set of backrefs recorded on disk plus the enqueued delayed refs for it that existed at that moment. This patch also adds a list that records all delayed refs which are currently in the process of being added. When walking all refs of an extent in btrfs_find_all_roots(), we freeze the current state of delayed refs, honor anythinh up to this point and prevent processing newer delayed refs to assert consistency. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: add nested locking mode for pathsArne Jansen2012-01-043-2/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the possibilty to read-lock an extent even if it is already write-locked from the same thread. btrfs_find_all_roots() needs this capability. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: always save ref_root in delayed refsArne Jansen2011-12-222-18/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For consistent backref walking and (later) qgroup calculation the information to which root a delayed ref belongs is useful even for shared refs. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: mark delayed refs as for cowArne Jansen2011-12-2212-114/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value from every call site. The for_cow parameter will later on be used to determine if a ref will change anything with respect to qgroups. Delayed refs coming from relocation are always counted as for_cow, as they don't change subvol quota. Also pass in the fs_info for later use. btrfs_find_all_roots() will use this as an optimization, as changes that are for_cow will not change anything with respect to which root points to a certain leaf. Thus, we don't need to add the current sequence number to those delayed refs. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: added helper btrfs_next_item()Jan Schmidt2011-12-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_next_item() makes the btrfs path point to the next item, crossing leaf boundaries if needed. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| | * | | | | | | | Btrfs: generic data structure to build unique listsArne Jansen2011-12-223-1/+289
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ulist is a generic data structures to hold a collection of unique u64 values. The only operations it supports is adding to the list and enumerating it. It is possible to store an auxiliary value along with the key. The implementation is preliminary and can probably be sped up significantly. It is used by btrfs_find_all_roots() quota to translate recursions into iterative loops. Signed-off-by: Arne Jansen <sensille@gmx.net> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
| * | | | | | | | | Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integrationChris Mason2012-01-166-226/+280
| |\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | Btrfs: fix possible deadlock when opening a seed deviceLi Zefan2012-01-112-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex, but when we mount a filesystem which has backing seed devices, we have this lock chain: open_ctree() lock(chunk_mutex); read_chunk_tree(); read_one_dev(); open_seed_devices(); lock(uuid_mutex); and then we hit a lockdep splat. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: update global block_rsv when creating a new block groupLi Zefan2012-01-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bug was triggered while using seed device: # mkfs.btrfs /dev/loop1 # btrfstune -S 1 /dev/loop1 # mount -o /dev/loop1 /mnt # btrfs dev add /dev/loop2 /mnt btrfs: block rsv returned -28 ------------[ cut here ]------------ WARNING: at fs/btrfs/extent-tree.c:5969 btrfs_alloc_free_block+0x166/0x396 [btrfs]() ... Call Trace: ... [<f7b7c31c>] btrfs_cow_block+0x101/0x147 [btrfs] [<f7b7eaa6>] btrfs_search_slot+0x1b8/0x55f [btrfs] [<f7b7f844>] btrfs_insert_empty_items+0x42/0x7f [btrfs] [<f7b7f8c1>] btrfs_insert_item+0x40/0x7e [btrfs] [<f7b8ac02>] btrfs_make_block_group+0x243/0x2aa [btrfs] [<f7bb3f53>] __btrfs_alloc_chunk+0x672/0x70e [btrfs] [<f7bb41ff>] init_first_rw_device+0x77/0x13c [btrfs] [<f7bb5a62>] btrfs_init_new_device+0x664/0x9fd [btrfs] [<f7bbb65a>] btrfs_ioctl+0x694/0xdbe [btrfs] [<c04f55f7>] do_vfs_ioctl+0x496/0x4cc [<c04f5660>] sys_ioctl+0x33/0x4f [<c07b9edf>] sysenter_do_call+0x12/0x38 ---[ end trace 906adac595facc7d ]--- Since seed device is readonly, there's no usable space in the filesystem. Afterwards we add a sprout device to it, and the kernel creates a METADATA block group and a SYSTEM block group where comes free space we can reserve, but we still get revervation failure because the global block_rsv hasn't been updated accordingly. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: rewrite btrfs_trim_block_group()Li Zefan2012-01-111-71/+164
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are various bugs in block group trimming: - It may trim from offset smaller than user-specified offset. - It may trim beyond user-specified range. - It may leak free space for extents smaller than specified minlen. - It may truncate the last trimmed extent thus leak free space. - With mixed extents+bitmaps, some extents may not be trimmed. - With mixed extents+bitmaps, some bitmaps may not be trimmed (even none will be trimmed). Even for those trimmed, not all the free space in the bitmaps will be trimmed. I rewrite btrfs_trim_block_group() and break it into two functions. One is to trim extents only, and the other is to trim bitmaps only. Before patching: # fstrim -v /mnt/ /mnt/: 1496465408 bytes were trimmed After patching: # fstrim -v /mnt/ /mnt/: 2193768448 bytes were trimmed And this matches the total free space: # btrfs fi df /mnt Data: total=3.58GB, used=1.79GB System, DUP: total=8.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=205.12MB, used=97.14MB Metadata: total=8.00MB, used=0.00 Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: simplfy calculation of stripe length for discard operationLi Zefan2012-01-111-64/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For btrfs raid, while discarding a range of space, we'll need to know the start offset and length to discard for each device, and it's done in btrfs_map_block(). However the calculation is a bit complex for raid0 and raid10, so I reimplement it based on a fact that: dev1 dev2 dev3 (raid0) ----------------------------------- s0 s3 s6 s1 s4 s7 s2 s5 Each device has (total_stripes / nr_dev) stripes, or plus one. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: don't pre-allocate btrfs bioLi Zefan2012-01-111-46/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We pre-allocate a btrfs bio with fixed size, and then may re-allocate memory if we find stripes are bigger than the fixed size. But this pre-allocation is not necessary. Also we don't have to calcuate the stripe number twice. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: don't pass a trans handle unnecessarily in volumes.cLi Zefan2012-01-113-14/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some functions never use the transaction handle passed to them. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: reserve metadata space in btrfs_ioctl_setflags()Li Zefan2012-01-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check and reserve space for btrfs_update_inode(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: remove BUG_ON()s in btrfs_ioctl_setflags()Li Zefan2012-01-111-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can recover from errors and return -errno to user space. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: check the return value of io_ctl_init()Li Zefan2012-01-111-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It can return -ENOMEM. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: avoid possible NULL deref in io_ctl_drop_pages()Li Zefan2012-01-111-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we run into some failure path in io_ctl_prepare_pages(), io_ctl->pages[] array may have some NULL pointers. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Btrfs: add pinned extents to on-disk free space cache correctlyLi Zefan2012-01-111-21/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I got this while running xfstests: [24256.836098] block group 317849600 has an wrong amount of free space [24256.836100] btrfs: failed to load free space cache for block group 317849600 We should clamp the extent returned by find_first_extent_bit(), so the start of the extent won't smaller than the start of the block group. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * | | | | | | | | Merge branch 'for-linus' of ↵Li Zefan2012-01-1133-2009/+6337
| | |\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs into for-linus
| * | \ \ \ \ \ \ \ \ \ Merge branch 'restriper' of git://github.com/idryomov/btrfs-unstable into ↵Chris Mason2012-01-168-98/+1385
| |\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | integration
| | * | | | | | | | | | | Btrfs: add balance progress reportingIlya Dryomov2012-01-164-5/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: allow for resuming restriper after it was pausedIlya Dryomov2012-01-161-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recognize BTRFS_BALANCE_RESUME flag passed from userspace. We use the same heuristics used when recovering balance after a crash to try to start where we left off last time. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: allow for canceling restriperIlya Dryomov2012-01-166-3/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement an ioctl for canceling restriper. Currently we wait until relocation of the current block group is finished, in future this can be done by triggering a commit. Balance item is deleted and no memory about the interrupted balance is kept. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: allow for pausing restriperIlya Dryomov2012-01-166-3/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement an ioctl for pausing restriper. This pauses the relocation, but balance is still considered to be "in progress": balance item is not deleted, other volume operations cannot be started, etc. If paused in the middle of profile changing operation we will continue making allocations with the target profile. Add a hook to close_ctree() to pause restriper and free its data structures on unmount. (It's safe to unmount when restriper is in "paused" state, we will resume with the same parameters on the next mount) Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: add skip_balance mount optionIlya Dryomov2012-01-163-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since restriper kthread starts involuntarily on mount and can suck cpu and memory bandwidth add a mount option to forcefully skip it. The restriper in that case hangs around in paused state and can be resumed from userspace when it's convenient. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: recover balance on mountIlya Dryomov2012-01-163-2/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On mount, if balance item is found, resume balance in a separate kernel thread. Try to be smart to continue roughly where previous balance (or convert) was interrupted. For chunk types that were being converted to some profile we turn on soft convert, in case of a simple balance we turn on usage filter and relocate only less-than-90%-full chunks of that type. These are just heuristics but they help quite a bit, and can be improved in future. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: save balance parameters to diskIlya Dryomov2012-01-162-1/+231
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new btree objectid for storing balance item. The reason is to be able to resume restriper after a crash with the same parameters. Balance item has a very high objectid and goes into tree of tree roots. The key for the new item is as follows: [ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ] Older kernels simply ignore it so it's safe to mount with an older kernel and then go back to the newer one. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: soft profile changing mode (aka soft convert)Ilya Dryomov2012-01-162-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing convert from one profile to another if soft mode is on restriper won't touch chunks that already have the profile we are converting to. This is useful if e.g. half of the FS was converted earlier. The soft mode switch is (like every other filter) per-type. This means that we can convert for example meta chunks the "hard" way while converting data chunks selectively with soft switch. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: implement online profile changingIlya Dryomov2012-01-163-1/+129
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Profile changing is done by launching a balance with BTRFS_BALANCE_CONVERT bits set and target fields of respective btrfs_balance_args structs initialized. Profile reducing code in this case will pick restriper's target profile if it's available instead of doing a blind reduce. If target profile is not yet available it goes back to a plain reduce. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: do not reduce profile in do_chunk_alloc()Ilya Dryomov2012-01-162-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every caller of do_chunk_alloc() feeds it the reduced allocation profile, so stop trying to reduce it one more time. Instead check the validity of the passed profile. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: virtual address space subset filterIlya Dryomov2012-01-162-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Select chunks which have at least one byte located inside a given [vstart, vend) virtual address space range. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: devid subset filterIlya Dryomov2012-01-162-0/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Select chunks which have at least one byte of at least one stripe located on a device with devid X in a given [pstart,pend) physical address range. This filter only works when devid filter is turned on. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: devid filterIlya Dryomov2012-01-162-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Relocate chunks which have at least one stripe located on a device with devid X. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: usage filterIlya Dryomov2012-01-162-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Select chunks that are less than X percent full. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: profiles filterIlya Dryomov2012-01-162-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Select chunks based on a given profile mask. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: add basic infrastructure for selective balancingIlya Dryomov2012-01-163-2/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows to have a separate set of filters for each chunk type (data,meta,sys). The code however is generic and switch on chunk type is only done once. This commit also adds a type filter: it allows to balance for example meta and system chunks w/o touching data ones. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: add basic restriper infrastructureIlya Dryomov2012-01-166-41/+281
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add basic restriper infrastructure: extended balancing ioctl and all related ioctl data structures, add data structure for tracking restriper's state to fs_info, etc. The semantics of the old balancing ioctl are fully preserved. Explicitly disallow any volume operations when balance is in progress. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| | * | | | | | | | | | | Btrfs: make avail_*_alloc_bits fields dynamicIlya Dryomov2012-01-161-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when new chunks are created respective avail_alloc_bits field is updated to reflect profiles of all chunks present in the system. However when chunks are removed profile bits are never cleared. This patch clears profile bit of respective avail_alloc_bits field when the last chunk with that profile is removed. Restriper needs this to properly operate when "downgrading". Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
OpenPOWER on IntegriCloud