summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: truncate pages from clone ioctl target rangeSage Weil2011-08-161-0/+4
| | | | | | | | | | We need to truncate page cache pages for the clone ioctl target range or else we'll confuse ourselves to no end. If the old data was cached, we used to still see it (until remount). If the page was partially updated we used to get a mix of old and new data. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix uninitialized sync_pendingMiao Xie2011-08-161-1/+1
| | | | | | | sync_pending is uninitialized before it be used, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix wrong free space informationMiao Xie2011-08-161-5/+11
| | | | | | | | | | | | | Btrfs subtracted the size of the allocated space twice when it allocated the space from the bitmap in the cluster, it broke the free space information and led to oops finally. And this patch also fixes the bug that ctl->free_space was subtracted without lock. Reported-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: memory leak in btrfs_add_inode_defrag()Dan Carpenter2011-08-161-0/+2
| | | | | | | We don't use the defrag struct on this path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: use plain page_address() in header fields setget functionsLi Zefan2011-08-161-4/+2
| | | | | | | We've stopped using highmem for extent buffers. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: forced readonly when btrfs_drop_snapshot() failsTsutomu Itoh2011-08-162-10/+16
| | | | | | | | | | The filesystem turns readonly instead of returning the error to the caller when detected error in btrfs_drop_snapshot(). and, because the caller doesn't check the error, the function type is changed to 'void'. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: check if there is enough space for balancing smarterliubo2011-08-161-6/+35
| | | | | | | | | | When checking if there is enough space for balancing a block group, since we do not take raid types into consideration, we do not account corrent amounts of space that we needed. This makes us do some extra work before we get ENOSPC. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix a bug of balance on full multi-disk partitionsliubo2011-08-161-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When balancing, we'll first try to shrink devices for some space, but if it is working on a full multi-disk partition with raid protection, we may encounter a bug, that is, while shrinking, total_bytes may be less than bytes_used, and btrfs may allocate a dev extent that accesses out of device's bounds. Then we will not be able to write or read the data which stores at the end of the device, and get the followings: device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5 Btrfs detected SSD devices, enabling SSD mode btrfs: relocating block group 476315648 flags 9 btrfs: found 4 extents attempt to access beyond end of device sdb5: rw=145, want=546176, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546304, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546432, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546560, limit=546147 attempt to access beyond end of device Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix an oops of log replayliubo2011-08-161-4/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When btrfs recovers from a crash, it may hit the oops below: ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:4580! [...] RIP: 0010:[<ffffffffa03df251>] [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs] [...] Call Trace: [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs] [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs] [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs] [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs] [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs] [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs] [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs] [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs] [...] This comes from that while replaying an inode ref item, we forget to check those old conflicting DIR_ITEM and DIR_INDEX items in fs/file tree, then we will come to conflict corners which lead to BUG_ON(). Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Tested-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: detect wether a device supports discardJosef Bacik2011-08-163-2/+29
| | | | | | | | | | | | | | We have a problem where if a user specifies discard but doesn't actually support it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem because this gets called (in a fashion) from the tree log recovery code, which has a nice little BUG_ON(ret) after it, which causes us to fail the tree log replay. So instead detect wether our devices support discard when we're adding them and then don't issue discards if we know that the device doesn't support it. And just for good measure set ret = 0 in btrfs_issue_discard just in case we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: force unplugs when switching from high to regular priority biosChris Mason2011-08-051-0/+17
| | | | | | | | | | | | Btrfs does bio submissions from a worker thread, and each device has a list of high priority bios and regular priority bios. Synchronous writes go to the high priority thread while async writes go to regular list. This commit brings back an explicit unplug any time we switch from high to regular priority, which makes it easier for the block layer to give us low latencies. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: don't call writepages from within write_full_pageJosef Bacik2011-08-011-10/+0
| | | | | | | | | | | When doing a writepage we call writepages to try and write out any other dirty pages in the area. This could cause problems where we commit a transaction and then have somebody else dirtying metadata in the area as we could end up writing out a lot more than we care about, which could cause latency on anybody who is waiting for the transaction to completely finish committing. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Remove unused variable 'last_index' in file.cMitch Harder2011-08-011-5/+2
| | | | | | | | | | | The variable 'last_index' is calculated in the __btrfs_buffered_write function and passed as a parameter to the prepare_pages function, but is not used anywhere in the prepare_pages function. Remove instances of 'last_index' in these functions. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: clean up for find_first_extent_bit()Xiao Guangrong2011-08-011-40/+24
| | | | | | | | | find_first_extent_bit() and find_first_extent_bit_state() share most of the code, and we can just make the former call the latter. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: clean up for wait_extent_bit()Xiao Guangrong2011-08-011-5/+1
| | | | | | | | We can just use cond_resched_lock(). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: clean up for insert_state()Xiao Guangrong2011-08-011-5/+5
| | | | | | | | Don't duplicate set_state_bits(). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: remove unused members from struct extent_stateXiao Guangrong2011-08-011-2/+0
| | | | | | | | These members are not used at all. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: clean up code for merging extent mapsLi Zefan2011-08-011-38/+21
| | | | | | | | unpin_extent_cache() and add_extent_mapping() shares the same code that merges extent maps. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: clean up code for extent_map lookupLi Zefan2011-08-011-56/+29
| | | | | | | lookup_extent_map() and search_extent_map() can share most of code. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: clean up search_extent_mapping()Li Zefan2011-08-011-14/+3
| | | | | | | | rb_node returned by __tree_search() can be a valid pointer or NULL, but won't be some errno. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: remove redundant code for dir item lookupLi Zefan2011-08-011-28/+2
| | | | | | | | | When we search a dir item with a specific hash code, we can just return NULL without further checking if btrfs_search_slot() returns 1. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: make acl functions really no-op if acl is not enabledLi Zefan2011-08-013-21/+15
| | | | | | | So there's no overhead for something we don't use. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: remove remaining ref-cache codeLi Zefan2011-08-012-120/+0
| | | | | | | | | Since commit f2a97a9dbd86eb1ef956bdf20e05c507b32beb96 ("btrfs: remove all unused functions"), there's no extern functions at all in ref-cache.c, so just remove the remaining dead code. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: remove a BUG_ON() in btrfs_commit_transaction()Li Zefan2011-08-011-4/+2
| | | | | | | wait_for_commit() always returns 0. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: use wait_event()Li Zefan2011-08-011-52/+7
| | | | | | | Use wait_event() when possible to avoid code duplication. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: check the nodatasum flag when writing compressed filesLi Zefan2011-08-011-4/+10
| | | | | | | | | If mounting with nodatasum option, we won't csum file data for general write or direct-io write, and this rule should also be applied when writing compressed files. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: copy string correctly in INO_LOOKUP ioctlLi Zefan2011-08-011-2/+1
| | | | | | | | Memory areas [ptr, ptr+total_len] and [name, name+total_len] may overlap, so it's wrong to use memcpy(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: don't print the leaf if we had an errorJosef Bacik2011-08-011-1/+3
| | | | | | | | | | | | In __btrfs_free_extent we will print the leaf if we fail to find the extent we wanted, but the problem is if we get an error we won't have a leaf so often this leads to a NULL pointer dereference and we lose the error that actually occurred. So only print the leaf if ret > 0, which means we didn't find the item we were looking for but we didn't error either. This way the error is preserved. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: make btrfs_set_root_node voidMark Fasheh2011-08-012-5/+4
| | | | | | | | | | | This is fairly trivial - btrfs_set_root_node() - always returns zero so we can just make it void. All callers ignore the return code now anyway. I also made sure to check that none of the functions that btrfs_set_root_node() calls returns an error that we might have needed to catch and pass back. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix oops while writing data to SSD partitionsliubo2011-08-011-1/+3
| | | | | | | | | | | | | | | | | | Here I have a two SSD-partitions btrfs, and they are defaultly set to "data=raid0, metadata=raid1", then I try to fill my btrfs partition till "No space left on device", via "dd if=/dev/zero of=/mnt/btrfs/tmp". I get an oops panic from kernel BUG at fs/btrfs/extent-tree.c:5199!, which refers to find_free_extent's BUG_ON(index != get_block_group_index(block_group)); In SSD mode, in order to find enough space to alloc, we may check the block_group cache which has been checked sometime before, but the index is not updated, where it hits the BUG_ON. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Acked-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Protect the readonly flag of block groupWuBo2011-08-011-3/+7
| | | | | | | | The access for ro in btrfs_block_group_cache should be protected because of the racy lock in relocation. Signed-off-by: Wu Bo <wu.bo@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: Make extent-io callbacks that never fail return voidJeff Mahoney2011-08-013-62/+34
| | | | | | | | | | | | | | | The set/clear bit and the extent split/merge hooks only ever return 0. Changing them to return void simplifies the error handling cases later. This patch changes the hook prototypes, the single implementation of each, and the functions that call them to return void instead. Since all four of these hooks execute under a spinlock, they're necessarily simple. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix readahead in file defragLi Zefan2011-08-012-16/+8
| | | | | | | | We passed the wrong value to btrfs_force_ra(). Fix this by changing the argument of btrfs_force_ra() from last_index to nr_page. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: return error to caller when btrfs_unlink() failesTsutomu Itoh2011-08-012-4/+9
| | | | | | | | | When btrfs_unlink_inode() and btrfs_orphan_add() in btrfs_unlink() are error, the error code is returned to the caller instead of BUG_ON(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs:don't check the return value of __btrfs_add_inode_defragWanlong Gao2011-08-011-6/+5
| | | | | | | | Don't need to check the return value of __btrfs_add_inode_defrag(), since it will always return 0. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Merge branch 'alloc_path' of ↵Chris Mason2011-08-016-29/+78
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/btrfs-error-handling into for-linus
| * btrfs: don't BUG_ON allocation errors in btrfs_drop_snapshotMark Fasheh2011-07-251-2/+6
| | | | | | | | | | | | | | In addition to properly handling allocation failure from btrfs_alloc_path, I also fixed up the kzalloc error handling code immediately below it. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| * btrfs: Don't BUG_ON alloc_path errors in find_next_chunkMark Fasheh2011-07-252-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | I also removed the BUG_ON from error return of find_next_chunk in init_first_rw_device(). It turns out that the only caller of init_first_rw_device() also BUGS on any nonzero return so no actual behavior change has occurred here. do_chunk_alloc() also needed an update since it calls btrfs_alloc_chunk() which can now return -ENOMEM. Instead of setting space_info->full on any error from btrfs_alloc_chunk() I catch and return every error value _except_ -ENOSPC. Thanks goes to Tsutomu Itoh for pointing that issue out. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| * btrfs: Don't BUG_ON alloc_path errors in btrfs_balance()Mark Fasheh2011-07-141-2/+4
| | | | | | | | | | | | | | | | | | Dealing with this seems trivial - the only caller of btrfs_balance() is btrfs_ioctl() which passes the error code directly back to userspace. There also isn't much state to unwind (if I'm wrong about this point, we can always safely move the allocation to the top of btrfs_balance() anyway). Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| * btrfs: Don't BUG_ON alloc_path errors in btrfs_read_locked_inodeMark Fasheh2011-07-141-5/+17
| | | | | | | | | | | | | | btrfs_iget() also needed an update so that errors from btrfs_locked_inode() are caught and bubbled back up. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| * btrfs: Don't BUG_ON alloc_path errors in btrfs_truncate_inode_itemsMark Fasheh2011-07-141-4/+5
| | | | | | | | | | | | | | | | I moved the path allocation up a few lines to the top of the function so that we couldn't get into the state where we've dropped delayed items and the extent cache but fail due to -ENOMEM. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| * btrfs: Don't BUG_ON alloc_path errors in replay_one_buffer()Mark Fasheh2011-07-141-3/+9
| | | | | | | | | | | | | | The two ->process_func call sites in tree-log.c which were ignoring a return code have also been updated to gracefully exit as well. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
| * btrfs: don't BUG_ON btrfs_alloc_path() errorsMark Fasheh2011-07-144-11/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation failure. All the sites that are fixed in this patch were checked by me to be fairly trivial to fix because of at least one of two criteria: - Callers of the function catch errors from it already so bubbling the error up will be handled. - Callers of the function might BUG_ON any nonzero return code in which case there is no behavior changed (but we still got to remove a BUG_ON) The following functions were updated: btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group, btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written, btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink, insert_reserved_file_extent, and run_delalloc_nocow Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* | Merge branch 'integration' into for-linusChris Mason2011-07-2723-1010/+965
|\ \
| * | Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errorsChris Mason2011-07-271-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | The btrfs transaction code will return any errors that come from reserve_metadata_bytes. We need to make sure we don't return funny things like 1 or EAGAIN. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | Btrfs: use the commit_root for reading free_space_inode crcsChris Mason2011-07-273-19/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we are using regular file crcs for the free space cache, we can deadlock if we try to read the free_space_inode while we are updating the crc tree. This commit fixes things by using the commit_root to read the crcs. This is safe because we the free space cache file would already be loaded if that block group had been changed in the current transaction. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | Btrfs: reduce extent_state lock contention for metadataChris Mason2011-07-271-14/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For metadata buffers that don't straddle pages (all of them), btrfs can safely use the page uptodate bits and extent_buffer uptodate bit instead of needing to use the extent_state tree. This greatly reduces contention on the state tree lock. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | Btrfs: remove lockdep magic from btrfs_next_leafChris Mason2011-07-271-31/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Before the reader/writer locks, btrfs_next_leaf needed to keep the path blocking to avoid making lockdep upset. Now that btrfs_next_leaf only takes read locks, this isn't required. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | Btrfs: make a lockdep class for each rootChris Mason2011-07-274-38/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch was originally from Tejun Heo. lockdep complains about the btrfs locking because we sometimes take btree locks from two different trees at the same time. The current classes are based only on level in the btree, which isn't enough information for lockdep to figure out if the lock is safe. This patch makes a class for each type of tree, and lumps all the FS trees that actually have files and directories into the same class. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | Btrfs: switch the btrfs tree locks to reader/writerChris Mason2011-07-279-218/+431
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The btrfs metadata btree is the source of significant lock contention, especially in the root node. This commit changes our locking to use a reader/writer lock. The lock is built on top of rw spinlocks, and it extends the lock tracking to remember if we have a read lock or a write lock when we go to blocking. Atomics count the number of blocking readers or writers at any given time. It removes all of the adaptive spinning from the old code and uses only the spinning/blocking hints inside of btrfs to decide when it should continue spinning. In read heavy workloads this is dramatically faster. In write heavy workloads we're still faster because of less contention on the root node lock. We suffer slightly in dbench because we schedule more often during write locks, but all other benchmarks so far are improved. Signed-off-by: Chris Mason <chris.mason@oracle.com>
OpenPOWER on IntegriCloud