op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	Btrfs: Remove broken optimisations in end_bio functions.	David Woodhouse	2008-09-25	1	-138/+21
\| \| \| \| \| \| \| \| \|	These ended up freeing objects while they were still using them. Under guidance from Chris, just rip out the 'clever' bits and do things the simple way. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Change TestSetPageLocked() to trylock_page()	David Woodhouse	2008-09-25	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Add backwards compatibility in compat.h Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> --- compat.h \| 3 +++ extent_io.c \| 3 ++- 2 files changed, 5 insertions(+), 1 deletions(-) Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Add compatibility for kernels >= 2.6.27-rc1	Sven Wegener	2008-09-25	1	-0/+23
\| \| \| \| \| \| \|	Add a couple of #if's to follow API changes. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: implement memory reclaim for leaf reference cache	Yan	2008-09-25	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The memory reclaiming issue happens when a snapshot exists. In that case, some cache entries may not be used during old snapshot dropping, so they will remain in the cache until umount. The patch adds a field to struct btrfs_leaf_ref to record create time. Besides, the patch makes all dead roots of a given snapshot linked together in order of create time. After an old snapshot is completely dropped, we check the dead root list and remove all cache entries created before the oldest dead root in the list. Signed-off-by: Chris Mason <chris.mason@oracle.com>
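The reclaim rule described above can be sketched in user-space C. This is a minimal illustration, not the kernel code: `struct leaf_ref`, its fields, and `reclaim_leaf_refs` are hypothetical stand-ins, and the cache is modelled as a plain array rather than whatever structure the patch uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cache entry; not the real struct btrfs_leaf_ref. */
struct leaf_ref {
    uint64_t bytenr;      /* block the entry describes */
    uint64_t create_time; /* stand-in for the recorded create time */
    int in_use;           /* 1 while the slot holds a live entry */
};

/*
 * Drop every live entry created before the oldest remaining dead root.
 * Returns the number of entries removed.
 */
static size_t reclaim_leaf_refs(struct leaf_ref *refs, size_t nr,
                                uint64_t oldest_dead_root_time)
{
    size_t removed = 0;

    for (size_t i = 0; i < nr; i++) {
        if (refs[i].in_use && refs[i].create_time < oldest_dead_root_time) {
            refs[i].in_use = 0;
            removed++;
        }
    }
    return removed;
}
```

Entries created at or after the oldest dead root may still be needed by a pending snapshot drop, so the sketch leaves them in place.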
*	Btrfs: Fix verify_parent_transid	Chris Mason	2008-09-25	1	-1/+4
\| \| \| \| \| \| \|	It was incorrectly clearing the up to date flag on the buffer even when the buffer properly verified. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Search data ordered extents first for checksums on read	Chris Mason	2008-09-25	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Checksum items are not inserted into the tree until all of the io from a given extent is complete. This means one dirty page from an extent may be written, freed, and then read again before the entire extent is on disk and the checksum item is inserted. The checksums themselves are stored in the ordered extent so they can be inserted in bulk when IO is complete. On read, if a checksum item isn't found, the ordered extents were being searched for a checksum record. This all worked most of the time, but the checksum insertion code tries to reduce the number of tree operations by pre-inserting checksum items based on i_size and a few other factors. This means the read code might find a checksum item that hasn't yet really been filled in. This commit changes things to check the ordered extents first and only dive into the btree if nothing was found. This removes the need for extra locking and is more reliable. Signed-off-by: Chris Mason <chris.mason@oracle.com>
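The lookup order this commit describes might look roughly like the following sketch. All names here (`struct csum_entry`, `find_csum`, `lookup_csum`) are assumptions made for illustration, and both the ordered extents and the btree are reduced to flat arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksum record keyed by file offset. */
struct csum_entry {
    uint64_t offset;
    uint32_t csum;
};

/* Linear search; returns 1 and stores the checksum if offset is present. */
static int find_csum(const struct csum_entry *tbl, size_t nr,
                     uint64_t offset, uint32_t *out)
{
    for (size_t i = 0; i < nr; i++) {
        if (tbl[i].offset == offset) {
            *out = tbl[i].csum;
            return 1;
        }
    }
    return 0;
}

/*
 * Check the pending ordered extents first and fall back to the btree only
 * when nothing is found, so a pre-inserted, not yet filled in btree item
 * cannot shadow the real checksum.
 */
static int lookup_csum(const struct csum_entry *ordered, size_t nr_ordered,
                       const struct csum_entry *btree, size_t nr_btree,
                       uint64_t offset, uint32_t *out)
{
    if (find_csum(ordered, nr_ordered, offset, out))
        return 1;
    return find_csum(btree, nr_btree, offset, out);
}
```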
*	Btrfs: Fix some data=ordered related data corruptions	Chris Mason	2008-09-25	1	-20/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Stress testing was showing data checksum errors, most of which were caused by a lookup bug in the extent_map tree. The tree was caching the last pointer returned, and searches would check the last pointer first. But, search callers also expect the search to return the very first matching extent in the range, which wasn't always true with the last pointer usage. For now, the code to cache the last return value is just removed. It is easy to fix, but I think lookups are rare enough that it isn't required anymore. This commit also replaces do_sync_mapping_range with a local copy of the related functions. Signed-off-by: Chris Mason <chris.mason@oracle.com>
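To illustrate the "very first matching extent" requirement, here is a sketch that assumes the extents are kept in a sorted, non-overlapping array instead of the real extent_map tree. `struct extent` and `lookup_first` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent covering [start, start + len). */
struct extent {
    uint64_t start;
    uint64_t len;
};

/*
 * Return the index of the first extent overlapping [start, start + len),
 * or -1 if none does. The array must be sorted by start and must not
 * contain overlapping extents.
 */
static long lookup_first(const struct extent *ext, size_t nr,
                         uint64_t start, uint64_t len)
{
    size_t lo = 0, hi = nr;

    /* Binary search for the first extent that ends after 'start'. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ext[mid].start + ext[mid].len <= start)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < nr && ext[lo].start < start + len)
        return (long)lo;
    return -1;
}
```

A cached "last returned" pointer can satisfy the overlap test while pointing at a later extent in the range, which is the kind of mismatch the commit message describes.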
*	Btrfs: Use a mutex in the extent buffer for tree block locking	Chris Mason	2008-09-25	1	-0/+9
\| \| \| \| \| \| \| \| \|	This replaces the use of the page cache lock bit for locking, which wasn't suitable for block size < page size and couldn't be used recursively. The mutexes alone don't fix either problem, but they are the first step. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Index extent buffers in an rbtree	Chris Mason	2008-09-25	1	-193/+116
\| \| \| \| \| \| \| \| \| \| \| \|	Before, extent buffers were a temporary object, meant to map a number of pages at once and collect operations on them. But, a few extra fields have crept in, and they are also the best place to store a per-tree block lock field as well. This commit puts the extent buffers into an rbtree, and ensures a single extent buffer for each tree block. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Keep extent mappings in ram until pending ordered extents are done	Chris Mason	2008-09-25	1	-12/+15
\| \| \| \| \| \| \| \|	It was possible for stale mappings from disk to be used instead of the new pending ordered extent. This adds a flag to the extent map struct to keep it pinned until the pending ordered extent is actually on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Don't allow releasepage to succeed if EXTENT_ORDERED is set	Chris Mason	2008-09-25	1	-1/+2
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Use async helpers to deal with pages that have been improperly dirtied	Chris Mason	2008-09-25	1	-0/+10
\| \| \| \| \| \| \| \| \|	Higher layers sometimes call set_page_dirty without asking the filesystem to help. This causes many problems for the data=ordered and cow code. This commit detects pages that haven't been properly set up for IO and kicks off an async helper to deal with them. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: New data=ordered implementation	Chris Mason	2008-09-25	1	-7/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old data=ordered code would force commit to wait until all the data extents from the transaction were fully on disk. This introduced large latencies into the commit and stalled new writers in the transaction for a long time. The new code changes the way data allocations and extents work: * When delayed allocation is filled, data extents are reserved, and the extent bit EXTENT_ORDERED is set on the entire range of the extent. A struct btrfs_ordered_extent is allocated and inserted into a per-inode rbtree to track the pending extents. * As each page is written EXTENT_ORDERED is cleared on the bytes corresponding to that page. * When all of the bytes corresponding to a single struct btrfs_ordered_extent are written, the previously reserved extent is inserted into the FS btree and into the extent allocation trees. The checksums for the file data are also updated. Signed-off-by: Chris Mason <chris.mason@oracle.com>
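The per-extent bookkeeping in the steps above can be sketched as a simple byte counter. This is only an illustration under stated assumptions: `struct ordered_extent`, `bytes_left`, and `ordered_extent_written` are hypothetical names, each page is assumed to be reported once, and the real code tracks state with extent bits rather than a counter.

```c
#include <stdint.h>

/* Hypothetical stand-in for one pending ordered extent. */
struct ordered_extent {
    uint64_t file_offset; /* first byte covered by the extent */
    uint64_t len;         /* total bytes covered */
    uint64_t bytes_left;  /* bytes not yet written to disk */
};

/*
 * Record that [start, start + len) reached disk. Returns 1 once every byte
 * of the extent has been written, which is the point where the reserved
 * extent would be inserted into the trees; returns 0 otherwise.
 */
static int ordered_extent_written(struct ordered_extent *oe,
                                  uint64_t start, uint64_t len)
{
    uint64_t oe_end = oe->file_offset + oe->len;
    uint64_t end = start + len;

    /* Clamp the written range to the part that overlaps this extent. */
    if (start < oe->file_offset)
        start = oe->file_offset;
    if (end > oe_end)
        end = oe_end;
    if (start >= end)
        return oe->bytes_left == 0;

    /* Guard against underflow if a range is reported twice. */
    if (end - start > oe->bytes_left)
        oe->bytes_left = 0;
    else
        oe->bytes_left -= end - start;

    return oe->bytes_left == 0;
}
```

Because completion is detected per extent, a commit in this model no longer has to wait for every data extent in the transaction.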
*	Btrfs: Change find_extent_buffer to use TestSetPageLocked	Chris Mason	2008-09-25	1	-1/+6
\| \| \| \| \| \| \|	This makes it possible for callers to check for extent_buffers in cache without deadlocking against any btree locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Start btree concurrency work.	Chris Mason	2008-09-25	1	-8/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The allocation trees and the chunk trees are serialized via their own dedicated mutexes. This means allocation location is still not very fine grained. The main FS btree is protected by locks on each block in the btree. Locks are taken top / down, and as processing finishes on a given level of the tree, the lock is released after locking the lower level. The end result of a search is now a path where only the lowest level is locked. Releasing or freeing the path drops any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>
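The top-down locking scheme described here is commonly called lock coupling. A minimal user-space sketch follows, using POSIX mutexes and a two-way tree in place of the real btree; `struct node`, `node_init`, `node_lock`, `node_unlock`, and `search_locked` are all hypothetical, and the `held` flag exists only so the result can be inspected.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tree node; a leaf has no children. */
struct node {
    pthread_mutex_t lock;
    int held;                  /* demo bookkeeping only */
    uint64_t key;              /* separator key for interior nodes */
    struct node *left, *right;
};

static void node_init(struct node *n, uint64_t key,
                      struct node *left, struct node *right)
{
    pthread_mutex_init(&n->lock, NULL);
    n->held = 0;
    n->key = key;
    n->left = left;
    n->right = right;
}

static void node_lock(struct node *n)
{
    pthread_mutex_lock(&n->lock);
    n->held = 1;
}

static void node_unlock(struct node *n)
{
    n->held = 0;
    pthread_mutex_unlock(&n->lock);
}

/*
 * Walk from the root towards 'key'. The child is locked before the parent
 * is released, so on return only the lowest node reached is still locked.
 * The caller must unlock the returned node.
 */
static struct node *search_locked(struct node *root, uint64_t key)
{
    struct node *cur = root;

    node_lock(cur);
    while (cur->left || cur->right) {
        struct node *next = key < cur->key ? cur->left : cur->right;

        if (!next)
            break;
        node_lock(next);   /* take the lower level first ... */
        node_unlock(cur);  /* ... then drop the level above */
        cur = next;
    }
    return cur;
}
```

Calling `node_unlock` on the returned node plays the role of releasing the path in this sketch.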
*	Fix corners in writepage and btrfs_truncate_page	Chris Mason	2008-09-25	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \|	The extent_io writepage calls needed an extra check for discarding pages that started on the last byte in the file. btrfs_truncate_page needed checks to make sure the page was still part of the file after reading it, and most importantly, needed to wait for all IO to the page to finish before freeing the corresponding extents on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Handle write errors on raid1 and raid10	Chris Mason	2008-09-25	1	-8/+41
\| \| \| \| \| \| \| \| \| \| \| \|	When duplicate copies exist, writes are allowed to fail to one of those copies. This changeset includes a few changes that allow the FS to continue even when some IOs fail. It also adds verification of the parent generation number for btree blocks. This generation is stored in the pointer to a block, and it ensures that missed writes are detected. Signed-off-by: Chris Mason <chris.mason@oracle.com>
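The parent generation check can be illustrated with a small sketch. The structures and the function name below are assumptions, not the kernel's definitions: the parent's pointer records the generation it expects, and a block read from disk with a different generation is treated as the result of a missed write.

```c
#include <stdint.h>

/* Hypothetical pointer stored in a parent block. */
struct block_ptr {
    uint64_t bytenr;     /* where the child block lives */
    uint64_t generation; /* generation the parent expects to find there */
};

/* Hypothetical in-memory copy of a tree block read from disk. */
struct tree_block {
    uint64_t bytenr;
    uint64_t generation; /* generation recorded in the block itself */
    int uptodate;        /* 1 if the cached contents may be trusted */
};

/*
 * Returns 0 if the block matches what the parent expects. On a mismatch the
 * block is marked not up to date and -1 is returned, so a caller could retry
 * from another copy. A block that verifies keeps its up-to-date flag.
 */
static int verify_parent_generation(struct tree_block *blk,
                                    const struct block_ptr *ptr)
{
    if (blk->generation == ptr->generation)
        return 0;
    blk->uptodate = 0;
    return -1;
}
```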
*	Btrfs: Drop some verbose printks	Chris Mason	2008-09-25	1	-8/+5
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: write_cache_pages came in 2.6.22	Chris Mason	2008-09-25	1	-1/+1
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: write_extent_pages came in 2.6.23	Chris Mason	2008-09-25	1	-1/+1
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Fix btrfs_get_extent and get_block corner cases, and disable O_DIRECT reads	Chris Mason	2008-09-25	1	-0/+1
\| \| \| \| \| \| \|	The generic O_DIRECT code assumes all the bios have the same bdev, which isn't true for multi-device btrfs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Don't drop extent_map cache during releasepage on the btree inode	Chris Mason	2008-09-25	1	-11/+27
\| \| \| \| \| \| \|	The btree inode should only have a single extent_map in the cache, it doesn't make sense to ever drop it. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Remove bogus max_sector warnings from the extent_io code	Chris Mason	2008-09-25	1	-7/+0
\| \| \| \| \| \| \|	It was testing the bio before doing logical->physical mapping, so the test was always wrong. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Use the extent map cache to find the logical disk block during data retries	Chris Mason	2008-09-25	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The data read retry code needs to find the logical disk block before it can resubmit new bios. But, finding this block isn't allowed to take the fs_mutex because that will deadlock with a number of different callers. This changes the retry code to use the extent map cache instead, but that requires the extent map cache to have the extent we're looking for. This is a problem because btrfs_drop_extent_cache just drops the entire extent instead of the little tiny part it is invalidating. The bulk of the code in this patch changes btrfs_drop_extent_cache to invalidate only a portion of the extent cache, and changes btrfs_get_extent to deal with the results. Signed-off-by: Chris Mason <chris.mason@oracle.com>
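Invalidating only part of a cached mapping amounts to splitting it around the dropped range. The sketch below shows that idea with a hypothetical `struct extent_map` and a hypothetical `drop_range` helper; it is not the btrfs_drop_extent_cache implementation.

```c
#include <stdint.h>

/* Hypothetical mapping of file range [start, start + len) to disk. */
struct extent_map {
    uint64_t start;       /* file offset */
    uint64_t len;         /* length in bytes */
    uint64_t block_start; /* disk byte that 'start' maps to */
};

/*
 * Remove [drop_start, drop_start + drop_len) from 'em'. The surviving
 * pieces (at most two) are written to 'out'; the return value is how many
 * pieces remain.
 */
static int drop_range(const struct extent_map *em, uint64_t drop_start,
                      uint64_t drop_len, struct extent_map out[2])
{
    uint64_t em_end = em->start + em->len;
    uint64_t drop_end = drop_start + drop_len;
    int n = 0;

    /* No overlap: the mapping survives untouched. */
    if (drop_end <= em->start || drop_start >= em_end) {
        out[n++] = *em;
        return n;
    }
    /* Piece in front of the dropped range. */
    if (drop_start > em->start) {
        out[n].start = em->start;
        out[n].len = drop_start - em->start;
        out[n].block_start = em->block_start;
        n++;
    }
    /* Piece behind the dropped range; shift the disk offset to match. */
    if (drop_end < em_end) {
        out[n].start = drop_end;
        out[n].len = em_end - drop_end;
        out[n].block_start = em->block_start + (drop_end - em->start);
        n++;
    }
    return n;
}
```

Keeping the untouched pieces cached is what would let a retry path find the disk block without going back to the tree.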
*	Btrfs: define write_cache_pages for linux kernel <= 2.6.20 instead	Miguel	2008-09-25	1	-2/+1
\| \| \| \| \| \| \|	write_cache_pages doesn't exist in linux 2.6.20, change the #if condition to match that. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Handle checksumming errors while reading data blocks	Chris Mason	2008-09-25	1	-0/+9
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Retry metadata reads in the face of checksum failures	Chris Mason	2008-09-25	1	-21/+29
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Do metadata checksums for reads via a workqueue	Chris Mason	2008-09-25	1	-22/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before, metadata checksumming was done by the callers of read_tree_block, which would set EXTENT_CSUM bits in the extent tree to show that a given range of pages was already checksummed and didn't need to be verified again. But, those bits could go away via try_to_releasepage, and the end result was bogus checksum failures on pages that never left the cache. The new code validates checksums when the page is read. It is a little tricky because metadata blocks can span pages and a single read may end up going via multiple bios. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Add additional debugging for metadata checksum failures	Chris Mason	2008-09-25	1	-3/+51
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Correct usage of IS_ERR() in extent_io.c	Peter	2008-09-25	1	-9/+9
\| \| \| \| \|	Signed-off-by: Peter Teoh <htmldeveloper@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Add leak debugging for extent_buffer and extent_state	Chris Mason	2008-09-25	1	-2/+26
\| \| \| \| \| \| \|	This also fixes one leak around the super block when failing to mount the FS. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Bring back mount -o ssd optimizations	Chris Mason	2008-09-25	1	-0/+2
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Add support for multiple devices per filesystem	Chris Mason	2008-09-25	1	-3/+3
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: checksum file data at bio submission time instead of during writepage	Chris Mason	2008-09-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we checksum file data during writepage, the checksumming is done one page at a time, making it difficult to do bulk metadata modifications to insert checksums for large ranges of the file at once. This patch changes btrfs to checksum on a per-bio basis instead. The bios are checksummed before they are handed off to the block layer, so each bio is contiguous and only has pages from the same inode. Checksumming on a bio basis allows us to insert and modify the file checksum items in large groups. It also allows the checksumming to be done more easily by async worker threads. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Allocator improvements	Chris Mason	2008-09-25	1	-4/+34
\| \| \| \| \| \| \| \| \|	Reduce CPU time searching for free blocks by optimizing find_first_extent_bit. Fix find_free_extent to make better use of the last_alloc hint. Before, it was often finding blocks just before the hint. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Fix "no csum found for inode" issue.	Yan	2008-09-25	1	-2/+3
\| \| \| \| \| \| \|	A few code paths were not properly updated for the extent map changes. This may be the cause of the "no csum found for inode" issue. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Create larger bios for btree blocks	Chris Mason	2008-09-25	1	-3/+9
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Don't cast unsigned long to int in bio submission	Chris Mason	2008-09-25	1	-1/+1
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Fix typo in extent_io.c	Yan	2008-09-25	1	-2/+2
\| \| \| \| \| \|	--- Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Fix delalloc account on state deletion	Chris Mason	2008-09-25	1	-0/+1
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Add a lookup cache to the extent state tree	Chris Mason	2008-09-25	1	-17/+40
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Enable delalloc accounting	Chris Mason	2008-09-25	1	-7/+7
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: mount -o max_inline=size to control the maximum inline extent size	Chris Mason	2008-09-25	1	-1/+0
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Do delalloc accounting via hooks in the extent_state code	Chris Mason	2008-09-25	1	-0/+25
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: During deletes and truncate, remove many items at once from the tree	Chris Mason	2008-09-25	1	-1/+0
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: extent_io and extent_state optimizations	Chris Mason	2008-09-25	1	-95/+263
\| \| \| \| \| \| \| \| \| \| \| \|	The end_bio routines are changed to take a pointer to the extent state struct, and the state tree is walked in order to set/clear appropriate bits as IO completes. This greatly reduces the number of rbtree searches done by the end_bio handlers, and reduces lock contention. The extent_io releasepage function is changed to avoid expensive searches for locked state. Signed-off-by: Chris Mason <chris.mason@oracle.com>
*	Btrfs: Split the extent_map code into two parts	Chris Mason	2008-09-25	1	-0/+3089
	There is now extent_map for mapping offsets in the file to disk and extent_io for state tracking, IO submission and extent_buffers. The new extent_map code shifts from [start,end] pairs to [start,len], and pushes the locking out into the caller. This allows a few performance optimizations and is easier to use. A number of extent_map usage bugs were fixed, mostly with failing to remove extent_map entries when changing the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>