op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Btrfs: merge pending IO for tree log write back	Miao Xie	2013-06-14	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before applying this patch, we flushed the log tree of the fs/file tree firstly, and then flushed the log root tree. It is ineffective, especially on the hard disk. This patch improved this problem by wrapping the above two flushes by the same blk_plug. By test, the performance of the sync write went up ~60%(2.9MB/s -> 4.6MB/s) on my scsi disk whose disk buffer was enabled. Test step: # mkfs.btrfs -f -m single <disk> # mount <disk> <mnt> # dd if=/dev/zero of=<mnt>/file0 bs=32K count=1024 oflag=sync Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: kill replicate code in replay_one_buffer	Liu Bo	2013-06-14	1	-7/+2
\| \| \| \| \| \| \|	EXTREF is treated same as REF, so we can make the code tidy. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: cleanup the similar code of the fs root read	Miao Xie	2013-06-14	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several functions whose code is similar, such as btrfs_find_last_root() btrfs_read_fs_root_no_radix() Besides that, some functions are invoked twice, it is unnecessary, for example, we are sure that all roots which is found in btrfs_find_orphan_roots() have their orphan items, so it is unnecessary to check the orphan item again. So cleanup it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	btrfs: make static code static & remove dead code	Eric Sandeen	2013-05-06	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: remove almost all of the BUG()'s from tree-log.c	Josef Bacik	2013-05-06	1	-53/+98
\| \| \| \| \| \| \| \| \|	There were a whole bunch and I was doing it for other things. I haven't tested these error paths but at the very least this is better than panicing. I've only left 2 BUG_ON()'s since they are logic errors and I want to replace them with a ASSERT framework that we can compile out for production users. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: deal with free space cache errors while replaying log	Josef Bacik	2013-05-06	1	-20/+43
\| \| \| \| \| \| \| \| \| \|	So everybody who got hit by my fsync bug will still continue to hit this BUG_ON() in the free space cache, which is pretty heavy handed. So I took a file system that had this bug and fixed up all the BUG_ON()'s and leaks that popped up when I tried to mount a broken file system like this. With this patch we just fail to mount instead of panicing. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: check return value of commit when recovering log	Josef Bacik	2013-05-06	1	-5/+6
\| \| \| \| \| \| \| \| \|	We need to check the return value of the commit in case something goes wrong, otherwise we could end up going down the line and doing more stuff (like orphan cleanup) before we notice we should have errored out. We need to do this before we free up the log_tree_root since the caller will handle all of that. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: don't try and free ebs twice in log replay	Josef Bacik	2013-05-06	1	-7/+0
\| \| \| \| \| \| \|	This work is done by btrfs_free_path() anyway so there's no need for this duplicate work. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: remove unused argument of btrfs_extend_item()	Tsutomu Itoh	2013-05-06	1	-1/+1
\| \| \| \| \| \| \|	Argument 'trans' is not used in btrfs_extend_item(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: cleanup of function where fixup_low_keys() is called	Tsutomu Itoh	2013-05-06	1	-1/+1
\| \| \| \| \| \| \| \|	If argument 'trans' is unnecessary in the function where fixup_low_keys() is called, 'trans' is deleted. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix bad extent logging	Josef Bacik	2013-05-06	1	-138/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A user sent me a btrfs-image of a file system that was panicing on mount during the log recovery. I had originally thought these problems were from a bug in the free space cache code, but that was just a symptom of the problem. The problem is if your application does something like this [prealloc][prealloc][prealloc] the internal extent maps will merge those all together into one extent map, even though on disk they are 3 separate extents. So if you go to write into one of these ranges the extent map will be right since we use the physical extent when doing the write, but when we log the extents they will use the wrong sizes for the remainder prealloc space. If this doesn't happen to trip up the free space cache (which it won't in a lot of cases) then you will get bogus entries in your extent tree which will screw stuff up later. The data and such will still work, but everything else is broken. This patch fixes this by not allowing extents that are on the modified list to be merged. This has the side effect that we are no longer adding everything to the modified list all the time, which means we now have to call btrfs_drop_extents every time we log an extent into the tree. So this allows me to drop all this speciality code I was using to get around calling btrfs_drop_extents. With this patch the testcase I've created no longer creates a bogus file system after replaying the log. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: log ram bytes properly	Josef Bacik	2013-05-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	When logging changed extents I was logging ram_bytes as the current length, which isn't correct, it's supposed to be the ram bytes of the original extent. This is for compression where even if we split the extent we need to know the ram bytes so when we uncompress the extent we know how big it will be. This was still working out right with compression for some reason but I think we were getting lucky. It was definitely off for prealloc which is why I noticed it, btrfsck was complaining about it. With this patch btrfsck no longer complains after a log replay. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	btrfs: Cleanup some redundant codes in btrfs_log_inode()	Zhi Yong Wu	2013-05-06	1	-2/+0
\| \| \| \| \|	Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: make sure nbytes are right after log replay	Josef Bacik	2013-04-13	1	-6/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While trying to track down a tree log replay bug I noticed that fsck was always complaining about nbytes not being right for our fsynced file. That is because the new fsync stuff doesn't wait for ordered extents to complete, so the inodes nbytes are not necessarily updated properly when we log it. So to fix this we need to set nbytes to whatever it is on the inode that is on disk, so when we replay the extents we can just add the bytes that are being added as we replay the extent. This makes it work for the case that we have the wrong nbytes or the case that we logged everything and nbytes is actually correct. With this I'm no longer getting nbytes errors out of btrfsck. Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	Btrfs: use set_nlink if our i_nlink is 0	Josef Bacik	2013-03-04	1	-1/+4
\| \| \| \| \| \| \| \| \| \|	We need to inc the nlink of deleted entries when running replay so we can do the unlink on the fs_root and get everything cleaned up and then have the orphan cleanup do the right thing. The problem is inc_nlink complains about this, even thought it still does the right thing. So use set_nlink() if our i_nlink is 0 to keep users from seeing the warnings during log replay. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: delete inline extents when we find them during logging	Josef Bacik	2013-03-01	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Apparently when we do inline extents we allow the data to overlap the last chunk of the btrfs_file_extent_item, which means that we can possibly have a btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item. This messes with us when we try to overwrite the extent when logging new extents since we expect for it to be the right size. To fix this just delete the item and try to do the insert again which will give us the proper sized btrfs_file_extent_item. This fixes a panic where map_private_extent_buffer would blow up because we're trying to write past the end of the leaf. Thanks, Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix memory leak of log roots	Liu Bo	2013-02-28	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \|	When we abort a transaction while fsyncing, we'll skip freeing log roots part of committing a transaction, which leads to memory leak. This adds a 'free log roots' in putting super when no more users hold references on log roots, so it's safe and clean. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	btrfs: cleanup for open-coded alignment	Qu Wenruo	2013-02-26	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Though most of the btrfs codes are using ALIGN macro for page alignment, there are still some codes using open-coded alignment like the following: ------ u64 mask = ((u64)root->stripesize - 1); u64 ret = (val + mask) & ~mask; ------ Or even hidden one: ------ num_bytes = (end - start + blocksize) & ~(blocksize - 1); ------ Sometimes these open-coded alignment is not so easy to understand for newbie like me. This commit changes the open-coded alignment to the ALIGN macro for a better readability. Also there is a previous patch from David Sterba with similar changes, but the patch is for 3.2 kernel and seems not merged. http://www.spinics.net/lists/linux-btrfs/msg12747.html Cc: David Sterba <dave@jikos.cz> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	btrfs: remove cache only arguments from defrag path	Eric Sandeen	2013-02-20	1	-2/+2
\| \| \| \| \| \| \| \| \|	The entry point at the defrag ioctl always sets "cache only" to 0; the codepaths haven't run for a long time as far as I can tell. Chris says they're dead code, so remove them. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: kill unused argument of btrfs_pin_extent_for_log_replay	Liu Bo	2013-02-20	1	-2/+1
\| \| \| \| \| \| \|	Argument 'trans' is not used any more. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: wait on ordered extents at the last possible moment	Josef Bacik	2013-02-20	1	-4/+128
\| \| \| \| \| \| \| \| \| \| \|	Since we don't actually copy the extent information from the source tree in the fast case we don't need to wait for ordered io to be completed in order to fsync, we just need to wait for the io to be completed. So when we're logging our file just attach all of the ordered extents to the log, and then when the log syncs just wait for IO_DONE on the ordered extents and then write the super. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: use right range to find checksum for compressed extents	Liu Bo	2013-01-24	1	-0/+5
\| \| \| \| \| \| \| \| \|	For compressed extents, the range of checksum is covered by disk length, and the disk length is different with ram length, so we need to use disk length instead to get us the right checksum. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: do not allow logged extents to be merged or removed	Josef Bacik	2013-01-24	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \|	We drop the extent map tree lock while we're logging extents, so somebody could come in and merge another extent into this one and screw up our logging, or they could even remove us from the list which would keep us from logging the extent or freeing our ref on it, so we need to make sure to not clear LOGGING until after the extent is logged, and then we can merge it to adjacent extents. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: use tokens where we can in the tree log	Josef Bacik	2012-12-16	1	-54/+73
\| \| \| \| \| \| \| \| \| \| \|	If we are syncing over and over the overhead of doing all those maps in fill_inode_item and log_changed_extents really starts to hurt, so use map tokens so we can avoid all the extra mapping. Since the token maps from our offset to the end of the page make sure to set the first thing in the item first so we really only do one map. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	Btrfs: log changed inodes based on the extent map tree	Josef Bacik	2012-12-16	1	-144/+194
\| \| \| \| \| \| \| \| \| \|	We don't really need to copy extents from the source tree since we have all of the information already available to us in the extent_map tree. So instead just write the extents straight to the log tree and don't bother to copy the extent items from the source tree. Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	Btrfs: don't bother copying if we're only logging the inode	Josef Bacik	2012-12-16	1	-6/+34
\| \| \| \| \| \| \| \| \| \|	We don't copy inode items anwyay, we just copy them straight into the log from the in memory inode. So if we know we're only logging the inode, don't bother dropping anything, just try to insert it and either if it succeeds or we get EEXIST we can update the inode item in the log and carry on. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	Btrfs: only log the inode item if we can get away with it	Josef Bacik	2012-12-16	1	-2/+8
\| \| \| \| \| \| \| \| \| \|	Currently we copy all the file information into the log, inode item, the refs, xattrs etc. Except most of this doesn't change from fsync to fsync, just the inode item changes. So set a flag if an xattr changes or a link is added, and otherwise only log the inode item. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	Btrfs: fix missing log when BTRFS_INODE_NEEDS_FULL_SYNC is set	Miao Xie	2012-12-12	1	-1/+4
\| \| \| \| \| \| \| \| \|	If we set BTRFS_INODE_NEEDS_FULL_SYNC, we should log all the extent, but now we forget to take it into account, and set a wrong max key, if so, we will skip the file extent metadata when doing logging. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	Btrfs: fix unprotected extent map operation when logging file extents	Miao Xie	2012-12-12	1	-0/+2
\| \| \| \| \| \| \| \|	We forget to protect the modified_extents list, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	Btrfs: fix wrong file extent length	Miao Xie	2012-12-12	1	-8/+2
\| \| \| \| \| \| \| \| \|	There are two types of the file extent - inline extent and regular extent, When we log file extents, we didn't take inline extent into account, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	Btrfs: do not log extents when we only log new names	Liu Bo	2012-12-12	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	When we log new names, we need to log just enough to recreate the inode during log replay, and there is no need to log extents along with it. This actually fixes a bug revealed by xfstests 241, where it shows that we're logging some extents that have not updated metadata, so we don't get proper EXTENT_DATA items to be copied to log tree. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	btrfs: Fix compilation with user namespace support enabled	Eric W. Biederman	2012-10-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When compiling with user namespace support btrfs fails like: fs/btrfs/tree-log.c: In function ‘fill_inode_item’: fs/btrfs/tree-log.c:2955:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_uid’ fs/btrfs/ctree.h:2026:1: note: expected ‘u32’ but argument is of type ‘kuid_t’ fs/btrfs/tree-log.c:2956:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_gid’ fs/btrfs/ctree.h:2027:1: note: expected ‘u32’ but argument is of type ‘kgid_t’ Fix this by using i_uid_read and i_gid_read in Cc: Chris Mason <chris.mason@fusionio.com> Cc: Josef Bacik <jbacik@fusionio.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
*	btrfs: init ref_index to zero in add_inode_ref	Chris Mason	2012-10-09	1	-1/+1
\| \| \| \|	Signed-off-by: Chris Mason <chris.mason@fusionio.com>
*	Btrfs: make filesystem read-only when submitting barrier fails	Stefan Behrens	2012-10-09	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	So far the return code of barrier_all_devices() is ignored, which means that errors are ignored. The result can be a corrupt filesystem which is not consistent. This commit adds code to evaluate the return code of barrier_all_devices(). The normal btrfs_error() mechanism is used to switch the filesystem into read-only mode when errors are detected. In order to decide whether barrier_all_devices() should return error or success, the number of disks that are allowed to fail the barrier submission is calculated. This calculation accounts for the worst RAID level of metadata, system and data. If single, dup or RAID0 is in use, a single disk error is already considered to be fatal. Otherwise a single disk error is tolerated. The calculation of the number of disks that are tolerated to fail the barrier operation is performed when the filesystem gets mounted, when a balance operation is started and finished, and when devices are added or removed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
*	Btrfs: don't bother committing delayed inode updates when fsyncing	Josef Bacik	2012-10-09	1	-19/+65
\| \| \| \| \| \| \|	We can just copy the in memory inode into the tree log directly, no sense in updating the fs tree so we can copy it into the tree log tree. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: be smarter about dropping things from the tree log	Josef Bacik	2012-10-09	1	-2/+13
\| \| \| \| \| \| \| \| \|	When we truncate existing items in the tree log we've been searching for each individual item and removing them. This is unnecessary churn and searching, just keep track of the slot we are on and how many items we need to delete and delete them all at once. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: don't lookup csums for prealloc extents	Josef Bacik	2012-10-09	1	-2/+1
\| \| \| \| \| \| \|	The tree logging stuff was looking up csums to copy over for prealloc extents which is just work we don't need to be doing. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: cache extent state when writing out dirty metadata pages	Josef Bacik	2012-10-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Everytime we write out dirty pages we search for an offset in the tree, convert the bits in the state, and then when we wait we search for the offset again and clear the bits. So for every dirty range in the io tree we are doing 4 rb searches, which is suboptimal. With this patch we are only doing 2 searches for every cycle (modulo weird things happening). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	btrfs: extended inode refs	Mark Fasheh	2012-10-09	1	-52/+293
\| \| \| \| \| \| \| \| \| \| \|	This patch adds basic support for extended inode refs. This includes support for link and unlink of the refs, which basically gets us support for rename as well. Inode creation does not need changing - extended refs are only added after the ref array is full. Signed-off-by: Mark Fasheh <mfasheh@suse.de>
*	btrfs: improved readablity for add_inode_ref	Jan Schmidt	2012-10-08	1	-81/+97
\| \| \| \| \| \| \| \|	Moved part of the code into a sub function and replaced most of the gotos by ifs, hoping that it will be easier to read now. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
*	Btrfs: handle not finding the extent exactly when logging changed extents	Josef Bacik	2012-10-08	1	-6/+40
\| \| \| \| \| \| \| \| \| \| \| \|	I started hitting warnings when running xfstest 68 in a loop because there were EM's that were not lined up properly with the physical extents. This is ok, if we do something like punch a hole or write to a preallocated space or something like that we can have an EM that doesn't cover the entire physical extent. So fix the tree logging stuff to cope with this case so we don't just commit the transaction. With this patch I no longer see the warnings from the tree logging code. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: do not hold the write_lock on the extent tree while logging	Josef Bacik	2012-10-04	1	-4/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dave Sterba pointed out a sleeping while atomic bug while doing fsync. This is because I'm an idiot and didn't realize that rwlock's were spin locks, so we've been holding this thing while doing allocations and such which is not good. This patch fixes this by dropping the write lock before we do anything heavy and re-acquire it when it is done. We also need to take a ref on the em's in case their corresponding pages are evicted and mark them as being logged so that releasepage does not remove them and doesn't remove them from our local list. Thanks, Reported-by: Dave Sterba <dave@jikos.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: fix unprotected ->log_batch	Miao Xie	2012-10-01	1	-7/+5
\| \| \| \| \| \| \| \| \|	We forget to protect ->log_batch when syncing a file, this patch fix this problem by atomic operation. And ->log_batch is used to check if there are parallel sync operations or not, so it is unnecessary to reset it to 0 after the sync operation of the current log tree complete. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
*	Btrfs: add hole punching	Josef Bacik	2012-10-01	1	-1/+1
\| \| \| \| \| \|	This patch adds hole punching via fallocate. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: remove unused hint byte argument for btrfs_drop_extents	Josef Bacik	2012-10-01	1	-5/+2
\| \| \| \| \| \| \| \| \|	I audited all users of btrfs_drop_extents and found that nobody actually uses the hint_byte argument. I'm sure it was used for something at some point but it's not used now, and the way the pinning works the disk bytenr would never be immediately useful anyway so lets just remove it. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
*	Btrfs: check if an inode has no checksum when logging it	Liu Bo	2012-10-01	1	-11/+12
\| \| \| \| \| \| \| \| \|	This is based on Josef's "Btrfs: turbo charge fsync". If an inode is a BTRFS_INODE_NODATASUM one, we don't need to look for csum items any more. Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
*	Btrfs: fix a bug in checking whether a inode is already in log	Liu Bo	2012-10-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is based on Josef's "Btrfs: turbo charge fsync". The current btrfs checks if an inode is in log by comparing root's last_log_commit to inode's last_sub_trans[2]. But the problem is that this root->last_log_commit is shared among inodes. Say we have N inodes to be logged, after the first inode, root's last_log_commit is updated and the N-1 remained files will be skipped. This fixes the bug by keeping a local copy of root's last_log_commit inside each inode and this local copy will be maintained itself. [1]: we regard each log transaction as a subset of btrfs's transaction, i.e. sub_trans Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
*	Btrfs: improve fsync by filtering extents that we want	Liu Bo	2012-10-01	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is based on Josef's "Btrfs: turbo charge fsync". The above Josef's patch performs very good in random sync write test, because we won't have too much extents to merge. However, it does not performs good on the test: dd if=/dev/zero of=foobar bs=4k count=12500 oflag=sync The reason is when we do sequencial sync write, we need to merge the current extent just with the previous one, so that we can get accumulated extents to log: A(4k) --> AA(8k) --> AAA(12k) --> AAAA(16k) ... So we'll have to flush more and more checksum into log tree, which is the bottleneck according to my tests. But we can avoid this by telling fsync the real extents that are needed to be logged. With this, I did the above dd sync write test (size=50m), w/o (orig) w/ (josef's) w/ (this) SATA 104KB/s 109KB/s 121KB/s ramdisk 1.5MB/s 1.5MB/s 10.7MB/s (613%) Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
*	Btrfs: cleanup extents after we finish logging inode	Liu Bo	2012-10-01	1	-0/+6
\| \| \| \| \| \| \| \| \|	This is based on Josef's "Btrfs: turbo charge fsync". We should cleanup those extents after we've finished logging inode, otherwise we may do redundant work on them. Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
*	Btrfs: only warn if we hit an error when doing the tree logging	Josef Bacik	2012-10-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	I hit this a couple times while working on my fsync patch (all my bugs, not normal operation), but with my new stuff we could have new errors from cases I have not encountered, so instead of BUG()'ing we should be WARN()'ing so that we are notified there is a problem but the user doesn't lose their data. We can easily commit the transaction in the case that the tree logging fails and still be fine, so let's try and be as nice to the user as possible. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>