op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	f2fs: report error of fill_zero	Chao Yu	2015-08-10	1	-18/+38
	fill_zero can fail for many reasons, but previously we did not check its return value, so callers such as punch_hole/f2fs_zero_range could report success even though an error had occurred inside fill_zero. This patch reports the correct return value of fill_zero.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: recover invalid/reserved block address for fsynced file	Chao Yu	2015-08-05	1	-4/+24
	When testing with generic/101 in xfstests, the following error message is output:

	--- tests/generic/101.out
	+++ results//generic/101.out.bad
	@@ -10,10 +10,14 @@
	 File foo content after log replay:
	 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
	 *
	-0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	+0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
	 *
	 0372000
	...
	(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)

	The test flow is as follows:
	1. pwrite foo -S 0xaa 0 64K
	2. pwrite foo -S 0xbb 64K 61K
	3. sync
	4. truncate foo 64K
	5. truncate foo 125K
	6. fsync foo
	7. flakey drop writes
	8. umount

	After this test, we expect the recovered file to have its first 64k of data filled with 0xaa and the next 61k filled with 0x00, because we fsynced it before dropping writes in dm.

	In f2fs, during recovery, we only recover the valid block addresses in a direct node page if it is marked as a fsynced dnode; block addresses that mean invalid/reserved (with value NULL_ADDR/NEW_ADDR) are not recovered. So the recovered file shows the incorrect data 0xbb in the range [64k, 125k].

	In this patch, we also recover invalid/reserved blocks during the recovery flow.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: use extent cache to optimize f2fs_reserve_block	Fan Li	2015-08-05	2	-1/+16
\| \| \| \| \| \| \| \| \| \| \|	In some cases, we only need the block address when we call f2fs_reserve_block, other fields of struct dnode_of_data aren't necessary. We can try extent cache first for such cases in order to speed up the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: invalidate temporary meta page	Chao Yu	2015-08-05	4	-6/+29
	To avoid meeting garbage data in the next free node block at the end of the warm node chain during recovery, we try to zero out that invalid block. If the device does not support discard, our way of zeroing out the block is to grab a temporary zeroed page in the meta inode and issue a write request with this page. But we forgot to release that temporary page, so memory usage increases without any hit ratio benefit; it is better to free it to save memory.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix to release inode page correctly	Chao Yu	2015-08-05	2	-5/+19
	In the following call path, we pass a locked and referenced ipage pointer to get_new_data_page:
	- init_inode_metadata
	 - make_empty_dir
	  - get_new_data_page

	There are two exit paths in get_new_data_page when an error occurs:
	1) grab_cache_page fails, ipage will not be released;
	2) f2fs_reserve_block fails, ipage will be released in the callee.

	So error handling in get_new_data_page is not consistent. For f2fs_reserve_block, it is not easy to change the error handling rule, since it is already complicated. Here we decide to take an easy way to fix this issue: if any error occurs in get_new_data_page, we ensure ipage is released in this function. The same issue exists in f2fs_convert_inline_dir; fix that too.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: unify f2fs_bug_on when check blocks and segment	Liu Xue	2015-08-05	1	-37/+8
\| \| \| \| \| \| \| \|	Replace BUG_ON with f2fs_bug_on to deal with block and segment validity check failed. Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: freeze filesystem when fail to update meta page due to IO error	Chao Yu	2015-08-05	1	-0/+8
	In get_meta_page, we guarantee no failure for the returned page, but sometimes an IO error from the device results in returning a non-updated page. We would then still use this page as an updated one, and exceptions could happen when using it. So in this condition, we had better freeze the fs by making it read-only and stop doing checkpoints.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: change the timing of f2fs_wait_on_page_writeback	Fan Li	2015-08-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	some backing devices need pages to be stable during writeback. It doesn't matter if the page is completely overwritten or already uptodate, it needs to wait before write. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: handle error cases in commit_inmem_pages	Jaegeuk Kim	2015-08-05	3	-5/+14
\| \| \| \| \| \| \| \|	This patch adds to handle error cases in commit_inmem_pages. If an error occurs, it stops to write the pages and return the error right away. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix to build free nids from readaheaded nat pages	Chao Yu	2015-08-05	1	-1/+1
	When there are not enough free nids in the free nid cache, we readahead FREE_NID_PAGES (4) nat pages into the page cache of meta_inode, then read the nat entries in those nat pages to add free nids to the free nid cache. But when traversing the readaheaded nat pages in a loop, the exit condition is wrong: one more nat page is scanned without readahead, resulting in worse read performance. This patch reads the correct number of nat pages to avoid the bad performance.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
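The kind of exit-condition slip described above can be sketched as follows. This is an illustration only, not the f2fs code: the two loop bodies and both function names are hypothetical, and only the constant FREE_NID_PAGES with value 4 comes from the commit message.

```c
#define FREE_NID_PAGES 4 /* value quoted in the commit message */

/* Hypothetical "before": the equality test sees the counter's old value,
 * so the loop body runs FREE_NID_PAGES + 1 times. */
static int pages_scanned_old(void)
{
	int i = 0, scanned = 0;

	while (1) {
		scanned++;                 /* stand-in for scanning one nat page */
		if (i++ == FREE_NID_PAGES) /* compares before incrementing */
			break;
	}
	return scanned;
}

/* Hypothetical "after": increment first, then stop once the counter
 * reaches the number of pages that were actually read ahead. */
static int pages_scanned_new(void)
{
	int i = 0, scanned = 0;

	while (1) {
		scanned++;
		if (++i >= FREE_NID_PAGES)
			break;
	}
	return scanned;
}
```

With these definitions the first variant scans five pages and the second scans four, which matches the "one more nat page" symptom in the message.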
*	f2fs: fix inline data/dentry stat number leak	Chao Yu	2015-08-05	1	-2/+0
\| \| \| \| \| \| \| \| \|	If we clear inline data/dentry flag in handle_failed_inode, we will fail to decline the stat count of inline data/dentry in f2fs_evict_inode due to no flag in inode. So remove the wrong clearing. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: convert inline data before set atomic/volatile flag	Chao Yu	2015-08-05	1	-4/+12
\| \| \| \| \| \| \| \| \|	In f2fs_ioc_start_{atomic,volatile}_write, if we failed in converting inline data, we will report error to user, but still remain atomic/volatile flag in inode, it will impact further writes for this file. Fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix to wait all atomic written pages writeback	Chao Yu	2015-08-05	1	-1/+1
	This patch fixes the incorrect range (0, LONG_MAX) used in ranged fsync. If we use LONG_MAX as the parameter indicating the end of the file range we want to synchronize, then on 32-bit architectures data beyond the 4GB offset may not be persisted to storage after ->fsync returns. Here, we change LONG_MAX to LLONG_MAX to fix this issue.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
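A small sketch of why the width of the end value matters. It assumes a 32-bit `long` (modelled here with INT32_MAX so the example behaves the same on any host) and a 64-bit file offset; the helper in_sync_range and both macro names are hypothetical and do not appear in f2fs.

```c
#include <stdbool.h>
#include <stdint.h>

/* LONG_MAX as it would be on a machine with a 32-bit long (assumption). */
#define LONG_MAX_32BIT ((int64_t)INT32_MAX)
/* LLONG_MAX is at least 64 bits wide on every architecture. */
#define LLONG_MAX_64BIT ((int64_t)INT64_MAX)

/* Hypothetical helper: true if byte `offset` lies inside [start, end]. */
static bool in_sync_range(int64_t start, int64_t end, int64_t offset)
{
	return offset >= start && offset <= end;
}
```

Under these assumptions, an offset of 4 GiB falls outside (0, LONG_MAX_32BIT) but inside (0, LLONG_MAX_64BIT), so only the latter range covers the whole file.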
*	f2fs: skip writing in ->writepages when no dirty pages exist	Chao Yu	2015-08-05	1	-0/+4
\| \| \| \| \| \| \| \| \|	When flushing comes from background, if there is no dirty page in the mapping of inode, we'd better to skip seeking dirty page from mapping for writebacking. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: optimize f2fs_write_cache_pages	Tiezhu Yang	2015-08-05	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \|	The if statement "goto continue_unlock" is exactly the same when each if condition is true that is depended on the value of both "step" and "is_cold_data(page)" are 0 or 1. That means when the value of "step" equals to "is_cold_data(page)", the if condition is true and the if statement "goto continue_unlock" appears only once, so it can be optimized to reduce the duplicated code. Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
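The simplification described above can be checked with a small sketch. It assumes, as the message states, that both "step" and the cold-data flag only take the values 0 and 1; the two function names are hypothetical and the real code uses "goto continue_unlock" rather than a return value.

```c
#include <stdbool.h>

/* Two separate checks with the same action, as before the cleanup. */
static bool skip_page_two_checks(int step, int is_cold)
{
	if (step == 0 && is_cold == 0)
		return true;
	if (step == 1 && is_cold == 1)
		return true;
	return false;
}

/* Single check: skip the page exactly when step equals the cold flag. */
static bool skip_page_single_check(int step, int is_cold)
{
	return step == is_cold;
}
```

For all four combinations of 0 and 1 the two functions agree, which is why the duplicated branch can be folded into one comparison.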
*	f2fs: fix double lock in handle_failed_inode	Chao Yu	2015-08-05	4	-7/+7
	In handle_failed_inode, there is a potential deadlock which can happen in the call path below:

	- f2fs_create
	 - f2fs_lock_op   down_read(cp_rwsem)
	 - f2fs_add_link
	  - __f2fs_add_link
	   - init_inode_metadata
	    - f2fs_init_security   failed
	    - truncate_blocks   failed
	 - handle_failed_inode
	  - f2fs_truncate
	   - truncate_blocks(..,true)
						- write_checkpoint
						 - block_operations
						  - f2fs_lock_all   down_write(cp_rwsem)
	    - f2fs_lock_op   down_read(cp_rwsem)

	So in this path, we pass a parameter to f2fs_truncate to make sure cp_rwsem in truncate_blocks will not be locked again.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: reduce region of cp_rwsem covered in f2fs_do_collapse	Chao Yu	2015-08-05	1	-5/+9
	In f2fs_do_collapse, the region covered by cp_rwsem is large, since the lock is held until all blocks are left shifted. If we collapse a small area at the beginning of a large file, a checkpoint that wants to grab the writer's lock of cp_rwsem will be delayed for a long time. To avoid this, lock/unlock cp_rwsem around each shift operation.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: add new interfaces for extent tree	Fan Li	2015-08-05	1	-7/+132
	Add a lookup and an insertion interface for the extent tree. When the new lookup finds no match, it returns the insert position and the prev/next extents closest to the offset being looked up. The new insertion uses these parameters to improve performance.

	There are three possible insertions after the lookup in f2fs_update_extent_tree. Two of them insert parts of the removed extent back into the tree; since no merge happens during this process, the new insertion skips the merge check in this scenario. The other insertion inserts a new extent into the tree; the new insertion uses the prev/next extents and the insert position to insert this extent directly, saving the time of searching down the tree.

	As long as the tree remains unchanged between lookup and insertion, this works fine. The new lookup will also be useful when adding multi-block extent support to the insertion interface.
	Signed-off-by: Fan li <fanofcode.li@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: callers take care of the page from bio error	Jaegeuk Kim	2015-08-05	3	-26/+26
\| \| \| \| \| \| \|	This patch changes for a caller to handle the page after its bio gets an error. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: use atomic_t to record hit ratio info of extent cache	Chao Yu	2015-08-05	3	-8/+12
\| \| \| \| \| \| \| \| \|	Variables for recording extent cache ratio info were updated without protection, this patch tries to alter them to atomic_t type for more accurate stat. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: stat inline xattr inode number	Chao Yu	2015-08-05	4	-1/+22
\| \| \| \| \| \| \| \|	This patch adds to stat the number of inline xattr inode for showing in debugfs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: use a page temporarily for encrypted gced page	Jaegeuk Kim	2015-08-05	1	-1/+4
\| \| \| \| \| \| \|	That encrypted page is used temporarily, so we don't need to mark it accessed. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: expose f2fs_write_cache_pages	Chao Yu	2015-08-04	1	-1/+135
	If there are gced dirty pages and normal dirty pages in the mapping of one inode, we might write them back alternately with discontinuous block addresses, resulting in low performance.

	This patch introduces f2fs_write_cache_pages with code copied from write_cache_pages in mm/page-writeback.c. In this function, we refactor the flow into two steps:
	1) writeback all cold type pages.
	2) writeback all non-cold type pages.

	By using this method, f2fs writes back dirty pages with the same temperature in batches, so the written blocks have more continuous addresses and can be merged as much as possible in the f2fs bio cache. It also reduces the chance of submitting small IO from the block layer.

	Test environment: 8g nokia sd card (a very old sd card, but it shows a better effect when testing with this patch; with a 32g kingston sd card, I didn't see much improvement).

	Test step:
	1. touch testfile;
	2. truncate -s 512K testfile;
	3. write all pages with odd index;
	4. trigger gc by ioctl;
	5. write all pages with even index;
	6. time fsync testfile.

	before:
	real 0m0.402s
	user 0m0.000s
	sys 0m0.000s

	after:
	real 0m0.143s
	user 0m0.004s
	sys 0m0.004s
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
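The two-step flow can be sketched in plain C as follows. This is a simplified model, not the kernel implementation: struct demo_page and two_step_order are hypothetical names, and a real writeback loop works on the page cache rather than on an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dirty page and its temperature. */
struct demo_page {
	int index;
	bool cold;
};

/* Writes the page indices to `out` in the order a two-step pass would
 * submit them: step 0 takes cold pages, step 1 takes the rest.
 * `out` must have room for n entries. Returns the number written. */
static size_t two_step_order(const struct demo_page *pages, size_t n, int *out)
{
	size_t written = 0;

	for (int step = 0; step < 2; step++) {
		for (size_t i = 0; i < n; i++) {
			/* skip pages that belong to the other step */
			if (step == (int)pages[i].cold)
				continue;
			out[written++] = pages[i].index;
		}
	}
	return written;
}
```

Given interleaved cold and non-cold pages, the output groups all cold indices first and all non-cold indices after them, which is the ordering that lets neighbouring blocks merge in the bio cache.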
*	f2fs: correct return value of ->setxattr	Chao Yu	2015-08-04	1	-1/+4
	This patch returns the correct error number from ->setxattr, as reported by xfstests tests/generic/026:

	generic/026 - output mismatch
	--- tests/generic/026.out
	+++ results/generic/026.out.bad
	@@ -4,6 +4,6 @@
	 1 below acl max
	 acl max
	 1 above acl max
	-chacl: cannot set access acl on "largeaclfile": Argument list too long
	+chacl: cannot set access acl on "largeaclfile": Numerical result out of range
	 use 16 aces
	 use 17 aces
	...
	Ran: generic/026
	Failures: generic/026
	Failed 1 of 1 tests
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: cleanup write_orphan_inodes	Chao Yu	2015-08-04	1	-9/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, since 'commit 4531929e3922 ("f2fs: move grabing orphan pages out of protection region")' was committed, in write_orphan_inodes(), we will grab all meta page in a batch before we use them under spinlock, so that we can avoid large time delay of grabbing meta pages under spinlock. Now, 'commit d6c67a4fee86 ("f2fs: revmove spin_lock for write_orphan_inodes")' remove the spinlock in write_orphan_inodes, so there is no issue we describe above, we'd better recover to move the grab operation to original place for readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: warm up cold page after mmaped write	Chao Yu	2015-08-04	1	-0/+2
	With the cost-benefit method, background gc considers an old section with fewer valid blocks as a candidate victim; the old blocks in that section are treated as cold data and are later moved into a cold segment. But if the page being gced is written by the user through a buffered or mmaped write, we should reset the page to non-cold, because it is more likely to be updated again. So add the clearing code for the missed 'mmap' case.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: add new ioctl F2FS_IOC_GARBAGE_COLLECT	Chao Yu	2015-08-04	3	-0/+39
	When background gc is off, the only way to trigger gc is a forced gc executed by operations that need to grab space on disk. The condition for that is limited: forced gc runs only when there are almost no free sections left for LFS allocation. This is not reasonable for users who want to control gc triggering themselves. This patch introduces the F2FS_IOC_GARBAGE_COLLECT interface for triggering garbage collection via ioctl, giving users one more option to trigger gc.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: maintain extent cache in separated file	Chao Yu	2015-08-04	4	-586/+610
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch moves extent cache related code from data.c into extent_cache.c since extent cache is independent feature, and its codes are not relate to others in data.c, it's better for us to maintain them in separated place. There is no functionality change, but several small coding style fixes including: * rename __drop_largest_extent to f2fs_drop_largest_extent for exporting; * rename misspelled word 'untill' to 'until'; * remove unneeded 'return' in the end of f2fs_destroy_extent_tree(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LEN	Fan Li	2015-08-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Since only parts of extents longer than F2FS_MIN_EXTENT_LEN will be kept in extent cache after split, extents already shorter than F2FS_MIN_EXTENT_LEN don't need to try split at all. Signed-off-by: Fan Li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix to update page flag	Chao Yu	2015-08-04	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes to update page flag (e.g. Uptodate/cold flag) in ->write_begin. Otherwise, page will be non-uptodate when we try to write entire page, and cold data flag in page will not be clean when gced page is being rewritten. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: shrink unreferenced extent_caches first	Jaegeuk Kim	2015-08-04	1	-10/+41
\| \| \| \| \| \| \| \|	If an extent_tree entry has a zero reference count, we can drop it from the cache in higher priority rather than currently referencing entries. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: enhance multithread performance	Chao Yu	2015-08-04	1	-2/+1
	In ->writepages, we use the writepages mutex lock to serialize all block address allocation and page submitting pairs from different inodes. This makes the delayed dirty pages of one inode be written as continuously as possible.

	But there is one problem: we did not submit the current cached bio inside the region protected by the writepages mutex lock, so there is a small chance that we submit another thread's bio as below, resulting in more bio splits.

	thread 1			thread 2
	->writepages
	 lock(writepages)
	 ->write_cache_pages
	 unlock(writepages)
					lock(writepages)
					->write_cache_pages
	->f2fs_submit_merged_bio
					->writepage
					unlock(writepages)

	fs_mark-6535 [002] .... 2242.270230: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5766152, size = 524288
	fs_mark-6536 [000] .... 2242.270361: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767176, size = 4096
	fs_mark-6536 [000] .... 2242.270370: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, NODE, sector = 8138112, size = 4096
	fs_mark-6535 [002] .... 2242.270776: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767184, size = 516096

	This may increase the time spent in the block layer and may cause larger IO latency.

	This patch moves the submitting operation into the region of the writepages mutex lock to avoid bio splits when concurrent writeback is intensive.

	my test environment: virtual machine, intel cpu i5 2500, 8GB size memory, 4GB size ramdisk

	time fs_mark -t 16 -L 1 -s 524288 -S 1 -d /mnt/f2fs/

	before:
	real 0m4.244s
	user 0m0.088s
	sys 0m12.336s

	after:
	real 0m3.822s
	user 0m0.072s
	sys 0m10.760s
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: restrict multimedia filename	Chao Yu	2015-08-04	1	-1/+8
	When testing with fs_mark, some blocks were written out as cold data mixed with warm data, resulting in more bio splits. This is because fs_mark creates files with random filenames such as:

	559551ee~~~~~~~~15Z29OCC05JCKQP60JQ42MKV
	559551ee~~~~~~~~NZAZ6X8OA8LHIIP6XD0L58RM
	559551ef~~~~~~~~B15YDSWAK789HPSDZKYTW6WM
	559551f1~~~~~~~~2DAE5DPS79785BUNTFWBEMP3
	559551f1~~~~~~~~1MYDY0BKSQCJPI32Q8C514RM
	559551f1~~~~~~~~YQOTMAOMN5CVRFOUNI026MP4
	559551f3~~~~~~~~1WF42LPRTQJNPPGR3EINKMPE
	559551f3~~~~~~~~8Y2NRK7CEPPAA02LY936PJPG

	They are regarded as cold files since their filenames end with a multimedia file extension, but this is wrong because we only match the extension, not the whole filename.

	In this patch, we fix the format of a multimedia filename to "filename + '.' + extension" and mark a file cold only if its filename matches this format.

	This reduces the probability of wrongly marking a file cold, and it also helps fs_mark's performance on f2fs a little.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
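The difference between a suffix-only match and the "filename + '.' + extension" format can be sketched like this. All function names are hypothetical, and two details are assumptions not stated in the message: the comparison is case-insensitive, and the filename part must be non-empty.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive comparison of n bytes; true when they are equal. */
static bool equal_nocase(const char *a, const char *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return false;
	return true;
}

/* Suffix-only match: true whenever the name merely ends with ext. */
static bool ends_with_ext(const char *name, const char *ext)
{
	size_t nlen = strlen(name), elen = strlen(ext);

	return nlen >= elen && equal_nocase(name + nlen - elen, ext, elen);
}

/* Strict match: at least one character, then '.', then ext. */
static bool has_dot_ext(const char *name, const char *ext)
{
	size_t nlen = strlen(name), elen = strlen(ext);

	if (nlen < elen + 2)
		return false;
	if (name[nlen - elen - 1] != '.')
		return false;
	return equal_nocase(name + nlen - elen, ext, elen);
}
```

With "mp4" as the extension, the fs_mark name 559551f1~~~~~~~~YQOTMAOMN5CVRFOUNI026MP4 passes the suffix-only check but fails the strict one, while a name such as clip.mp4 passes both.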
*	f2fs: make the function check_dnode have a return type of bool and change its name to is_alive	Nicholas Krause	2015-08-04	1	-6/+6
	This makes the function check_dnode have a return type of bool, because this function only ever returns one or zero, and renames it to is_alive to better explain its intended work of checking whether a dnode is still in use by the filesystem.
	Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
	[Jaegeuk Kim: change the return value check for the renamed function]
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: check the largest extent at look-up time	Jaegeuk Kim	2015-08-04	1	-2/+14
\| \| \| \| \| \| \| \| \| \| \|	Because of the extent shrinker or other -ENOMEM scenarios, it cannot guarantee that the largest extent would be cached in the tree all the time. Instead of relying on extent_tree, we can simply check the cached one in extent tree accordingly. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: use extent_cache by default	Jaegeuk Kim	2015-08-04	6	-265/+142
	We don't need to handle the duplicate extent information.

	The integrated rule is:
	- update on-disk extent with largest one tracked by in-memory extent_cache
	- destroy extent_tree for the truncation case
	- drop per-inode extent_cache by shrinker
	Reviewed-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: add noextent_cache mount option	Jaegeuk Kim	2015-08-04	1	-0/+7
\| \| \| \| \| \| \|	This patch adds noextent_cache mount option. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: shrink extent_cache entries	Jaegeuk Kim	2015-08-04	4	-11/+27
\| \| \| \| \| \| \|	This patch registers shrinking extent_caches. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: shrink nat_cache entries	Jaegeuk Kim	2015-08-04	3	-7/+18
\| \| \| \| \| \| \|	This patch registers shrinking nat_cache entries. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: introduce a shrinker for mounted fs	Jaegeuk Kim	2015-08-04	4	-1/+148
	This patch introduces a shrinker that aims to reduce the memory footprint consumed by a number of in-memory f2fs data structures.

	In addition, it newly adds:
	- sbi->umount_mutex to avoid data races on shrinker and put_super
	- sbi->shrinker_run_no to not revisit objects

	Note that the basic implementation was copied from fs/ubifs/shrinker.c
	Reviewed-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: set cached_en after checking finally	Jaegeuk Kim	2015-08-04	1	-5/+4
\| \| \| \| \| \| \| \|	This patch relocates cached_en not only to be covered by spin_lock, but also to set once after checking out completely. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: update on-disk extents even under extent_cache	Jaegeuk Kim	2015-08-04	1	-2/+2
	Previously, f2fs_update_extent_cache() updates the in-memory extent_cache all the time, and then finally preserves its up-to-date extent into the on-disk one during f2fs_evict_inode.

	But, in the following scenario:
	1. mount
	2. open & write an extent X
	3. f2fs_evict_inode; on-disk extent is X
	4. open & update the extent X with Y
	5. sync; trigger checkpoint
	6. power-cut

	After power-on, f2fs should serve extent Y, but we have an on-disk extent X. This causes a failure on xfstests/311.
	Reviewed-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix wrong block address calculation for a split extent	Jaegeuk Kim	2015-08-04	1	-1/+1
\| \| \| \| \| \| \| \|	This patch fixes wrong calculation on block address field when an extent is split. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: convert inline_data for various fallocate	Jaegeuk Kim	2015-08-04	1	-0/+14
\| \| \| \| \| \| \| \|	For newly added fallocate types, it should convert inline_data before handling block swapping. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: avoid to use failed inode immediately	Jaegeuk Kim	2015-08-04	3	-9/+15
\| \| \| \| \| \| \| \| \|	Before iput is called, the inode number used by a bad inode can be reassigned to other new inode, resulting in any abnormal behaviors on the new inode. This should not happen for the new inode. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: avoid freed stat information	Jaegeuk Kim	2015-08-04	1	-1/+3
\| \| \| \| \| \| \| \|	The write_checkpoint can update stat information, so we should destroy the stat structure after it. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix to record dirty page count for symlink	Chao Yu	2015-08-04	2	-2/+4
	Dirty pages can exist in the mapping of a newly created symlink, but previously we did not maintain the dirty page count for symlinks as we did for regular files and directories, so the count we looked up would be wrong. This patch adds the missing dirty page counting for symlinks to fix this issue.
	Signed-off-by: Chao Yu <chao2.yu@samsung.com>
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs crypto: delete an unnecessary check before the function call "key_put"	Markus Elfring	2015-08-04	1	-2/+1
\| \| \| \| \| \| \| \| \| \|	The key_put() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client	Linus Torvalds	2015-08-03	3	-19/+6
\|\	Pull Ceph fixes from Sage Weil:
\| \|	"There are two critical regression fixes for CephFS from Zheng, and an RBD completion fix for layered images from Ilya"
\| \|	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
\| \|	  rbd: fix copyup completion race
\| \|	  ceph: always re-send cap flushes when MDS recovers
\| \|	  ceph: fix ceph_encode_locks_to_buffer()
\| *	ceph: always re-send cap flushes when MDS recovers	Yan, Zheng	2015-07-31	2	-18/+5
\| \|	Commit e548e9b93d3e565e42b938a99804114565be1f81 makes the kclient re-send a cap flush only once during MDS failover. If the kclient sends a cap flush after the MDS enters the reconnect stage but before the MDS recovers, the kclient will skip re-sending the same cap flush when the MDS recovers.
\| \|	This causes a problem for newly created inodes. The MDS handles cap flushes before replaying unsafe requests, so it is possible that the MDS finds the corresponding inode missing when handling a cap flush.
\| \|	The fix is to revert to the old behaviour: always re-send when the MDS recovers.
\| \|	Signed-off-by: Yan, Zheng <zyan@redhat.com>
\| \|	Signed-off-by: Ilya Dryomov <idryomov@gmail.com>