op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	f2fs: avoid to ra unneeded blocks in recover flow	Chao Yu	2014-12-08	3	-18/+23
	To improve recovery speed, f2fs tries to read ahead many contiguous blocks in the warm node segment. Abnormal power-offs are infrequent, though, so when mounting an f2fs image after a normal power-off, reading ahead that many blocks and then invalidating them hurts mount performance. It is better to read ahead only the first next-block in the normal case. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: introduce is_valid_blkaddr to cleanup codes in ra_meta_pages	Chao Yu	2014-12-08	1	-27/+26
	This patch is a cleanup. It introduces is_valid_blkaddr() to hold the blkaddr verification code, with its upper and lower boundary values, that was previously in ra_meta_pages. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix to enable readahead for SSA/CP blocks	Chao Yu	2014-12-08	1	-2/+15
	1. We use zero as the upper boundary value when reading ahead SSA/CP blocks, so verification against the maximum fails and readahead is skipped, which causes low performance.
	2. The lower boundary value is not accurate for SSA/CP/POR region verification, so these values need to be redefined.
	This patch fixes the above issues. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
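	The boundary problem in this commit can be illustrated with a small user-space sketch. The `struct region_bounds` type and the helpers `blkaddr_in_region()` and `count_readahead()` are hypothetical names, not taken from the f2fs source. The sketch only shows that an upper bound of zero makes every address fail verification, so no block is read ahead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open [lower, upper) bounds for one metadata region. */
struct region_bounds {
	uint32_t lower;
	uint32_t upper;
};

/* Return true when blkaddr lies inside the region. */
static bool blkaddr_in_region(const struct region_bounds *r, uint32_t blkaddr)
{
	return blkaddr >= r->lower && blkaddr < r->upper;
}

/*
 * Count how many of nr blocks starting at start would be read ahead.
 * Readahead stops at the first address that fails verification.
 */
static unsigned int count_readahead(const struct region_bounds *r,
				    uint32_t start, unsigned int nr)
{
	unsigned int done = 0;

	while (done < nr && blkaddr_in_region(r, start + done))
		done++;
	return done;
}
```

	With `upper` left at zero the loop exits immediately for any start address, which matches the skipped readahead described in point 1.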
*	f2fs: use atomic for counting inode with inline_{dir,inode} flag	Chao Yu	2014-12-08	2	-8/+11
	Because the inline_{dir,inode} stat is increased and decreased concurrently by multiple threads, its value is not accurate. Use an atomic type so the count is accurate. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
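	As a rough user-space analogue of the change, the sketch below keeps two counters with C11 `<stdatomic.h>`. Kernel code would use `atomic_t` with helpers such as `atomic_inc()` and `atomic_dec()` instead. The `struct inline_stats` type and the `stat_*` helpers are hypothetical names chosen for illustration.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a per-filesystem statistics block. */
struct inline_stats {
	atomic_int inline_inode;
	atomic_int inline_dir;
};

/* Atomically add one; safe against concurrent updates from other threads. */
static void stat_inc(atomic_int *counter)
{
	atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Atomically subtract one. */
static void stat_dec(atomic_int *counter)
{
	atomic_fetch_sub_explicit(counter, 1, memory_order_relaxed);
}

/* Read the current value of the counter. */
static int stat_read(atomic_int *counter)
{
	return atomic_load_explicit(counter, memory_order_relaxed);
}
```

	Relaxed ordering is assumed to be enough here because the counters are statistics only and do not guard other data.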
*	f2fs: cleanup path to need cp at fsync	Changman Lee	2014-12-08	1	-36/+43
	Added some comments for code readability and cleaned up the if-statement. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: check if inode state is dirty at fsync	Changman Lee	2014-12-08	1	-6/+19
	If inode state is dirty, go straight to write. Suggested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: count the number of inmemory pages	Jaegeuk Kim	2014-12-08	3	-1/+8
	This patch counts the number of inmemory pages in the page cache. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: release inmemory pages when the file was closed	Jaegeuk Kim	2014-12-08	1	-0/+9
	If the file is closed, drop its inmemory pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: set page private for inmemory pages for truncation	Jaegeuk Kim	2014-12-08	1	-0/+2
	The inmemory pages should be handled by invalidate_page since they need to be released in the truncation path. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: count inline_xx in do_read_inode	Jaegeuk Kim	2014-12-08	1	-2/+4
	In do_read_inode, if __recover_inline_status fails, the inode has the inline flag without its count having been increased. Later, f2fs_evict_inode decreases the count, which results in -1. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: do retry operations with cond_resched	Jaegeuk Kim	2014-12-08	4	-38/+20
	This patch revisits the retry paths in f2fs. The basic idea is to use cond_resched instead of retrying from the very early stage. Suggested-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: call radix_tree_preload before radix_tree_insert	Jaegeuk Kim	2014-12-05	3	-6/+19
	This patch tries to fix:
	BUG: using smp_processor_id() in preemptible [00000000] code: f2fs_gc-254:0/384
	(radix_tree_node_alloc+0x14/0x74) from [<c033d8a0>] (radix_tree_insert+0x110/0x200)
	(radix_tree_insert+0x110/0x200) from [<c02e8264>] (gc_data_segment+0x340/0x52c)
	(gc_data_segment+0x340/0x52c) from [<c02e8658>] (f2fs_gc+0x208/0x400)
	(f2fs_gc+0x208/0x400) from [<c02e8a98>] (gc_thread_func+0x248/0x28c)
	(gc_thread_func+0x248/0x28c) from [<c0139944>] (kthread+0xa0/0xac)
	(kthread+0xa0/0xac) from [<c0105ef8>] (ret_from_fork+0x14/0x3c)
	The reason is that f2fs calls radix_tree_insert with preemption enabled. So, before calling it, we need to call radix_tree_preload. Otherwise, we should use _GFP_WAIT for the radix tree, and use a mutex or semaphore to cover the radix tree operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: use rw_semaphore for nat entry lock	Jaegeuk Kim	2014-12-03	2	-27/+27
	Previously, we used an rwlock for the nat_entry lock. But set_node_addr now performs many complex operations (e.g., allocating kernel memory, handling radix trees, and so on). So this patch changes the spinlock to an rw_semaphore to yield CPUs to other threads. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix missing kmem_cache_free	Jaegeuk Kim	2014-12-03	1	-1/+1
	This patch fixes missing kmem_cache_free when handling errors. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: more fast lookup for gc_inode list	Changman Lee	2014-12-02	2	-19/+34
	If many inodes have data blocks in the victim segment, it takes a long time to find an inode in the gc_inode list. Let's use a radix_tree to reduce lookup time. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: cleanup redundant macro	Changman Lee	2014-12-01	1	-3/+3
	We've already made fi and sbi for inode. Let's avoid duplicated work. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix to return correct error number in f2fs_write_begin	Chao Yu	2014-12-01	1	-1/+3
	Fix the wrong error number in the error path of f2fs_write_begin. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: cleanup if-statement of phase in gc_data_segment	Changman Lee	2014-11-27	1	-16/+16
	Little cleanup to distinguish each phase easily. Signed-off-by: Changman Lee <cm224.lee@samsung.com> [Jaegeuk Kim: modify indentation for code readability] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix to recover converted inline_data	Jaegeuk Kim	2014-11-25	1	-0/+3
	If an inode has converted inline_data which was written to the disk, we should set its inode flag for further fsync so that this inline_data can be recovered after a sudden power-off. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: make clean the page before writing	Jaegeuk Kim	2014-11-25	1	-1/+6
	If a page is set to be written to the disk, we can clean the page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: no more dirty_nat_entires when flushing	Changman Lee	2014-11-25	1	-4/+4
	After flushing dirty nat entries, there must be no dirty nat entries left. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: check dirty_nat_cnt before flushing nat entries in journal	Changman Lee	2014-11-25	1	-4/+3
	It's meaningless to check dirty_nat_cnt after re-dirtying nat entries in journal. And although there is room for dirty nat entries, it's also meaningless to check __has_cursum_space if dirty_nat_cnt is zero. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix deadlock during inline_data conversion	Jaegeuk Kim	2014-11-25	1	-14/+14
	A deadlock can occur:
	Thread 1]                             Thread 2]
	 - f2fs_write_data_pages              - f2fs_write_begin
	   - lock_page(page #0)
	                                        - grab_cache_page(page #X)
	                                        - get_node_page(inode_page)
	                                        - grab_cache_page(page #0)
	                                          : to convert inline_data
	   - f2fs_write_data_page
	     - f2fs_write_inline_data
	       - get_node_page(inode_page)
	In this case, trying to lock inode_page and page #0 causes a deadlock. In order to avoid this, this patch adds a rule for this locking policy, which is that page #0 should be locked first, followed by the inode_page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
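	The ordering rule can be modelled with a small checker. This is only a sketch: it records which lock classes a thread holds and reports an ordering violation, but it does not perform any real locking. The `enum lock_class` values and the `tracker_*` helpers are hypothetical and do not exist in f2fs.

```c
#include <stdbool.h>

/* Hypothetical lock classes, listed in the order they must be taken. */
enum lock_class {
	LOCK_DATA_PAGE0 = 0,	/* page #0 of the file */
	LOCK_INODE_PAGE = 1,	/* node page holding the inode */
	LOCK_CLASS_MAX
};

/* Per-thread record of which lock classes are currently held. */
struct lock_tracker {
	bool held[LOCK_CLASS_MAX];
};

/*
 * Record an acquisition. Returns false if taking this class now would
 * violate the ordering rule, i.e. a later class is already held.
 */
static bool tracker_acquire(struct lock_tracker *t, enum lock_class c)
{
	for (int later = c + 1; later < LOCK_CLASS_MAX; later++) {
		if (t->held[later])
			return false;
	}
	t->held[c] = true;
	return true;
}

/* Record that the given lock class is no longer held. */
static void tracker_release(struct lock_tracker *t, enum lock_class c)
{
	t->held[c] = false;
}
```

	A thread that takes page #0 and then the inode page passes the check. A thread that already holds the inode page and then asks for page #0 is rejected, which is the pattern in the Thread 2 column.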
*	f2fs: fix typos for the word "destroy" in jump labels	Markus Elfring	2014-11-25	1	-4/+4
	Two jump labels were adjusted in the implementation of the create_node_manager_caches() function because these identifiers contained typos. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix livelock calling f2fs_iget during f2fs_evict_inode	Jaegeuk Kim	2014-11-23	1	-1/+10
	In f2fs_evict_inode,
	 commit_inmemory_pages
	  f2fs_gc
	   f2fs_iget
	    iget_locked
	     -> wait for inode free
	Here, if the inode is the same as the one being evicted, f2fs waits forever. Actually, we should not call f2fs_balance_fs during f2fs_evict_inode to avoid this. But commit_inmem_pages calls f2fs_balance_fs by default, even if f2fs_evict_inode wants to free inmemory pages only. Hence, this patch triggers f2fs_balance_fs only when there is something to write. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: introduce f2fs_dentry_kunmap to clean up	Jaegeuk Kim	2014-11-23	4	-24/+18
	This patch introduces f2fs_dentry_kunmap to clean up dirty codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix wrong data structure when create slab	Changman Lee	2014-11-23	1	-1/+1
	It used nat_entry_set when creating the slab for sit_entry_set. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: call flush_dcache_page when the page was updated	Jaegeuk Kim	2014-11-23	1	-0/+1
	Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: write SSA pages under memory pressure	Jaegeuk Kim	2014-11-19	1	-1/+4
	Under memory pressure, we don't need to skip SSA page writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: submit bio for node blocks in the reclaim path	Jaegeuk Kim	2014-11-19	1	-0/+4
	If a node page is requested to be written in the reclaim path, we should submit the bio so that reclaiming it is not left pending. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: introduce struct inode_management to wrap inner fields	Chao Yu	2014-11-19	4	-49/+66
	Now in f2fs, we have three inode caches: ORPHAN_INO, APPEND_INO and UPDATE_INO, and we manage the fields related to each inode cache type separately in struct f2fs_sb_info. This makes the code a bit messy, so this patch introduces a new struct inode_management to wrap the inner fields as follows, which makes the code neater.

	/* for inner inode cache management */
	struct inode_management {
		struct radix_tree_root ino_root;	/* ino entry array */
		spinlock_t ino_lock;			/* for ino entry lock */
		struct list_head ino_list;		/* inode list head */
		unsigned long ino_num;			/* number of entries */
	};

	struct f2fs_sb_info {
		...
		struct inode_management im[MAX_INO_ENTRY];	/* manage inode cache */
		...
	}

	Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: remove unneeded check code with option in f2fs_remount	Chao Yu	2014-11-19	1	-2/+2
	Because we have checked the contrary condition in the "if" branch, we do not need to check the condition again in the "else" branch. Let's remove it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: avoid unable to restart gc thread in remount	Chao Yu	2014-11-19	2	-3/+1
	In f2fs_remount, we stop the gc thread and set need_restart_gc to true when the new option is set without BG_GC; then, if any error occurs in the following procedure, we can restore by starting the gc thread. But we then fail to restore the gc thread in start_gc_thread because BG_GC is not set in the new option, so we'd better move this condition check out of start_gc_thread to fix this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: put the inode page when error was occurred	Jaegeuk Kim	2014-11-18	1	-4/+6
	We should put the inode page when an error occurs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix to call put_page at the error handling routine	Jaegeuk Kim	2014-11-18	1	-3/+3
	The locked page should be released before returning from the function. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: convert inline_data when i_size becomes large	Jaegeuk Kim	2014-11-11	2	-0/+9
	If i_size grows beyond MAX_INLINE_DATA, we should convert the inode. Otherwise, we can make some dirty pages during the truncation, and those pages will be written through f2fs_write_data_page. At that moment, the inode still has inline_data, so it tries to write non-zero pages into the inline_data area. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: fix deadlock to grab 0'th data page	Jaegeuk Kim	2014-11-11	1	-5/+3
	The scenario is like this.
	One thread triggers:
	  f2fs_write_data_pages
	    lock_page
	    f2fs_write_data_page
	      f2fs_lock_op  <- wait
	The other thread triggers:
	  f2fs_truncate
	    truncate_blocks
	      f2fs_lock_op
	      truncate_partial_data_page
	        lock_page  <- wait for locking the page
	This patch resolves this bug by relocating truncate_partial_data_page. This function only truncates the user data page and is not related to FS consistency. We also don't need to call truncate_inline_data; instead, f2fs_write_data_page will finally update inline_data later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: reduce the number of inline_data inode before clearing it	Jaegeuk Kim	2014-11-10	1	-1/+1
	The # of inline_data inodes is decreased only when the inode has inline_data. After clearing the flag, we can't decrease the number. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: implement -o dirsync	Jaegeuk Kim	2014-11-10	1	-0/+24
	If a mount option has dirsync, we should call checkpoint for all the directory operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: do not skip any writes under memory pressure	Jaegeuk Kim	2014-11-10	1	-0/+3
	Under memory pressure, let's avoid skipping data writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: write node pages if checkpoint is not doing	Jaegeuk Kim	2014-11-10	1	-4/+6
	Node pages need to be written when no checkpoint is in progress, in order to avoid memory pressure. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: control the memory footprint used by ino entries	Jaegeuk Kim	2014-11-06	3	-8/+26
	This patch controls the memory footprint used by ino entries. This is done on a best-effort basis, not strictly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: introduce the number of inode entries	Jaegeuk Kim	2014-11-06	3	-14/+19
	This patch monitors the number of ino entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: disable roll-forward when active_logs = 2	Jaegeuk Kim	2014-11-05	2	-2/+4
	The roll-forward mechanism should be activated when the number of active logs is not 2. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: introduce -o fastboot for reducing booting time only	Jaegeuk Kim	2014-11-04	4	-6/+16
	If a system wants to reduce the booting time as a top priority, now we can use a mount option, -o fastboot. With this option, f2fs performs a slightly slower write_checkpoint, but it can avoid the node page reads at the next mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: avoid race condition in handling wait_io	Jaegeuk Kim	2014-11-04	3	-29/+10
	__submit_merged_bio f2fs_write_end_io f2fs_write_end_io wait_io = X wait_io = x complete(X) complete(X) wait_io = NULL wait_for_completion() free(X) spin_lock(X) kernel panic
	In order to avoid this, this patch removes the wait_io facility. Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: send discard commands in larger extent	Jaegeuk Kim	2014-11-04	1	-17/+27
	If there is a chance to make a huge sized discard command, we don't need to split it out, since each blkdev_issue_discard should wait one at a time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
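	The general idea of issuing fewer, larger discards can be sketched as coalescing contiguous block addresses into extents. This is a generic illustration, not the logic of the patch: `struct discard_extent` and `coalesce_discards()` are hypothetical names, and the input is assumed to be sorted with no duplicates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one discard command. */
struct discard_extent {
	uint32_t start;	/* first block of the extent */
	uint32_t len;	/* number of contiguous blocks */
};

/*
 * Merge a sorted array of block addresses into contiguous extents.
 * Writes at most max_out extents to out and returns how many were produced.
 */
static size_t coalesce_discards(const uint32_t *blocks, size_t nr,
				struct discard_extent *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		if (n > 0 && out[n - 1].start + out[n - 1].len == blocks[i]) {
			out[n - 1].len++;	/* extends the previous extent */
			continue;
		}
		if (n == max_out)
			break;			/* no room for another extent */
		out[n].start = blocks[i];
		out[n].len = 1;
		n++;
	}
	return n;
}
```

	For the input blocks 10, 11, 12, 20, 21, 30 the function produces three extents instead of six single-block commands.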
*	f2fs: revisit inline_data to avoid data races and potential bugs	Jaegeuk Kim	2014-11-04	6	-212/+250
	This patch simplifies the inline_data usage with the following rules.
	1. inline_data is set during the file creation.
	2. If new data is requested to be written to a range outside of inline_data, f2fs converts that inode permanently.
	3. There are no cases that convert a non-inline_data inode to inline_data.
	4. The inline_data flag should be changed under inode page lock.
	Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
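	Rules 1 to 3 describe a one-way state change, which the sketch below models in user space. All names are hypothetical, and `SKETCH_MAX_INLINE_DATA` is an arbitrary placeholder rather than the real f2fs limit. Rule 4 (locking) is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity of the inline area; not the real f2fs value. */
#define SKETCH_MAX_INLINE_DATA 3000u

struct sketch_inode {
	bool has_inline_data;
};

/* Rule 1: the flag is set when the file is created. */
static void sketch_create(struct sketch_inode *inode)
{
	inode->has_inline_data = true;
}

/*
 * Rule 2: a write that reaches past the inline area converts the inode.
 * Rule 3: nothing sets the flag again after conversion.
 */
static void sketch_write(struct sketch_inode *inode, size_t pos, size_t len)
{
	if (inode->has_inline_data && pos + len > SKETCH_MAX_INLINE_DATA)
		inode->has_inline_data = false;
}
```

	Once a write crosses the limit the flag stays cleared, even if later writes fit inside the inline area again.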
*	f2fs: remove pointless bit testing in f2fs_delete_entry()	Jan Kara	2014-11-03	1	-1/+1
	There's no point in using test_and_clear_bit_le() when we don't use the return value of the function. Just use clear_bit_le() instead. Coverity-id: 1016434 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
*	f2fs: do not discard data protected by the previous checkpoint	Jaegeuk Kim	2014-11-03	1	-1/+1
	We should not discard any data protected by the previous checkpoint all the time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>