summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* f2fs: call radix_tree_preload before radix_tree_insertJaegeuk Kim2014-12-053-6/+19
| | | | | | | | | | | | | | | | | | | | This patch tries to fix: BUG: using smp_processor_id() in preemptible [00000000] code: f2fs_gc-254:0/384 (radix_tree_node_alloc+0x14/0x74) from [<c033d8a0>] (radix_tree_insert+0x110/0x200) (radix_tree_insert+0x110/0x200) from [<c02e8264>] (gc_data_segment+0x340/0x52c) (gc_data_segment+0x340/0x52c) from [<c02e8658>] (f2fs_gc+0x208/0x400) (f2fs_gc+0x208/0x400) from [<c02e8a98>] (gc_thread_func+0x248/0x28c) (gc_thread_func+0x248/0x28c) from [<c0139944>] (kthread+0xa0/0xac) (kthread+0xa0/0xac) from [<c0105ef8>] (ret_from_fork+0x14/0x3c) The reason is that f2fs calls radix_tree_insert under enabled preemption. So, before calling it, we need to call radix_tree_preload. Otherwise, we should use _GFP_WAIT for the radix tree, and use mutex or semaphore to cover the radix tree operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use rw_semaphore for nat entry lockJaegeuk Kim2014-12-032-27/+27
| | | | | | | | | | | Previoulsy, we used rwlock for nat_entry lock. But, now we have a lot of complex operations in set_node_addr. (e.g., allocating kernel memories, handling radix_trees, and so on) So, this patches tries to change spinlock to rw_semaphore to give CPUs to other threads. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix missing kmem_cache_freeJaegeuk Kim2014-12-031-1/+1
| | | | | | This patch fixes missing kmem_cache_free when handling errors. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: more fast lookup for gc_inode listChangman Lee2014-12-022-19/+34
| | | | | | | | | If there are many inodes that have data blocks in victim segment, it takes long time to find a inode in gc_inode list. Let's use radix_tree to reduce lookup time. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: cleanup redundant macroChangman Lee2014-12-011-3/+3
| | | | | | | We've already made fi and sbi for inode. Let's avoid duplicated work. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to return correct error number in f2fs_write_beginChao Yu2014-12-011-1/+3
| | | | | | | Fix the wrong error number in error path of f2fs_write_begin. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: cleanup if-statement of phase in gc_data_segmentChangman Lee2014-11-271-16/+16
| | | | | | | | Little cleanup to distinguish each phase easily Signed-off-by: Changman Lee <cm224.lee@samsung.com> [Jaegeuk Kim: modify indentation for code readability] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to recover converted inline_dataJaegeuk Kim2014-11-251-0/+3
| | | | | | | | If an inode has converted inline_data which was written to the disk, we should set its inode flag for further fsync so that this inline_data can be recovered from sudden power off. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: make clean the page before writingJaegeuk Kim2014-11-251-1/+6
| | | | | | If a page is set to be written to the disk, we can make clean the page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: no more dirty_nat_entires when flushingChangman Lee2014-11-251-4/+4
| | | | | | | | After flushing dirty nat entries, it has to be no more dirty nat entries. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: check dirty_nat_cnt before flushing nat entries in journalChangman Lee2014-11-251-4/+3
| | | | | | | | | It's meaningless to check dirty_nat_cnt after re-dirtying nat entries in journal. And although there are rooms for dirty nat entires if dirty_nat_cnt is zero, it's also meaningless to check __has_cursum_space. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix deadlock during inline_data conversionJaegeuk Kim2014-11-251-14/+14
| | | | | | | | | | | | | | | | | | | | A deadlock can be occurred: Thread 1] Thread 2] - f2fs_write_data_pages - f2fs_write_begin - lock_page(page #0) - grab_cache_page(page #X) - get_node_page(inode_page) - grab_cache_page(page #0) : to convert inline_data - f2fs_write_data_page - f2fs_write_inline_data - get_node_page(inode_page) In this case, trying to lock inode_page and page #0 causes deadlock. In order to avoid this, this patch adds a rule for this locking policy, which is that page #0 should be locked followed by inode_page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix typos for the word "destroy" in jump labelsMarkus Elfring2014-11-251-4/+4
| | | | | | | | | | Two jump labels were adjusted in the implementation of the create_node_manager_caches() function because these identifiers contained typos. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix livelock calling f2fs_iget during f2fs_evict_inodeJaegeuk Kim2014-11-231-1/+10
| | | | | | | | | | | | | | | | | | | | | In f2fs_evict_inode, commit_inmemory_pages f2fs_gc f2fs_iget iget_locked -> wait for inode free Here, if the inode is same as the one to be evicted, f2fs should wait forever. Actually, we should not call f2fs_balance_fs during f2fs_evict_inode to avoid this. But, the commit_inmem_pages calls f2fs_balance_fs by default, even if f2fs_evict_inode wants to free inmemory pages only. Hence, this patch adds to trigger f2fs_balance_fs only when there is something to write. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce f2fs_dentry_kunmap to clean upJaegeuk Kim2014-11-234-24/+18
| | | | | | This patch introduces f2fs_dentry_kunmap to clean up dirty codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix wrong data structure when create slabChangman Lee2014-11-231-1/+1
| | | | | | | | It used nat_entry_set when create slab for sit_entry_set. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: call flush_dcache_page when the page was updatedJaegeuk Kim2014-11-231-0/+1
| | | | | | Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: write SSA pages under memory pressureJaegeuk Kim2014-11-191-1/+4
| | | | | | Under memory pressure, we don't need to skip SSA page writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: submit bio for node blocks in the reclaim pathJaegeuk Kim2014-11-191-0/+4
| | | | | | | If a node page is request to be written during the reclaiming path, we should submit the bio to avoid pending to recliam it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce struct inode_management to wrap inner fieldsChao Yu2014-11-194-49/+66
| | | | | | | | | | | | | | | | | | | | | | | | | Now in f2fs, we have three inode cache: ORPHAN_INO, APPEND_INO, UPDATE_INO, and we manage fields related to inode cache separately in struct f2fs_sb_info for each inode cache type. This makes codes a bit messy, so that this patch intorduce a new struct inode_management to wrap inner fields as following which make codes more neat. /* for inner inode cache management */ struct inode_management { struct radix_tree_root ino_root; /* ino entry array */ spinlock_t ino_lock; /* for ino entry lock */ struct list_head ino_list; /* inode list head */ unsigned long ino_num; /* number of entries */ }; struct f2fs_sb_info { ... struct inode_management im[MAX_INO_ENTRY]; /* manage inode cache */ ... } Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove unneeded check code with option in f2fs_remountChao Yu2014-11-191-2/+2
| | | | | | | | | Because we have checked the contrary condition in case of "if" judgment, we do not need to check the condition again in case of "else" judgment. Let's remove it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid unable to restart gc thread in remountChao Yu2014-11-192-3/+1
| | | | | | | | | | | | In f2fs_remount, we will stop gc thread and set need_restart_gc as true when new option is set without BG_GC, then if any error occurred in the following procedure, we can restore to start the gc thread. But after that, We will fail to restore gc thread in start_gc_thread as BG_GC is not set in new option, so we'd better move this condition judgment out of start_gc_thread to fix this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: put the inode page when error was occurredJaegeuk Kim2014-11-181-4/+6
| | | | | | We should put the inode page when error was occurred. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to call put_page at the error handling routineJaegeuk Kim2014-11-181-3/+3
| | | | | | | The locked page should be released before returning the function. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: convert inline_data when i_size becomes largeJaegeuk Kim2014-11-112-0/+9
| | | | | | | | | | If i_size becomes large outside of MAX_INLINE_DATA, we shoud convert the inode. Otherwise, we can make some dirty pages during the truncation, and those pages will be written through f2fs_write_data_page. At that moment, the inode has still inline_data, so that it tries to write non- zero pages into inline_data area. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix deadlock to grab 0'th data pageJaegeuk Kim2014-11-111-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | The scenario is like this. One trhead triggers: f2fs_write_data_pages lock_page f2fs_write_data_page f2fs_lock_op <- wait The other thread triggers: f2fs_truncate truncate_blocks f2fs_lock_op truncate_partial_data_page lock_page <- wait for locking the page This patch resolves this bug by relocating truncate_partial_data_page. This function is just to truncate user data page and not related to FS consistency as well. And, we don't need to call truncate_inline_data. Rather than that, f2fs_write_data_page will finally update inline_data later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: reduce the number of inline_data inode before clearing itJaegeuk Kim2014-11-101-1/+1
| | | | | | | The # of inline_data inode is decreased only when it has inline_data. After clearing the flag, we can't decreased the number. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: implement -o dirsyncJaegeuk Kim2014-11-101-0/+24
| | | | | | | If a mount option has dirsync, we should call checkpoint for all the directory operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: do not skip any writes under memory pressureJaegeuk Kim2014-11-101-0/+3
| | | | | | Under memory pressure, let's avoid skipping data writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: write node pages if checkpoint is not doingJaegeuk Kim2014-11-101-4/+6
| | | | | | | | It needs to write node pages if checkpoint is not doing in order to avoid memory pressure. Reviewed-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: control the memory footprint used by ino entriesJaegeuk Kim2014-11-063-8/+26
| | | | | | | This patch adds to control the memory footprint used by ino entries. This will conduct best effort, not strictly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce the number of inode entriesJaegeuk Kim2014-11-063-14/+19
| | | | | | This patch adds to monitor the number of ino entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: disable roll-forward when active_logs = 2Jaegeuk Kim2014-11-052-2/+4
| | | | | | | The roll-forward mechanism should be activated when the number of active logs is not 2. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce -o fastboot for reducing booting time onlyJaegeuk Kim2014-11-045-6/+19
| | | | | | | | | If a system wants to reduce the booting time as a top priority, now we can use a mount option, -o fastboot. With this option, f2fs conducts a little bit slow write_checkpoint, but it can avoid the node page reads during the next mount time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove unnecessary macroJaegeuk Kim2014-11-041-4/+0
| | | | | | Let's remove unused macro. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid race condition in handling wait_ioJaegeuk Kim2014-11-043-29/+10
| | | | | | | | | | | | | | | | __submit_merged_bio f2fs_write_end_io f2fs_write_end_io wait_io = X wait_io = x complete(X) complete(X) wait_io = NULL wait_for_completion() free(X) spin_lock(X) kernel panic In order to avoid this, this patch removes the wait_io facility. Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: send discard commands in larger extentJaegeuk Kim2014-11-041-17/+27
| | | | | | | | If there is a chance to make a huge sized discard command, we don't need to split it out, since each blkdev_issue_discard should wait one at a time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: revisit inline_data to avoid data races and potential bugsJaegeuk Kim2014-11-047-212/+251
| | | | | | | | | | | This patch simplifies the inline_data usage with the following rule. 1. inline_data is set during the file creation. 2. If new data is requested to be written ranges out of inline_data, f2fs converts that inode permanently. 3. There is no cases which converts non-inline_data inode to inline_data. 4. The inline_data flag should be changed under inode page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove pointless bit testing in f2fs_delete_entry()Jan Kara2014-11-031-1/+1
| | | | | | | | | There's no point in using test_and_clear_bit_le() when we don't use the return value of the function. Just use clear_bit_le() instead. Coverity-id: 1016434 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: do not discard data protected by the previous checkpointJaegeuk Kim2014-11-031-1/+1
| | | | | | | We should not discard any data protected by the previous checkpoint all the time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: flush_dcache_page for inline dataJaegeuk Kim2014-11-031-0/+1
| | | | | | When reading inline data, we should call flush_dcache_page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: call write_checkpoint under disabled gcJaegeuk Kim2014-11-031-0/+2
| | | | | | | During the write_checkpoint, we should avoid f2fs_gc trigger to avoid any filesystem consistency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix possible data corruption in f2fs_write_begin()Jan Kara2014-11-031-13/+11
| | | | | | | | | | | | f2fs_write_begin() doesn't initialize the 'dn' variable if the inode has inline data. However it uses its contents to decide whether it should just zero out the page or load data to it. Thus if we are unlucky we can zero out page contents instead of loading inline data into a page. CC: stable@vger.kernel.org CC: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use current_sit_addr to replace the open codeGu Zheng2014-11-031-11/+1
| | | | | Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bitGu Zheng2014-11-032-4/+4
| | | | | | | | Rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit, which mean set/clear bit and return the old value, for better readability. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: set raw_super default to NULL to avoid compile warningGu Zheng2014-11-031-1/+1
| | | | | | | | Set raw_super default to NULL to avoid the possibly used uninitialized warning, though we may never hit it in fact. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce f2fs_change_bit to simplify the change bit logicGu Zheng2014-11-033-8/+11
| | | | | | | | Introduce f2fs_change_bit to simplify the change bit logic in function set_to_next_nat{sit}. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove the redundant function cond_clear_inode_flagGu Zheng2014-11-032-11/+2
| | | | | | | Use clear_inode_flag to replace the redundant cond_clear_inode_flag. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove the seems unneeded argument 'type' from __get_victimGu Zheng2014-11-031-3/+5
| | | | | | | | Remove the unneeded argument 'type' from __get_victim, use NO_CHECK_TYPE directly when calling v_ops->get_victim(). Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid returning uninitialized value to userspace from f2fs_trim_fs()Jan Kara2014-11-031-1/+1
| | | | | | | | | | | | If user specifies too low end sector for trimming, f2fs_trim_fs() will use uninitialized value as a number of trimmed blocks and returns it to userspace. Initialize number of trimmed blocks early to avoid the problem. Coverity-id: 1248809 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
OpenPOWER on IntegriCloud