path: root/fs/f2fs/segment.c
Commit message | Author | Age | Files | Lines
* f2fs: use spinlock for segmap_lock instead of rwlock | Chao Yu | 2015-02-11 | 1 | -3/+3
  An rwlock can provide better concurrency when there are many more readers than writers, because readers can hold the lock simultaneously. However, the segmap_lock rwlock in struct free_segmap_info has only one reader, at mount time, reached through the following call path:
    ->f2fs_fill_super
      ->build_segment_manager
        ->build_dirty_segmap
          ->init_dirty_segmap
            ->find_next_inuse
              read_lock
              ...
              read_unlock
  Since there is no other reader for this lock, an rwlock cannot improve concurrency here, so we do not need the rwlock_t type for segmap_lock; replace it with spinlock_t.
  Signed-off-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid variable length array | Jaegeuk Kim | 2015-02-11 | 1 | -2/+8
  Instead of using a variable length array, this patch preallocates memory for it.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce macros to convert bytes and blocks in f2fs | Jaegeuk Kim | 2015-02-11 | 1 | -4/+4
  This patch adds two macros for converting between byte and block offsets. Currently, f2fs only supports 4KB blocks, so the default block size is used for now.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
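A minimal sketch of the byte/block conversion this patch describes, assuming 4KB blocks (a shift of 12). The `SKETCH_*` macro names and `bytes_to_blocks_roundup` are hypothetical stand-ins, not the names the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4KB blocks, i.e. log2(block size) == 12. */
#define SKETCH_BLKSIZE_BITS 12
#define SKETCH_BYTES_TO_BLK(bytes) ((uint64_t)(bytes) >> SKETCH_BLKSIZE_BITS)
#define SKETCH_BLK_TO_BYTES(blk)   ((uint64_t)(blk) << SKETCH_BLKSIZE_BITS)

/* Blocks needed to hold `bytes`; a partial trailing block counts as one. */
static uint64_t bytes_to_blocks_roundup(uint64_t bytes)
{
	return SKETCH_BYTES_TO_BLK(bytes + (1u << SKETCH_BLKSIZE_BITS) - 1);
}
```

Shifting instead of dividing works only because the block size is a power of two, which is also why a single `*_BITS` constant is enough to describe it.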
* f2fs: introduce a batched trim | Jaegeuk Kim | 2015-02-11 | 1 | -5/+12
  This patch introduces a batched trimming feature, which submits the discard commands split into batches. This avoids the long latency caused by a single huge trim: if fstrim were triggered over the range from 0 to the end of the device, we would have to hold all the checkpoint-related mutexes for the whole operation, resulting in very long latency.
  Reviewed-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
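The batching idea can be sketched as a loop that splits one large range into bounded chunks, so that locks only need to be held per chunk. `trim_in_batches`, `add_len`, and the batch size are hypothetical; in f2fs the per-batch work would issue discards under the checkpoint locks, which this userspace sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Called once per batch; a filesystem would take its locks and issue the
 * discards for [start, start + len) here. */
typedef void (*trim_batch_fn)(uint64_t start, uint64_t len, void *arg);

/* Split the block range [start, end) into batches of at most batch_len
 * blocks. Returns the number of batches issued. */
static unsigned trim_in_batches(uint64_t start, uint64_t end,
				uint64_t batch_len, trim_batch_fn fn, void *arg)
{
	unsigned batches = 0;

	assert(batch_len > 0);
	while (start < end) {
		uint64_t len = end - start < batch_len ? end - start : batch_len;

		fn(start, len, arg);
		start += len;
		batches++;
	}
	return batches;
}

/* Example callback: accumulate how many blocks the batches covered. */
static void add_len(uint64_t start, uint64_t len, void *arg)
{
	(void)start;
	*(uint64_t *)arg += len;
}
```

For example, trimming blocks 0..9 with a batch size of 4 yields three batches of 4, 4, and 2 blocks.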
* f2fs: split UMOUNT and FASTBOOT flags | Jaegeuk Kim | 2015-02-11 | 1 | -6/+5
  This patch adds a FASTBOOT flag to the checkpoint as follows.
  - CP_UMOUNT_FLAG is set when the system is unmounted.
  - CP_FASTBOOT_FLAG is set when an intermediate checkpoint containing node summaries is written.
  So, if you find CP_UMOUNT_FLAG in the checkpoint, the system was unmounted cleanly. If there was a sudden power-off instead, you will find either CP_FASTBOOT_FLAG or neither flag.
  Reviewed-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
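A sketch of how a reader of the checkpoint could classify the previous shutdown from these two flags. The flag values here are placeholders chosen for illustration (check `include/linux/f2fs_fs.h` for the real definitions), and `classify_checkpoint` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only. */
#define SKETCH_CP_UMOUNT_FLAG   0x00000001u
#define SKETCH_CP_FASTBOOT_FLAG 0x00000020u

enum last_shutdown {
	SHUTDOWN_CLEAN,		/* umount wrote the checkpoint */
	SHUTDOWN_SPO_FASTBOOT,	/* power-off after a node-summary checkpoint */
	SHUTDOWN_SPO,		/* power-off, neither flag present */
};

static enum last_shutdown classify_checkpoint(uint32_t ckpt_flags)
{
	if (ckpt_flags & SKETCH_CP_UMOUNT_FLAG)
		return SHUTDOWN_CLEAN;
	if (ckpt_flags & SKETCH_CP_FASTBOOT_FLAG)
		return SHUTDOWN_SPO_FASTBOOT;
	return SHUTDOWN_SPO;
}
```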
* f2fs: align direct_io'ed data to section | Jaegeuk Kim | 2015-01-09 | 1 | -8/+19
  This patch aligns the start block address of a file written with direct I/O to the f2fs section size. Some flash devices manage a page larger than 4KB as their write unit, and if direct I/O data is written without alignment to that unit, performance can degrade due to partial page copies. Since an f2fs section is well aligned to FTL units, aligning the block address to the section size lets f2fs avoid this misalignment.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
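The alignment arithmetic can be sketched as rounding a block address up to the next section boundary, measured from the start of the main area. This shows only the arithmetic, not how the patch actually obtains aligned addresses from the allocator; `align_to_section` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round blkaddr up to the next section boundary, where sections start at
 * main_blkaddr and are blocks_per_sec blocks long. */
static uint32_t align_to_section(uint32_t blkaddr, uint32_t main_blkaddr,
				 uint32_t blocks_per_sec)
{
	uint32_t off;

	assert(blocks_per_sec > 0 && blkaddr >= main_blkaddr);
	off = blkaddr - main_blkaddr;
	off = (off + blocks_per_sec - 1) / blocks_per_sec * blocks_per_sec;
	return main_blkaddr + off;
}
```

With 512-block sections starting at block 0x1000, address 0x1001 rounds up to 0x1200, while 0x1000 and 0x1200 are already aligned.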
* f2fs: clean up to remove parameter | Jaegeuk Kim | 2015-01-09 | 1 | -0/+1
  This patch uses dn->data_blkaddr as the parameter for the destination block address.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add block count by in-place-update in stat info | Changman Lee | 2015-01-09 | 1 | -0/+1
  This patch adds a count of blocks written by in-place updates to the stat info.
  Signed-off-by: Changman Lee <cm224.lee@samsung.com>
  Reviewed-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: activate f2fs_trace_pid | Jaegeuk Kim | 2015-01-09 | 1 | -0/+2
  This patch activates f2fs_trace_pid.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use f2fs_io_info to clean up messy parameters during IO path | Jaegeuk Kim | 2015-01-09 | 1 | -15/+13
  This patch cleans up the parameters on the I/O paths. The key idea is to add a block address field to f2fs_io_info and then pass this structure as the parameter.
  Reviewed-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: readahead contiguous current summary blocks in checkpoint | Chao Yu | 2015-01-09 | 1 | -3/+18
  Add readahead for the contiguous compact/normal summary blocks in the checkpoint, which improves performance during mount.
  Changes from v1:
  o remove inappropriate 'unlikely' in npages_for_summary_flush.
  Signed-off-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove unnecessary call to invalidate inmemory pages | Jaegeuk Kim | 2015-01-09 | 1 | -17/+0
  Now that inmemory pages are used only for atomic writes and an abort procedure is provided, we don't need to truncate them explicitly.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix small discards not to issue redundantly | Jaegeuk Kim | 2015-01-09 | 1 | -3/+5
  The ckpt_valid_map and cur_valid_map are synced by seg_info_to_raw_sit. For small discards, the candidates are selected before the sync, while fitrim selects its candidates after the sync. So, for small discards, we need to add only the candidates that have just become obsolete.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
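The candidate selection can be sketched word by word over the two bitmaps. Before the sync (small discards), a block is a candidate if it is valid in the checkpoint map but no longer valid in the current map; after the sync (fitrim), the maps agree, so any block not valid in the checkpoint map is a candidate. This is a sketch of the idea under those assumptions, not the patch's code; `discard_candidates` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fill dmap with discard candidates for one segment's validity bitmaps.
 * fitrim == 0: blocks valid at the last checkpoint but just invalidated.
 * fitrim != 0: every block that is invalid in the checkpoint map. */
static void discard_candidates(const unsigned long *cur_map,
			       const unsigned long *ckpt_map,
			       unsigned long *dmap, size_t words, int fitrim)
{
	for (size_t i = 0; i < words; i++)
		dmap[i] = fitrim ? ~ckpt_map[i] : ckpt_map[i] & ~cur_map[i];
}
```

Restricting the small-discard case to `ckpt & ~cur` is what keeps blocks that were already invalid at the previous checkpoint from being selected again.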
* f2fs: change atomic and volatile write policies | Jaegeuk Kim | 2015-01-09 | 1 | -1/+1
  This patch adds two new ioctls to release inmemory pages grabbed by atomic writes.
  o f2fs_ioc_abort_volatile_write
    - If the transaction failed, all the grabbed pages and data should be written.
  o f2fs_ioc_release_volatile_write
    - This is to enhance the performance of PERSIST mode in sqlite.
  In order to avoid the huge memory consumption that causes OOM, this patch changes volatile writes to use normal dirty pages, instead of blocking their flush to disk for as long as the system is not under memory pressure.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't need to call lock_op and lock_page for abort | Jaegeuk Kim | 2015-01-09 | 1 | -15/+20
  We don't need to call lock_op and lock_page in the abort path.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix wrong condition check to trigger f2fs_sync_fs | Jaegeuk Kim | 2015-01-09 | 1 | -1/+1
  If there is not enough available memory, we need to trigger f2fs_sync_fs.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: count the number of inmemory pages | Jaegeuk Kim | 2014-12-08 | 1 | -0/+3
  This patch counts the number of inmemory pages in the page cache.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: set page private for inmemory pages for truncation | Jaegeuk Kim | 2014-12-08 | 1 | -0/+2
  Inmemory pages should be handled by invalidate_page, since they need to be released in the truncation path.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: do retry operations with cond_resched | Jaegeuk Kim | 2014-12-08 | 1 | -3/+2
  This patch revisits the retry paths in f2fs. The basic idea is to use cond_resched instead of retrying from the very early stage.
  Suggested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
  Reviewed-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix livelock calling f2fs_iget during f2fs_evict_inode | Jaegeuk Kim | 2014-11-23 | 1 | -1/+10
  In f2fs_evict_inode,
    commit_inmem_pages
      f2fs_gc
        f2fs_iget
          iget_locked
            -> wait for inode free
  Here, if the inode is the same as the one being evicted, f2fs waits forever. We should not call f2fs_balance_fs during f2fs_evict_inode to avoid this, but commit_inmem_pages calls f2fs_balance_fs by default, even when f2fs_evict_inode only wants to free inmemory pages. Hence, this patch triggers f2fs_balance_fs only when there is something to write.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix wrong data structure when create slab | Changman Lee | 2014-11-23 | 1 | -1/+1
  When creating the slab for sit_entry_set, the code used nat_entry_set.
  Signed-off-by: Changman Lee <cm224.lee@samsung.com>
  Reviewed-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: control the memory footprint used by ino entries | Jaegeuk Kim | 2014-11-06 | 1 | -1/+2
  This patch controls the memory footprint used by ino entries. The limit is enforced on a best-effort basis, not strictly.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: disable roll-forward when active_logs = 2 | Jaegeuk Kim | 2014-11-05 | 1 | -2/+2
  The roll-forward mechanism should be activated only when the number of active logs is not 2.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: send discard commands in larger extent | Jaegeuk Kim | 2014-11-04 | 1 | -17/+27
  If there is a chance to issue one huge discard command, we don't need to split it, since each blkdev_issue_discard call waits for completion one at a time.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
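Issuing larger extents amounts to collapsing each maximal run of candidate bits into one (start, length) pair instead of splitting it. A sketch with hypothetical names (`collect_extents`, `struct extent`, `test_bit_ul`); in f2fs each extent would then become one discard command.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct extent {
	size_t start;
	size_t len;
};

static int test_bit_ul(const unsigned long *map, size_t nr)
{
	const size_t bits = CHAR_BIT * sizeof(unsigned long);

	return (map[nr / bits] >> (nr % bits)) & 1UL;
}

/* Collapse each maximal run of set bits in map[0..nbits) into one extent,
 * so a long run becomes a single large discard. Returns the number of
 * extents written to out (at most max). */
static size_t collect_extents(const unsigned long *map, size_t nbits,
			      struct extent *out, size_t max)
{
	size_t n = 0, i = 0;

	while (i < nbits && n < max) {
		size_t start;

		while (i < nbits && !test_bit_ul(map, i))
			i++;		/* skip a gap */
		if (i == nbits)
			break;
		start = i;
		while (i < nbits && test_bit_ul(map, i))
			i++;		/* extend the run */
		out[n].start = start;
		out[n].len = i - start;
		n++;
	}
	return n;
}
```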
* f2fs: do not discard data protected by the previous checkpoint | Jaegeuk Kim | 2014-11-03 | 1 | -1/+1
  We should never discard any data protected by the previous checkpoint.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: call write_checkpoint under disabled gc | Jaegeuk Kim | 2014-11-03 | 1 | -0/+2
  During write_checkpoint, we should prevent f2fs_gc from being triggered, to avoid breaking filesystem consistency.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use current_sit_addr to replace the open code | Gu Zheng | 2014-11-03 | 1 | -11/+1
  Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit | Gu Zheng | 2014-11-03 | 1 | -2/+2
  Rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit for better readability, since these functions set/clear a bit and return its old value.
  Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
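A non-atomic sketch of the test-and-set/clear semantics the new names describe: change the bit and report its previous value. It assumes MSB-first bit order within each byte, which is how f2fs's own bitmap helpers index bits; the `sketch_*` names are hypothetical.

```c
#include <assert.h>

/* Set bit nr (MSB-first within each byte) and return its old value. */
static int sketch_test_and_set_bit(unsigned int nr, unsigned char *addr)
{
	unsigned char *byte = addr + (nr >> 3);
	unsigned char mask = (unsigned char)(1u << (7 - (nr & 7)));
	int old = (*byte & mask) != 0;

	*byte |= mask;
	return old;
}

/* Clear bit nr (MSB-first within each byte) and return its old value. */
static int sketch_test_and_clear_bit(unsigned int nr, unsigned char *addr)
{
	unsigned char *byte = addr + (nr >> 3);
	unsigned char mask = (unsigned char)(1u << (7 - (nr & 7)));
	int old = (*byte & mask) != 0;

	*byte &= (unsigned char)~mask;
	return old;
}
```

Returning the old value lets a caller detect a double set or double clear in one step, for example to flag an inconsistent validity bitmap.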
* f2fs: avoid returning uninitialized value to userspace from f2fs_trim_fs() | Jan Kara | 2014-11-03 | 1 | -1/+1
  If the user specifies too low an end sector for trimming, f2fs_trim_fs() will use an uninitialized value as the number of trimmed blocks and return it to userspace. Initialize the number of trimmed blocks early to avoid the problem.
  Coverity-id: 1248809
  CC: stable@vger.kernel.org
  Signed-off-by: Jan Kara <jack@suse.cz>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid build warning | Jaegeuk Kim | 2014-11-03 | 1 | -1/+1
  This patch removes a build warning.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: invalidate inmemory page | Jaegeuk Kim | 2014-11-03 | 1 | -0/+16
  If the user truncates a file's data, we should truncate its inmemory pages too.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: do not make dirty any inmemory pages | Jaegeuk Kim | 2014-11-03 | 1 | -1/+13
  This patch keeps inmemory pages clean at all times.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support atomic writes | Jaegeuk Kim | 2014-10-06 | 1 | -0/+63
  This patch introduces very limited functionality for atomic write support. In order to support atomic writes, this patch adds two ioctls:
  o F2FS_IOC_START_ATOMIC_WRITE
  o F2FS_IOC_COMMIT_ATOMIC_WRITE
  The database engine should be aware of the following sequence.
  1. open
     -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
  2. writes
     : all the written data will be treated as atomic pages.
  3. commit
     -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
     : this flushes all the data blocks to the disk, and the f2fs recovery procedure will expose them all or nothing.
  4. repeat from #2.
  The IO patterns should be:
       ,- START_ATOMIC_WRITE             ,- COMMIT_ATOMIC_WRITE
    CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                       `- COMMIT_ATOMIC_WRITE
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
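The all-or-nothing contract of steps 1 to 4 can be illustrated with a toy in-memory model: writes made after START are staged, and only COMMIT makes them visible on the "disk" together. This is a userspace model with hypothetical names, not how f2fs implements atomic pages, and it does not issue the real ioctls.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 4	/* hypothetical file size in blocks */
#define BLKSZ	16	/* hypothetical block size in bytes */

/* Toy model: disk holds committed data; while an atomic write is open,
 * writes go to a private staging copy instead. */
struct atomic_file {
	char disk[NBLOCKS][BLKSZ];
	char staged[NBLOCKS][BLKSZ];
	unsigned char dirty[NBLOCKS];
	int in_atomic;
};

/* Step 1: stands in for ioctl(F2FS_IOC_START_ATOMIC_WRITE). */
static void start_atomic_write(struct atomic_file *f)
{
	memset(f->dirty, 0, sizeof(f->dirty));
	f->in_atomic = 1;
}

/* Step 2: while atomic, a write only updates the staging copy. */
static void write_block(struct atomic_file *f, int blk, const char *data)
{
	assert(blk >= 0 && blk < NBLOCKS);
	char *dst = f->in_atomic ? f->staged[blk] : f->disk[blk];

	strncpy(dst, data, BLKSZ - 1);
	dst[BLKSZ - 1] = '\0';
	if (f->in_atomic)
		f->dirty[blk] = 1;
}

/* Step 3: stands in for ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE). Staged blocks
 * reach disk together; if this never runs (a crash), disk keeps only the
 * previously committed data. The file stays atomic (step 4). */
static void commit_atomic_write(struct atomic_file *f)
{
	for (int i = 0; i < NBLOCKS; i++)
		if (f->dirty[i])
			memcpy(f->disk[i], f->staged[i], BLKSZ);
	memset(f->dirty, 0, sizeof(f->dirty));
}
```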
* f2fs: check the use of macros on block counts and addresses | Jaegeuk Kim | 2014-09-30 | 1 | -47/+38
  This patch cleans up the existing and new macros for readability. The rule is as follows.
                ,--------------------------------------> MAX_BLKADDR
                |  ,-------------- TOTAL_BLKS ------------------>
                |  |
                |  |- seg0_blkaddr (SEG0_BLKADDR)
                |  |            ,-,-,-,- sit/nat/ssa/main blkaddr
                |  |            | | | |  (e.g., MAIN_BLKADDR)
  block address 0..x............a b c d .......................
  global seg#      0..................m .......................
                   |                  |
                   |                  `------ MAIN_SEGS ------>
                   `----------- TOTAL_SEGS --------------------->
  seg#                                0..........xx.............
  = Note =
  o GET_SEGNO_FROM_SEG0 : blk address -> global segno
  o GET_SEGNO : blk address -> segno
  o START_BLOCK : segno -> starting block address
  Reviewed-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
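The three conversions in the note can be sketched as functions over a hypothetical geometry struct. This assumes the main area starts on a segment boundary relative to segment 0, so the main-area segno is simply the offset from `main_blkaddr` in segments; the struct and function names are illustrative, not the macros' actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry; the real values come from the superblock. */
struct geo {
	uint32_t seg0_blkaddr;		/* first block of global segment 0 */
	uint32_t main_blkaddr;		/* first block of the main area */
	uint32_t log_blocks_per_seg;	/* e.g. 9 for 512 blocks per segment */
};

/* blk address -> global segno (like GET_SEGNO_FROM_SEG0) */
static uint32_t segno_from_seg0(const struct geo *g, uint32_t blk)
{
	assert(blk >= g->seg0_blkaddr);
	return (blk - g->seg0_blkaddr) >> g->log_blocks_per_seg;
}

/* blk address -> main-area segno (like GET_SEGNO) */
static uint32_t get_segno(const struct geo *g, uint32_t blk)
{
	assert(blk >= g->main_blkaddr);
	return (blk - g->main_blkaddr) >> g->log_blocks_per_seg;
}

/* main-area segno -> starting block address (like START_BLOCK) */
static uint32_t start_block(const struct geo *g, uint32_t segno)
{
	return g->main_blkaddr + (segno << g->log_blocks_per_seg);
}
```

With segment 0 at block 512, the main area at block 2048, and 512-block segments, the main area begins at global segment 3 (the `m` in the diagram) and main-area segment 1 starts at block 2560.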
* f2fs: introduce FITRIM in f2fs_ioctl | Jaegeuk Kim | 2014-09-30 | 1 | -10/+94
  This patch introduces FITRIM in f2fs_ioctl. In this case, f2fs issues as many small discards and prefree discards as possible for the given area.
  Reviewed-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: change the ipu_policy option to enable combinations | Jaegeuk Kim | 2014-09-23 | 1 | -1/+1
  This patch changes the ipu_policy setting so that any combination of orthogonal policies can be used.
  Signed-off-by: Changman Lee <cm224.lee@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support large sector size | Chao Yu | 2014-09-23 | 1 | -2/+2
  The block size in f2fs is 4096 bytes, so in theory f2fs can support devices with sector sizes up to 4096 bytes. But currently f2fs supports only 512-byte sectors, so a block device such as zRAM, which uses the page cache as its block storage space, cannot be mounted because its sector size does not match the one f2fs supports.
  This patch adds support for large sector sizes, so block devices with 512/1024/2048/4096-byte sectors can be used with f2fs.
  Signed-off-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
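Supporting several sector sizes boils down to computing how many sectors make up one 4KB block, as a power-of-two shift, and using that shift to convert block addresses to sector numbers. A sketch with hypothetical helper names (`log_sectors_per_block`, `blk_to_sector`):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLKSIZE 4096u	/* f2fs block size */

/* log2(SKETCH_BLKSIZE / sector_size), or -1 if the sector size is not a
 * power of two between 512 and 4096 bytes. */
static int log_sectors_per_block(unsigned int sector_size)
{
	int log = 0;

	if (sector_size < 512 || sector_size > SKETCH_BLKSIZE ||
	    (sector_size & (sector_size - 1)))
		return -1;
	while ((sector_size << log) < SKETCH_BLKSIZE)
		log++;
	return log;
}

/* Convert a 4KB block address into a device sector number. */
static uint64_t blk_to_sector(uint64_t blkaddr, int log_sectors)
{
	return blkaddr << log_sectors;
}
```

A hard-coded shift of 3 is only correct for 512-byte sectors; deriving it from the device's sector size is what lets a 4096-byte-sector device use a shift of 0.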
* f2fs: use MAX_BIO_BLOCKS(sbi) | Jaegeuk Kim | 2014-09-23 | 1 | -1/+1
  This patch cleans up a simple macro.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: give an option to enable in-place-updates during fsync to users | Jaegeuk Kim | 2014-09-16 | 1 | -1/+2
  If the user writes F2FS_IPU_FSYNC:4 to /sys/fs/f2fs/ipu_policy, f2fs_sync_file starts to try in-place updates. If the number of dirty pages exceeds /sys/fs/f2fs/min_fsync_blocks, it keeps using out-of-place updates; otherwise, it triggers in-place updates.
  This may be useful for storage with very high random write performance. For example, it can be used when
    Seq. writes (Data) + wait + Seq. writes (Node)
  is much slower than
    Rand. writes (Data)
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
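The fsync decision described above can be sketched as a predicate. It assumes the later bitmask encoding of ipu_policy (policy N enabled when bit N is set, per the "enable combinations" change in this log) with F2FS_IPU_FSYNC as policy 4; `fsync_should_ipu` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: policy N is enabled when bit N of ipu_policy is set. */
enum { SKETCH_IPU_FSYNC = 4 };

/* Decide whether an fsync should write data in place. */
static bool fsync_should_ipu(unsigned int ipu_policy, bool in_fsync,
			     unsigned int dirty_pages,
			     unsigned int min_fsync_blocks)
{
	if (!in_fsync || !(ipu_policy & (1u << SKETCH_IPU_FSYNC)))
		return false;
	/* Too many dirty pages: keep the normal out-of-place path. */
	return dirty_pages <= min_fsync_blocks;
}
```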
* f2fs: use lock-less list(llist) to simplify the flush cmd management | Gu Zheng | 2014-09-09 | 1 | -20/+9
  We use flush cmd control to collect many flush cmds and flush them together. In this case, we use two lists to manage the flush cmds (collect and dispatch), protected by one spinlock. In fact, the lock-less list (llist) is well suited to this case, so we use it to simplify this routine.
  - v2:
    - use llist_for_each_entry_safe to fix a possible use-after-free issue.
    - remove the unused field from struct flush_cmd.
    Thanks for Yu's suggestion.
  Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
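A userspace sketch of the lock-less list pattern using C11 atomics: producers push with a compare-and-swap, and the single consumer detaches the whole list with one atomic exchange, as the kernel's `llist_add` and `llist_del_all` do. The list comes out newest-first, so it is reversed before dispatch (the kernel provides `llist_reverse_order` for this). All names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct lnode {
	struct lnode *next;
	int id;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/* Producer: push one command without taking a lock. */
static void lpush(struct lhead *h, struct lnode *n)
{
	struct lnode *old = atomic_load(&h->first);

	do {
		n->next = old;	/* old is refreshed by a failed CAS */
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Consumer: detach every queued command at once. */
static struct lnode *ldel_all(struct lhead *h)
{
	return atomic_exchange(&h->first, NULL);
}

/* Reverse the detached LIFO list into arrival order. */
static struct lnode *lreverse(struct lnode *n)
{
	struct lnode *prev = NULL;

	while (n) {
		struct lnode *next = n->next;

		n->next = prev;
		prev = n;
		n = next;
	}
	return prev;
}
```

Because the consumer takes the entire list in one exchange, it never races with producers over individual nodes, which is what removes the need for the spinlock.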
* f2fs: refactor flush_sit_entries codes for reducing SIT writes | Chao Yu | 2014-09-09 | 1 | -61/+166
  In commit aec71382c681 ("f2fs: refactor flush_nat_entries codes for reducing NAT writes"), we described the issue as below:
  "Although building NAT journal in cursum reduce the read/write work for NAT block, but previous design leave us lower performance when write checkpoint frequently for these cases:
  1. if journal in cursum has already full, it's a bit of waste that we flush all nat entries to page for persistence, but not to cache any entries.
  2. if journal in cursum is not full, we fill nat entries to journal until journal is full, then flush the left dirty entries to disk without merge journaled entries, so these journaled entries may be flushed to disk at next checkpoint but lost chance to flushed last time."
  Actually, we have the same problem with the SIT journal area.
  In this patch, first we update the SIT journal with as many dirty entries as possible. Second, if there is no space left in the SIT journal, we remove all entries from the journal and walk through the whole dirty entry bitmap of SIT, accounting dirty SIT entries located in the same SIT block to a sit entry set. All entry sets are linked to the list sit_entry_set in sm_info, sorted in ascending order by the number of entries in each set. Later, we flush entries from the sets with the fewest entries into the journal as far as possible, and then flush the dense sets, with merged entries, to disk.
  In this way we use the SIT journal area more effectively and reduce SIT updates, resulting in better performance and a longer lifetime for the flash device.
  In my testing environment, this patch noticeably reduces SIT block updates.
  virtual machine + hard disk:
  fsstress -p 20 -n 400 -l 5
              sit page num   cp count   sit pages/cp
    based     2006.50        1349.75    1.486
    patched   1566.25        1463.25    1.070
  The latency of the merging operation is small when handling a large number of dirty SIT entries in flush_sit_entries:
    latency(ns)   dirty sit count
    36038         2151
    49168         2123
    37174         2232
  Signed-off-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
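The set-building and journaling steps can be sketched as follows: group dirty segment numbers by the SIT block that holds them, sort the groups by entry count, then move the sparsest groups into the journal while they fit. This sketch assumes 55 SIT entries per block and uses `qsort` on an array where the patch keeps a sorted list; all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_SIT_ENTRY_PER_BLOCK 55	/* assumption */

struct entry_set {
	unsigned int start_segno;	/* first segno covered by the SIT block */
	unsigned int count;		/* dirty entries in that block */
};

/* Ascending by count; ties broken by start_segno for a stable result. */
static int cmp_count(const void *a, const void *b)
{
	const struct entry_set *x = a, *y = b;

	if (x->count != y->count)
		return (x->count > y->count) - (x->count < y->count);
	return (x->start_segno > y->start_segno) -
	       (x->start_segno < y->start_segno);
}

/* dirty[] must be sorted ascending; sets[] needs room for n entries.
 * Returns the number of sets, sorted sparsest first. */
static size_t build_sets(const unsigned int *dirty, size_t n,
			 struct entry_set *sets)
{
	size_t nsets = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int start = dirty[i] / SKETCH_SIT_ENTRY_PER_BLOCK *
				     SKETCH_SIT_ENTRY_PER_BLOCK;

		if (nsets == 0 || sets[nsets - 1].start_segno != start) {
			sets[nsets].start_segno = start;
			sets[nsets].count = 0;
			nsets++;
		}
		sets[nsets - 1].count++;
	}
	qsort(sets, nsets, sizeof(*sets), cmp_count);
	return nsets;
}

/* Journal whole sets, sparsest first, while they fit in journal_free
 * slots. Returns how many sets were journaled; the rest would be written
 * to their SIT blocks. */
static size_t journal_sparse_sets(const struct entry_set *sets, size_t nsets,
				  unsigned int journal_free)
{
	size_t i = 0;

	while (i < nsets && sets[i].count <= journal_free) {
		journal_free -= sets[i].count;
		i++;
	}
	return i;
}
```

Journaling the sparse sets first means the SIT blocks that still get rewritten are the dense ones, so each block write carries as many updated entries as possible.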
* f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO | Chao Yu | 2014-09-09 | 1 | -2/+2
  sit_i in the macros SIT_BLOCK_OFFSET/START_SEGNO is not used; remove it.
  Signed-off-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: handle bug cases by letting fsck.f2fs initiate | Jaegeuk Kim | 2014-09-09 | 1 | -1/+9
  This patch handles corner-case bugs by letting fsck.f2fs be initiated.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add BUG cases to initiate fsck.f2fs | Jaegeuk Kim | 2014-09-09 | 1 | -2/+2
  This patch replaces BUG cases with f2fs_bug_on so that information for fsck.f2fs is retained. It also implements some void functions to initiate fsck.f2fs.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: need fsck.f2fs when f2fs_bug_on is triggered | Jaegeuk Kim | 2014-09-09 | 1 | -8/+9
  If any f2fs_bug_on is triggered, fsck.f2fs is needed.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB | Jaegeuk Kim | 2014-09-03 | 1 | -8/+6
  This patch adds three inline functions to clean up messy casting code.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove rewrite_node_page | Jaegeuk Kim | 2014-08-21 | 1 | -49/+0
  I think we need to let the dirty node pages remain in the page cache instead of rewriting them in place. So, after a successful recovery, write_checkpoint will flush all of them through the normal write path. This way, we can avoid potential errors in block allocation.
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix typo | arter97 | 2014-08-19 | 1 | -2/+2
  Fix typos and some grammatical errors. The words "filesystem" and "readahead" are used without a space treewide.
  Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use for_each_set_bit to simplify the code | Chao Yu | 2014-08-04 | 1 | -9/+4
  This patch uses for_each_set_bit to simplify some code in f2fs.
  Signed-off-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
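For readers unfamiliar with the helper, a userspace stand-in for the kernel's `for_each_set_bit(bit, addr, size)` shows the pattern: the loop header jumps from one set bit to the next, so the body only sees set bits. `find_next_set`, `sketch_for_each_set_bit`, and `count_set_bits` are hypothetical; the kernel's bit-search helpers scan a word at a time rather than bit by bit as this sketch does.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_UL (CHAR_BIT * sizeof(unsigned long))

/* Index of the first set bit at or after `from`, or nbits if none. */
static size_t find_next_set(const unsigned long *map, size_t nbits,
			    size_t from)
{
	for (size_t i = from; i < nbits; i++)
		if ((map[i / SKETCH_BITS_PER_UL] >> (i % SKETCH_BITS_PER_UL)) & 1UL)
			return i;
	return nbits;
}

/* Visit each set bit of map[0..nbits) in ascending order. */
#define sketch_for_each_set_bit(bit, map, nbits)			\
	for ((bit) = find_next_set((map), (nbits), 0); (bit) < (nbits);	\
	     (bit) = find_next_set((map), (nbits), (bit) + 1))

/* Example use: count the set bits. */
static size_t count_set_bits(const unsigned long *map, size_t nbits)
{
	size_t bit, n = 0;

	sketch_for_each_set_bit(bit, map, nbits)
		n++;
	return n;
}
```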
* f2fs: remove redundant lines in allocate_data_block | Dongho Sim | 2014-07-30 | 1 | -3/+0
  There are redundant lines in allocate_data_block. In this function, we call refresh_sit_entry with the old segment and the old curseg. After that, we call locate_dirty_segment with the old curseg. But the new address is always allocated from the old curseg, and refresh_sit_entry already calls locate_dirty_segment with the old curseg, so we do not need to call locate_dirty_segment with the old curseg again.
  We discussed this as follows:
  Jaegeuk said:
  "When considering SSR, we need to take care of the following scenario.
  - old segno : X
  - new address : Z
  - old curseg : Y
  This means, a new block is supposed to be written to Z from X. And Z is newly allocated in the same path from Y. In that case, we should trigger locate_dirty_segment for Y, since it was a current_segment and can be dirty owing to SSR. But that was not included in the dirty list."
  Changman said:
  "We already chose old curseg(Y) and then we allocate new address(Z) from old curseg(Y). After that we call refresh_sit_entry(old address, new address). In the function, we call locate_dirty_segment with old seg and old curseg. So calling locate_dirty_segment after refresh_sit_entry again is redundant."
  Jaegeuk said:
  "Right. The new address is always allocated from old_curseg."
  Reviewed-by: Chao Yu <chao2.yu@samsung.com>
  Signed-off-by: Dongho Sim <dh.sim@samsung.com>
  Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
OpenPOWER on IntegriCloud