path: root/fs
* mm: remove page_file_index (Huang Ying, 2016-10-07; 4 files, -7/+7)
    After using the offset of the swap entry as the key of the swap cache, page_index() becomes exactly the same as page_file_index(). So page_file_index() is removed and the callers are changed to use page_index() instead.
    Link: http://lkml.kernel.org/r/1473270649-27229-2-git-send-email-ying.huang@intel.com
    Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
    Cc: Trond Myklebust <trond.myklebust@primarydata.com>
    Cc: Anna Schumaker <anna.schumaker@netapp.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
    Cc: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* thp: reduce usage of huge zero page's atomic counter (Aaron Lu, 2016-10-07; 1 file, -1/+1)
    The global zero page is used to satisfy an anonymous read fault. If THP (Transparent HugePage) is enabled, the global huge zero page is used instead. The global huge zero page uses an atomic counter for reference counting and is allocated/freed dynamically according to its counter value.
    CPU time spent on that counter will greatly increase if there are a lot of processes doing anonymous read faults. This patch proposes a way to reduce accesses to the global counter so that the CPU load can be reduced accordingly. To do this, a new flag of the mm_struct is introduced: MMF_USED_HUGE_ZERO_PAGE. With this flag, the process only needs to touch the global counter in two cases:
    1. The first time it uses the global huge zero page;
    2. When mm_user of its mm_struct reaches zero.
    Note that right now the huge zero page is eligible to be freed as soon as its last use goes away. With this patch, the page will not be eligible to be freed until the exit of the last process that ever used it. And with the use of mm_user, a kthread is not eligible to use the huge zero page either. Since no kthread is using the huge zero page today, there is no difference after applying this patch; but if that is not desired, it can be changed to trigger when mm_count reaches zero.
    Case used for test on Haswell EP:
        usemem -n 72 --readonly -j 0x200000 100G
    which spawns 72 processes; each will mmap 100G of anonymous space and then do read-only access to that space sequentially with a step of 2MB.
    CPU cycles from perf report for the base commit:
        54.03% usemem [kernel.kallsyms] [k] get_huge_zero_page
    CPU cycles from perf report for this commit:
        0.11% usemem [kernel.kallsyms] [k] mm_get_huge_zero_page
    Performance (throughput) of the workload for the base commit: 1784430792
    Performance (throughput) of the workload for this commit: 4726928591 (164% increase)
    Runtime of the workload for the base commit: 707592 us
    Runtime of the workload for this commit: 303970 us (50% drop)
    Link: http://lkml.kernel.org/r/fe51a88f-446a-4622-1363-ad1282d71385@intel.com
    Signed-off-by: Aaron Lu <aaron.lu@intel.com>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Jerome Marchand <jmarchan@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

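A simplified sketch of the mechanism described above, reconstructed only from the commit message (the flag name MMF_USED_HUGE_ZERO_PAGE and the helper names are taken from it; this is not the exact kernel code):

    /* Sketch: the global atomic counter is touched at most once per mm. */
    struct page *mm_get_huge_zero_page(struct mm_struct *mm)
    {
            /* Fast path: this mm already holds its reference. */
            if (test_bit(MMF_USED_HUGE_ZERO_PAGE, &mm->flags))
                    return READ_ONCE(huge_zero_page);

            /* Slow path: bump the global counter (allocates the page if needed). */
            if (!get_huge_zero_page())
                    return NULL;

            /* Another thread may have set the bit meanwhile; drop the extra ref. */
            if (test_and_set_bit(MMF_USED_HUGE_ZERO_PAGE, &mm->flags))
                    put_huge_zero_page();

            return READ_ONCE(huge_zero_page);
    }

The per-mm reference would then be dropped once, when mm_users of that mm_struct reaches zero.
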
* fs/proc/task_mmu.c: make the task_mmu walk_page_range() limit in clear_refs_write() obvious (James Morse, 2016-10-07; 1 file, -1/+1)
    Trying to walk all of virtual memory requires architecture-specific knowledge. On x86_64, addresses must be sign-extended from bit 48, whereas on arm64 the top VA_BITS of address space have their own set of page tables.
    clear_refs_write() calls walk_page_range() on the range 0 to ~0UL; it provides a test_walk() callback that only expects to be walking over VMAs. Currently walk_pmd_range() will skip memory regions that don't have a VMA, reporting them as a hole. As this call only expects to walk user address space, make it walk 0 to 'highest_vm_end'.
    Link: http://lkml.kernel.org/r/1472655792-22439-1-git-send-email-james.morse@arm.com
    Signed-off-by: James Morse <james.morse@arm.com>
    Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

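A hedged sketch of the change, using the 4.8-era walk_page_range() signature; clear_refs_walk stands in for the mm_walk that clear_refs_write() sets up:

    /* Before: relied on arch-specific knowledge of where user space ends. */
    /* walk_page_range(0, ~0UL, &clear_refs_walk); */

    /* After: only user VMAs matter, so stop at the highest mapped VMA. */
    walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
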
* ext2/4, xfs: call thp_get_unmapped_area() for pmd mappings (Toshi Kani, 2016-10-07; 3 files, -0/+3)
    To support DAX pmd mappings with unmodified applications, filesystems need to align an mmap address by the pmd size. Call thp_get_unmapped_area() from f_op->get_unmapped_area. Note, there is no change in behavior for a non-DAX file.
    Link: http://lkml.kernel.org/r/1472497881-9323-3-git-send-email-toshi.kani@hpe.com
    Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

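A sketch of how a filesystem wires this up; example_file_operations is a hypothetical name and the field list is abbreviated, while thp_get_unmapped_area() is the helper named in the commit:

    #include <linux/fs.h>
    #include <linux/huge_mm.h>

    const struct file_operations example_file_operations = {
            .llseek            = generic_file_llseek,
            .mmap              = generic_file_mmap,
            /* Align mmap addresses to the PMD size so DAX can use PMD mappings. */
            .get_unmapped_area = thp_get_unmapped_area,
            /* ...read/write/fsync methods unchanged... */
    };
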
* ocfs2: fix undefined struct variable in inode.h (Joseph Qi, 2016-10-07; 1 file, -2/+0)
    The extern struct variable ocfs2_inode_cache is not defined. It was presumably meant to be ocfs2_inode_cachep, which is defined in super.c. Fortunately it is not used anywhere, so there is no actual impact; clean it up to fix this mistake.
    Link: http://lkml.kernel.org/r/57E1E49D.8050503@huawei.com
    Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
    Reviewed-by: Eric Ren <zren@suse.com>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* fs/ocfs2/dlm: remove deprecated create_singlethread_workqueue() (Bhaktipriya Shridhar, 2016-10-07; 1 file, -1/+1)
    The workqueue "dlm_worker" queues a single work item, &dlm->dispatched_work, and thus does not require execution ordering. Hence, alloc_workqueue has been used to replace the deprecated create_singlethread_workqueue instance.
    The WQ_MEM_RECLAIM flag has been set to ensure forward progress under memory pressure. Since there is a fixed number of work items, an explicit concurrency limit is unnecessary here. See the sketch after this entry.
    Link: http://lkml.kernel.org/r/2b5ad8d6688effe1a9ddb2bc2082d26fbbe00302.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Joseph Qi <joseph.qi@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

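A minimal sketch of the conversion pattern used by this and the following workqueue patches (hypothetical names):

    #include <linux/workqueue.h>

    static struct workqueue_struct *example_wq;

    static int example_init(void)
    {
            /* Before: example_wq = create_singlethread_workqueue("example_wq"); */
            example_wq = alloc_workqueue("example_wq", WQ_MEM_RECLAIM, 0);
            if (!example_wq)
                    return -ENOMEM;
            return 0;
    }

Passing 0 as max_active accepts the default concurrency limit, which is fine when only a fixed, small number of work items can ever be queued.
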
* fs/ocfs2/super: remove deprecated create_singlethread_workqueue() (Bhaktipriya Shridhar, 2016-10-07; 1 file, -1/+1)
    The workqueue "ocfs2_wq" queues multiple work items, viz. &osb->la_enable_wq, &journal->j_recovery_work, &os->os_orphan_scan_work and &osb->osb_truncate_log_wq, which require strict execution ordering. Hence, an ordered dedicated workqueue has been used (see the sketch after this entry).
    WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure because the workqueue is used on a memory reclaim path.
    Link: http://lkml.kernel.org/r/66279de510a7f4cfc6e386d99b7e04b3f65fb11b.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Joseph Qi <joseph.qi@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

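When the queued work items do need strict ordering, as here, the equivalent sketch uses an ordered workqueue (hypothetical name):

    #include <linux/workqueue.h>

    static struct workqueue_struct *example_ordered_wq;

    static int example_ordered_init(void)
    {
            /* One work item at a time, executed in queueing order. */
            example_ordered_wq = alloc_ordered_workqueue("example_ordered_wq",
                                                         WQ_MEM_RECLAIM);
            if (!example_ordered_wq)
                    return -ENOMEM;
            return 0;
    }
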
* fs/ocfs2/cluster: remove deprecated create_singlethread_workqueue() (Bhaktipriya Shridhar, 2016-10-07; 1 file, -1/+1)
    The workqueue "o2net_wq" queues multiple work items, viz. &old_sc->sc_shutdown_work, &sc->sc_rx_work and &sc->sc_connect_work, which require strict execution ordering. Hence, an ordered dedicated workqueue has been used. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure.
    Link: http://lkml.kernel.org/r/ddc12e5766c79ba26f8a00d98049107f8a1d4866.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Joseph Qi <joseph.qi@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* fs/ocfs2/dlmfs: remove deprecated create_singlethread_workqueue() (Bhaktipriya Shridhar, 2016-10-07; 1 file, -1/+1)
    The workqueue "user_dlm_worker" queues a single work item, &lockres->l_work, per user_lock_res instance and so does not require execution ordering. Hence, alloc_workqueue has been used to replace the deprecated create_singlethread_workqueue instance.
    The WQ_MEM_RECLAIM flag has been set to ensure forward progress under memory pressure. Since there is a fixed number of work items, an explicit concurrency limit is unnecessary here.
    Link: http://lkml.kernel.org/r/9748136d3a3b18138ad1d6ba708367aa1fe9f98c.1472590094.git.bhaktipriya96@gmail.com
    Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Joseph Qi <joseph.qi@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* fsnotify: clean up spinlock assertions (Jan Kara, 2016-10-07; 2 files, -8/+4)
    Use the assert_spin_locked() macro instead of hand-made BUG_ON statements.
    Link: http://lkml.kernel.org/r/1474537439-18919-1-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara <jack@suse.cz>
    Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
    Reviewed-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

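A small sketch of the cleanup (hypothetical function and lock names):

    #include <linux/spinlock.h>
    #include <linux/list.h>

    static void example_remove_event(spinlock_t *lock, struct list_head *entry)
    {
            /* Before: BUG_ON(!spin_is_locked(lock)); */
            assert_spin_locked(lock);   /* documents the caller's locking requirement */
            list_del_init(entry);
    }
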
* fanotify: fix possible false warning when freeing events (Jan Kara, 2016-10-07; 1 file, -2/+11)
    When freeing permission events by fsnotify_destroy_event(), the warning WARN_ON(!list_empty(&event->list)); may falsely hit. This is because although fanotify_get_response() saw event->response set, there is nothing to make sure the current CPU also sees the removal of the event from the list. Add proper locking around the WARN_ON() to avoid the false warning.
    Link: http://lkml.kernel.org/r/1473797711-14111-7-git-send-email-jack@suse.cz
    Reported-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
    Cc: Eric Paris <eparis@redhat.com>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* fanotify: use notification_lock instead of access_lock (Jan Kara, 2016-10-07; 1 file, -8/+5)
    Fanotify code has its own lock (access_lock) to protect the list of events waiting for a response from userspace. However, this is somewhat awkward: the same list_head in the event is protected by notification_lock while the event is part of the notification queue, and by access_lock while it is part of the fanotify private queue, which makes reliable checks in the generic code difficult. So make fanotify use the same lock, notification_lock, for protecting its private event list.
    Link: http://lkml.kernel.org/r/1473797711-14111-6-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
    Cc: Miklos Szeredi <mszeredi@redhat.com>
    Cc: Eric Paris <eparis@redhat.com>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* fsnotify: convert notification_mutex to a spinlock (Jan Kara, 2016-10-07; 4 files, -36/+40)
    notification_mutex is used to protect the list of pending events. As such there's no reason to use a sleeping lock for it. Convert it to a spinlock.
    [jack@suse.cz: fixed version]
    Link: http://lkml.kernel.org/r/1474031567-1831-1-git-send-email-jack@suse.cz
    Link: http://lkml.kernel.org/r/1473797711-14111-5-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
    Tested-by: Guenter Roeck <linux@roeck-us.net>
    Cc: Miklos Szeredi <mszeredi@redhat.com>
    Cc: Eric Paris <eparis@redhat.com>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

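A sketch of the conversion with hypothetical types and names: the pending-event list is only manipulated in short, non-sleeping critical sections, so a spinlock suffices.

    #include <linux/spinlock.h>
    #include <linux/list.h>

    struct example_group {
            spinlock_t       notification_lock;   /* was: struct mutex notification_mutex */
            struct list_head notification_list;
    };

    static void example_queue_event(struct example_group *group,
                                    struct list_head *event)
    {
            spin_lock(&group->notification_lock);      /* was: mutex_lock() */
            list_add_tail(event, &group->notification_list);
            spin_unlock(&group->notification_lock);    /* was: mutex_unlock() */
    }
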
* fsnotify: drop notification_mutex before destroying event (Jan Kara, 2016-10-07; 2 files, -2/+6)
    fsnotify_flush_notify() and fanotify_release() destroy notification events while holding notification_mutex. The destruction of a fanotify event includes a path_put() call, which may end up calling into a filesystem to delete an inode if we happen to hold the last dentry reference which in turn holds the last inode reference. That may violate lock ordering for some filesystems, since notification_mutex is also acquired e.g. during write when generating a fanotify event. Also, this is the only thing that forces notification_mutex to be a sleeping lock. So drop notification_mutex before destroying a notification event.
    Link: http://lkml.kernel.org/r/1473797711-14111-4-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara <jack@suse.cz>
    Cc: Miklos Szeredi <mszeredi@redhat.com>
    Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
    Cc: Eric Paris <eparis@redhat.com>
    Cc: Al Viro <viro@ZenIV.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

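A sketch of the reordering, reusing the hypothetical example_group type from the previous sketch; example_destroy_event() is likewise hypothetical. The event is detached under the lock, but the destructor, which may sleep, runs only after the lock is dropped.

    static void example_flush_one_event(struct example_group *group)
    {
            struct list_head *event = NULL;

            spin_lock(&group->notification_lock);
            if (!list_empty(&group->notification_list)) {
                    event = group->notification_list.next;
                    list_del_init(event);
            }
            spin_unlock(&group->notification_lock);

            if (event)
                    example_destroy_event(event);  /* may sleep: path_put(), iput(), ... */
    }
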
* Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs (Linus Torvalds, 2016-10-06; 19 files, -510/+904)
    Pull f2fs updates from Jaegeuk Kim:
    "In this round, we've investigated how f2fs deals with errors given by our fault injection facility. With this, we could fix several corner cases. And, in order to improve the performance, we set inline_dentry by default and enhance the existing discard issue flow. In addition, we added f2fs_migrate_page for better memory management.
    Enhancements:
     - set inline_dentry by default
     - improve discard issue flow
     - add more fault injection cases in f2fs
     - allow block preallocation for encrypted files
     - introduce migrate_page callback function
     - avoid truncating the next direct node block at every checkpoint
    Bug fixes:
     - set page flag correctly between write_begin and write_end
     - missing error handling cases detected by fault injection
     - preallocate blocks with correct 4KB alignment
     - dentry and filename handling of encryption
     - lost xattrs of directories"
    * tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (69 commits)
      f2fs: introduce update_ckpt_flags to clean up
      f2fs: don't submit irrelevant page
      f2fs: fix to commit bio cache after flushing node pages
      f2fs: introduce get_checkpoint_version for cleanup
      f2fs: remove dead variable
      f2fs: remove redundant io plug
      f2fs: support checkpoint error injection
      f2fs: fix to recover old fault injection config in ->remount_fs
      f2fs: do fault injection initialization in default_options
      f2fs: remove redundant value definition
      f2fs: support configuring fault injection per superblock
      f2fs: adjust display format of segment bit
      f2fs: remove dirty inode pages in error path
      f2fs: do not unnecessarily null-terminate encrypted symlink data
      f2fs: handle errors during recover_orphan_inodes
      f2fs: avoid gc in cp_error case
      f2fs: should put_page for summary page
      f2fs: assign return value in f2fs_gc
      f2fs: add customized migrate_page callback
      f2fs: introduce cp_lock to protect updating of ckpt_flags
      ...

| * f2fs: introduce update_ckpt_flags to clean up (Jaegeuk Kim, 2016-09-30; 1 file, -23/+33)
    This patch adds update_ckpt_flags() to clean up the flow.
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: don't submit irrelevant page (Chao Yu, 2016-09-30; 1 file, -1/+7)
    When we call ->writepages, there are two cases:
    a. we didn't write out any dirty pages, since they are being written back by another thread concurrently;
    b. we wrote out dirty pages and have already submitted the bio to the block layer.
    In these cases, we don't need to do additional bio flushing unnecessarily; it may split a bio in the cache into smaller ones.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: fix to commit bio cache after flushing node pages (Chao Yu, 2016-09-30; 1 file, -2/+16)
    In sync_node_pages, we don't check and commit the last merged pages in f2fs's private bio cache. As these pages were tagged as writeback, anyone waiting for writeback of such a page will be blocked until the cache is committed by someone else. We need to commit the node-type bio cache to avoid a potential deadlock or a long delay waiting for writeback.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: introduce get_checkpoint_version for cleanup (Tiezhu Yang, 2016-09-30; 1 file, -28/+38)
    Almost the same code exists for getting the value of pre_version and cur_version in validate_checkpoint; this patch adds get_checkpoint_version to clean up the redundant code.
    Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: remove dead variable (Sheng Yong, 2016-09-30; 1 file, -2/+2)
    Signed-off-by: Sheng Yong <shengyong1@huawei.com>
    Acked-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: remove redundant io plug (Chao Yu, 2016-09-30; 1 file, -3/+0)
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: support checkpoint error injection (Chao Yu, 2016-09-30; 4 files, -0/+12)
    This patch adds support for checkpoint error injection in f2fs for testing fatal error tolerance. It is useful in that f2fs itself can simulate an abnormal power-off instead of relying on apps calling the godown ioctl.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: fix to recover old fault injection config in ->remount_fs (Chao Yu, 2016-09-30; 1 file, -0/+6)
    In ->remount_fs, we didn't restore the original fault injection config if we encountered an error; fix it.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: do fault injection initialization in default_options (Chao Yu, 2016-09-30; 1 file, -4/+4)
    Do fault injection initialization in default_options to keep it consistent with the other default option configuration.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: remove redundant value definition (Yunlei He, 2016-09-30; 1 file, -4/+3)
    This patch removes a redundant value definition in build_sit_entries.
    Signed-off-by: Yunlei He <heyunlei@huawei.com>
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: support configuring fault injection per superblock (Chao Yu, 2016-09-30; 10 files, -94/+65)
    Previously, we only supported a global fault injection configuration, so that configuring the type/rate of fault injection through sysfs or mount options influenced every f2fs partition in use. That does not make sense: it is inconvenient if a developer wants to test separate partitions with different fault injection rates/types simultaneously, and it is impossible to enable fault injection on one partition while disabling it on another.
    From now on, the global fault injection configuration in the module is moved into the per-superblock structure, so injection testing can be more flexible.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: adjust display format of segment bit (Chao Yu, 2016-09-30; 1 file, -1/+1)
    Just adjust the segment bit info printed in procfs.
    Before:
    1008 5|0 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1009 3|183|0 0 61 20 20 0 0 21 80 c0 2 e4 e 54 0 21 21 17 a 44 d0 28 e4 50 40 30 8 0 2d 32 0 5 b0 80 1 43 2 8e f8 7b 2 25 93 bf e0 73 8e 9a 19 44 60 ff e4 cc e6 8e bf f9 ff 5 3d 31 3d 13
    1010 3|1 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    After:
    1008 5|0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    1009 4|434| ff 7d ff bf d9 3f ff e7 ff bf d7 bf ff bb be ff fb df f7 fb fa bf fb fe bb df dd ff fe ef ff fe ef e2 27 bf ab bf fb df fd bd bf fb db fc ff ff 3f ff ff bf ff 5f db 3f fb fb bf fb bf 4f ff ef
    1010 4|422| ff bb fe ff ef d7 ee ff ff fc bf ef 7d eb ec fd fb 3f 97 7f ef ff af ff db ff ff 69 bf ff f6 e7 ff fb f7 7b fb df be ff ff ef f3 fe ff ff df fe f7 fa ff b7 77 be fe fb a9 7f 87 a2 ac c7 ff 75
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: remove dirty inode pages in error path (Jaegeuk Kim, 2016-09-30; 1 file, -0/+1)
    When getting EIO while handling orphan inodes, we can end up with some dirty node pages. Then f2fs_write_node_pages(), called by iput(node_inode), will try to flush those node pages. In this case we should prevent that, since we will try again from the start.
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: do not unnecessarily null-terminate encrypted symlink data (Eric Biggers, 2016-09-30; 1 file, -2/+0)
    Null-terminating the fscrypt_symlink_data on read is unnecessary because it is not string data; it contains binary ciphertext.
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: handle errors during recover_orphan_inodes (Jaegeuk Kim, 2016-09-30; 2 files, -10/+18)
    This patch fixes the handling of EIO during recover_orphan_inode(), given the panic below.
    F2FS-fs : inject IO error in f2fs_read_end_io+0xe6/0x100 [f2fs]
    ------------[ cut here ]------------
    RIP: 0010:[<ffffffffc0b244e3>] [<ffffffffc0b244e3>] f2fs_evict_inode+0x433/0x470 [f2fs]
    RSP: 0018:ffff92f8b7fb7c30 EFLAGS: 00010246
    RAX: ffff92fb88a13500 RBX: ffff92f890566ea0 RCX: 00000000fd3c255c
    RDX: 0000000000000001 RSI: ffff92fb88a13d90 RDI: ffff92fb8ee127e8
    RBP: ffff92f8b7fb7c58 R08: 0000000000000001 R09: ffff92fb88a13d58
    R10: 000000005a6a9373 R11: 0000000000000001 R12: 00000000fffffffb
    R13: ffff92fb8ee12000 R14: 00000000000034ca R15: ffff92fb8ee12620
    FS: 00007f1fefd8e880(0000) GS:ffff92fb95600000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc211d34cdb CR3: 000000012d43a000 CR4: 00000000001406e0
    Stack:
     ffff92f890566ea0 ffff92f890567078 ffffffffc0b5a0c0 ffff92f890566f28
     ffff92fb888b2000 ffff92f8b7fb7c80 ffffffffbc27ff55 ffff92f890566ea0
     ffff92fb8bf10000 ffffffffc0b5a0c0 ffff92f8b7fb7cb0 ffffffffbc28090d
    Call Trace:
     [<ffffffffbc27ff55>] evict+0xc5/0x1a0
     [<ffffffffbc28090d>] iput+0x1ad/0x2c0
     [<ffffffffc0b3304c>] recover_orphan_inodes+0x10c/0x2e0 [f2fs]
     [<ffffffffc0b2e0f4>] f2fs_fill_super+0x884/0x1150 [f2fs]
     [<ffffffffbc2644ac>] mount_bdev+0x18c/0x1c0
     [<ffffffffc0b2d870>] ? f2fs_commit_super+0x100/0x100 [f2fs]
     [<ffffffffc0b2a755>] f2fs_mount+0x15/0x20 [f2fs]
     [<ffffffffbc264e49>] mount_fs+0x39/0x170
     [<ffffffffbc28555b>] vfs_kern_mount+0x6b/0x160
     [<ffffffffbc2881df>] do_mount+0x1cf/0xd00
     [<ffffffffbc287f2c>] ? copy_mount_options+0xac/0x170
     [<ffffffffbc289003>] SyS_mount+0x83/0xd0
     [<ffffffffbc8ee880>] entry_SYSCALL_64_fastpath+0x23/0xc1
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: avoid gc in cp_error case (Jaegeuk Kim, 2016-09-30; 1 file, -1/+2)
    Otherwise, we can hit f2fs_bug_on(sbi, !PageUptodate(sum_page));
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: should put_page for summary page (Jaegeuk Kim, 2016-09-30; 1 file, -2/+2)
    We should call put_page for preloaded summary pages in do_garbage_collect.
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: assign return value in f2fs_gc (Jaegeuk Kim, 2016-09-30; 1 file, -3/+7)
    This patch adds a return value of write_checkpoint for f2fs_gc.
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: add customized migrate_page callback (Weichao Guo, 2016-09-30; 4 files, -0/+65)
    This patch improves the migration of dirty pages and allows migrating atomic written pages that F2FS keeps in the page cache. Compared with the fallback page-release path, it provides better performance for memory compaction, CMA and other users of page migration. For dirty pages, there is no need to write them back first when migrating. For an atomic written page that has not yet been committed, we can migrate the page and update the related 'inmem_pages' list at the same time.
    Signed-off-by: Weichao Guo <guoweichao@huawei.com>
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    [Jaegeuk Kim: fix some coding style]
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: introduce cp_lock to protect updating of ckpt_flags (Chao Yu, 2016-09-30; 6 files, -28/+59)
    This patch introduces a spinlock to protect updates of the ckpt_flags field in struct f2fs_checkpoint; it avoids incorrect updates in race conditions.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    [Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

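A sketch of the locking pattern with a simplified, hypothetical structure: the read-modify-write of the flags word is serialized by a dedicated spinlock.

    #include <linux/spinlock.h>

    struct example_sbi {
            spinlock_t cp_lock;      /* protects ckpt_flags below */
            __le32     ckpt_flags;   /* checkpoint flags word (simplified) */
    };

    static void example_set_ckpt_flag(struct example_sbi *sbi, u32 flag)
    {
            spin_lock(&sbi->cp_lock);
            sbi->ckpt_flags |= cpu_to_le32(flag);   /* was an unprotected |= */
            spin_unlock(&sbi->cp_lock);
    }
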
| * f2fs: fix to avoid race condition when updating sbi flag (Chao Yu, 2016-09-30; 1 file, -4/+4)
    Make updating of the sbi flag atomic by using {test,set,clear}_bit; otherwise, in concurrent scenarios, the flag could be updated incorrectly.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

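A sketch of the pattern (hypothetical wrappers): the flag word is an unsigned long manipulated only through the atomic bit helpers.

    #include <linux/bitops.h>

    static void example_set_sbi_flag(unsigned long *sbi_flags, unsigned int flag)
    {
            set_bit(flag, sbi_flags);     /* was a racy: *sbi_flags |= BIT(flag); */
    }

    static void example_clear_sbi_flag(unsigned long *sbi_flags, unsigned int flag)
    {
            clear_bit(flag, sbi_flags);
    }

    static bool example_is_sbi_flag_set(unsigned long *sbi_flags, unsigned int flag)
    {
            return test_bit(flag, sbi_flags);
    }
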
| * f2fs: put directory inodes before checkpoint in roll-forward recovery (Jaegeuk Kim, 2016-09-30; 1 file, -1/+3)
    Before checkpoint, we'd better drop any inodes.
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: use crc and cp version to determine roll-forward recovery (Jaegeuk Kim, 2016-09-30; 6 files, -100/+62)
    Previously, we used cp_version only to detect recoverable dnodes. In order to avoid the same garbage cp_version, we needed to truncate the next dnode during checkpoint, resulting in additional discard or data writes. If we can distinguish this by using crc in addition to cp_version, we can remove this overhead.
    There is a backward compatibility concern because this changes the node_footer layout. So this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to detect the new layout; the new layout will be activated only when this flag is set.
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: preallocate blocks for encrypted file (Yunlei He, 2016-09-22; 2 files, -8/+2)
    This patch allows preallocating data blocks for buffered aio writes to an encrypted file.
    Signed-off-by: Yunlei He <heyunlei@huawei.com>
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    [Jaegeuk Kim: fix to avoid BUG_ON]
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: show dirty inode number (Chao Yu, 2016-09-22; 2 files, -1/+5)
    This patch enables showing the dirty inode number in procfs.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: support IO error injection (Chao Yu, 2016-09-22; 3 files, -0/+9)
    This patch adds support for IO error injection, for testing the IO error tolerance of f2fs.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: fix to return error number of read_all_xattrs correctly (Chao Yu, 2016-09-22; 1 file, -15/+22)
    We treat every error in read_all_xattrs as a no-memory error, which hides the real reason for the failure. Fix it by returning the correct errno in order to reflect the real cause.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

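A sketch of the error-propagation pattern (example_get_xattr_page() is a hypothetical helper): the real errno from the failing call is returned instead of a blanket -ENOMEM.

    #include <linux/err.h>
    #include <linux/fs.h>

    static int example_read_xattr_block(struct inode *inode, struct page **out)
    {
            struct page *page = example_get_xattr_page(inode);  /* hypothetical */

            if (IS_ERR(page))
                    return PTR_ERR(page);   /* was: return -ENOMEM; */

            *out = page;
            return 0;
    }
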
| * f2fs: make f2fs_filetype_table static (Chao Yu, 2016-09-22; 2 files, -2/+1)
    There is no more user of f2fs_filetype_table outside of dir.c; make it static.
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: handle error in recover_orphan_inode (Jaegeuk Kim, 2016-09-15; 1 file, -1/+18)
    This patch enhances the error path in recover_orphan_inode.
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: remove dead code f2fs_check_acl (Tiezhu Yang, 2016-09-14; 1 file, -1/+0)
    The macro f2fs_check_acl is defined but never used since the initial commit; this patch removes the code that has been dead for several years.
    Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: exclude special cases for f2fs_move_file_range (Fan Li, 2016-09-14; 1 file, -0/+7)
    When src and dst are the same file and the latter part of the source region overlaps with the former part of the destination region, the current implementation will overwrite data which hasn't been moved yet and truncate data in the overlapped region. This patch returns -EINVAL when such cases occur, and returns 0 when the source region and destination region are actually the same part of the same file.
    Signed-off-by: Fan li <fanofcode.li@samsung.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

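A sketch of the two special cases as a hypothetical helper (0 means "nothing to do", a negative value is an error, 1 means the move can proceed):

    #include <linux/fs.h>

    static int example_check_move_range(struct inode *src, struct inode *dst,
                                        loff_t pos_in, loff_t pos_out, loff_t len)
    {
            if (src == dst) {
                    if (pos_in == pos_out)
                            return 0;          /* exactly the same range: no-op */
                    if (pos_out > pos_in && pos_out < pos_in + len)
                            return -EINVAL;    /* destination starts inside the source */
            }
            return 1;                          /* safe to move */
    }
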
| * f2fs: fix to set PageUptodate in f2fs_write_end correctly (Jaegeuk Kim, 2016-09-13; 1 file, -10/+19)
    Previously, f2fs_write_begin set PageUptodate all the time. But when a user tries to update the entire page (i.e., len == PAGE_SIZE), we need to consider that the page may be copied only partially afterwards. In such a case, we would lose the remaining region of the page.
    This patch fixes this by setting PageUptodate in f2fs_write_end according to the copied result. In the short-copy case, it returns zero to let generic_perform_write retry copying the user data again.
    As a result, f2fs_write_end() works as follows:
       PageUptodate   len   copied  return  retry
    1. no             4096  4096    4096    false  -> return 4096
    2. no             4096  1024    0       true   -> goto case #1
    3. yes            2048  2048    2048    false  -> return 2048
    4. yes            2048  1024    1024    false  -> return 1024
    Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

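A sketch of the ->write_end logic behind the table above (error handling and f2fs specifics omitted; example_write_end is a hypothetical name):

    #include <linux/mm.h>
    #include <linux/page-flags.h>

    static int example_write_end(struct page *page, unsigned int len,
                                 unsigned int copied)
    {
            if (!PageUptodate(page)) {
                    if (copied != len)
                            copied = 0;              /* short copy: caller retries */
                    else
                            SetPageUptodate(page);   /* whole page is now valid */
            }
            if (copied)
                    set_page_dirty(page);
            return copied;
    }
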
| * f2fs: fix parameters of __exchange_data_block (Fan Li, 2016-09-13; 1 file, -2/+3)
    __exchange_data_block should take block indexes as parameters instead of offsets in bytes.
    Signed-off-by: Fan li <fanofcode.li@samsung.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: avoid ENOMEM during roll-forward recovery (Jaegeuk Kim, 2016-09-13; 4 files, -16/+42)
    This patch gives another chance during roll-forward recovery when hitting -ENOMEM.
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

| * f2fs: add common iget in add_fsync_inode (Jaegeuk Kim, 2016-09-12; 1 file, -26/+17)
    There is no functional change.
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>