path: root/fs
Commit message (Author, Age, Files, Lines)
* lib/vsprintf.c: remove %Z support (Alexey Dobriyan, 2017-02-27, 7 files, -16/+16)

  Now that %z is standardised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller.

  Use BUILD_BUG_ON in a couple of ATM drivers.

  In case anyone didn't notice, lib/vsprintf.o is about half the size of SLUB, which in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more.

  Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
  Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
  Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
  Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
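  As a rough illustration of the C99 length modifier this commit relies on, the sketch below formats a size_t with %zu. It uses userspace snprintf() rather than the kernel's vsprintf code, and the helper name format_size() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<n> bytes" into buf using the C99 'z'
 * length modifier, which is the standard way to print a size_t.
 * Returns what snprintf() returns (the length that was needed).
 */
static int format_size(char *buf, size_t len, size_t n)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%zu bytes", n);
}
```

  A caller would pass a buffer and its size, for example format_size(buf, sizeof(buf), sizeof(struct foo)), where struct foo stands for any type of interest.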
* scripts/spelling.txt: add "comsume(r)" pattern and fix typo instances (Masahiro Yamada, 2017-02-27, 1 file, -1/+1)

  Fix typos and add the following to the scripts/spelling.txt:

    comsume||consume
    comsumer||consumer
    comsuming||consuming

  I see some variable names with this pattern, but this commit is only touching comment blocks to avoid unexpected impact.

  Link: http://lkml.kernel.org/r/1481573103-11329-19-git-send-email-yamada.masahiro@socionext.com
  Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* scripts/spelling.txt: add "unneded" pattern and fix typo instances (Masahiro Yamada, 2017-02-27, 1 file, -1/+1)

  Fix typos and add the following to the scripts/spelling.txt:

    unneded||unneeded

  Link: http://lkml.kernel.org/r/1481573103-11329-15-git-send-email-yamada.masahiro@socionext.com
  Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* scripts/spelling.txt: add "againt" pattern and fix typo instances (Masahiro Yamada, 2017-02-27, 1 file, -1/+1)

  Fix typos and add the following to the scripts/spelling.txt:

    againt||against

  While we are here, fix the "capabilites" as well in the touched hunk in drivers/gpu/drm/drm_probe_helper.c.

  Link: http://lkml.kernel.org/r/1481573103-11329-13-git-send-email-yamada.masahiro@socionext.com
  Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* scripts/spelling.txt: add "an user" pattern and fix typo instances (Masahiro Yamada, 2017-02-27, 1 file, -3/+3)

  Fix typos and add the following to the scripts/spelling.txt:

    an user||a user
    an userspace||a userspace

  I also added "userspace" to the list since it is a common word in Linux. I found some instances of "an userfaultfd", but I did not add it to the list. Looking for every word that starts with "user", such as "userland", felt endless, so I had to draw a line somewhere.

  Link: http://lkml.kernel.org/r/1481573103-11329-4-git-send-email-yamada.masahiro@socionext.com
  Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: use i_blocksize() (Geliang Tang, 2017-02-27, 2 files, -2/+2)

  Since the i_blocksize() helper has been defined in fs.h, use it instead of open-coding.

  Link: http://lkml.kernel.org/r/1485184655-3895-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp
  Signed-off-by: Geliang Tang <geliangtang@gmail.com>
  Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: use nilfs_btree_node_size() (Geliang Tang, 2017-02-27, 1 file, -1/+1)

  Use nilfs_btree_node_size() instead of open-coding.

  Link: http://lkml.kernel.org/r/1485184655-3895-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
  Signed-off-by: Geliang Tang <geliangtang@gmail.com>
  Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: add i_blocksize() (Fabian Frederick, 2017-02-27, 25 files, -50/+50)

  Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs branch.

  This patch also fixes multiple checkpatch warnings:
    WARNING: Prefer 'unsigned int' to bare use of 'unsigned'

  Thanks to Andrew Morton for suggesting a more appropriate function instead of a macro.

  [geliangtang@gmail.com: truncate: use i_blocksize()]
    Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
  Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Signed-off-by: Geliang Tang <geliangtang@gmail.com>
  Cc: Alexander Viro <viro@zeniv.linux.org.uk>
  Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
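  The replacement described above can be pictured with a small userspace sketch. The struct inode here is a one-field stand-in, not the kernel's definition, and blocks_needed() is a hypothetical caller added only to show the helper in use.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct inode (assumption:
 * only the field needed for this illustration is present). */
struct inode {
	unsigned char i_blkbits;	/* log2 of the block size in bytes */
};

/* Sketch of the helper: block size in bytes, replacing the
 * open-coded "1 << inode->i_blkbits" at call sites. */
static inline unsigned int i_blocksize(const struct inode *node)
{
	return 1U << node->i_blkbits;
}

/* Hypothetical caller: number of blocks needed to hold 'bytes'. */
static unsigned long blocks_needed(const struct inode *node,
				   unsigned long bytes)
{
	unsigned int bsize = i_blocksize(node);

	assert(bsize != 0);
	return (bytes + bsize - 1) / bsize;
}
```

  Wrapping the shift in a typed inline function gives every call site the same unsigned int result, which is presumably why a function was preferred over a macro.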
* fs/affs: make export work with cold dcache (Fabian Frederick, 2017-02-27, 1 file, -0/+19)

  This adds a get_parent function so that an NFS client can still work after a cache drop (tested on NFS v4 with echo 3 > /proc/sys/vm/drop_caches).

  [weiyongjun1@huawei.com: fix return value check in affs_get_parent()]
    Link: http://lkml.kernel.org/r/20170123141018.2331-1-weiyj.lk@gmail.com
  Link: http://lkml.kernel.org/r/20170109191208.6085-8-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
  Suggested-by: Alexander Viro <viro@zeniv.linux.org.uk>
  Cc: Al Viro <viro@zeniv.linux.org.uk>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/affs/namei.c: forward declarations clean-up (Fabian Frederick, 2017-02-27, 1 file, -20/+10)

  Move dentry_operations structures and remove forward declarations.

  Link: http://lkml.kernel.org/r/20170109191208.6085-7-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Cc: Al Viro <viro@zeniv.linux.org.uk>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/affs: add prefix to some functions (Fabian Frederick, 2017-02-27, 5 files, -14/+15)

    secs_to_datestamp(time64_t secs, struct affs_date *ds);
    prot_to_mode(u32 prot);
    mode_to_prot(struct inode *inode);

  were declared without affs_ prefix

  Link: http://lkml.kernel.org/r/20170109191208.6085-6-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Cc: Al Viro <viro@zeniv.linux.org.uk>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/affs: use octal for permissions (Fabian Frederick, 2017-02-27, 1 file, -18/+18)

  According to commit f90774e1fd27 ("checkpatch: look for symbolic permissions and suggest octal instead")

  Link: http://lkml.kernel.org/r/20170109191208.6085-5-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Cc: Al Viro <viro@zeniv.linux.org.uk>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/affs: make affs exportable (Fabian Frederick, 2017-02-27, 3 files, -0/+42)

  Add standard functions making AFFS work with NFS.

  Functions based on ext4 implementation. Tested on loop device.

  Link: http://lkml.kernel.org/r/20170109191208.6085-4-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Cc: Al Viro <viro@zeniv.linux.org.uk>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/affs: add validation block function (Fabian Frederick, 2017-02-27, 1 file, -4/+10)

  Avoid repeating the same calculation four times.

  Link: http://lkml.kernel.org/r/20170109191208.6085-3-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Cc: Al Viro <viro@zeniv.linux.org.uk>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/affs: remove reference to affs_parent_ino() (Fabian Frederick, 2017-02-27, 1 file, -1/+0)

  Patch series "make FS exportable plus some clean-up", v7.

  This small patchset makes AFFS work with NFS for standard operations.

  This patch (of 7):

  affs_parent_ino() was removed a long time ago.

  Link: http://lkml.kernel.org/r/20170109191208.6085-2-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Cc: Al Viro <viro@zeniv.linux.org.uk>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs,eventpoll: don't test for bitfield with stack value (Cyrill Gorcunov, 2017-02-27, 1 file, -1/+1)

  If epoll_ctl is called with operation EPOLL_CTL_DEL, the @epds.events variable allocated on the stack may contain random bits, which we then test for EPOLLEXCLUSIVE.

  The test currently looks like this:

    if (epds.events & EPOLLEXCLUSIVE) {
        if (op == EPOLL_CTL_MOD)
            goto error_tgt_fput;
        if (op == EPOLL_CTL_ADD && (is_file_epoll(tf.file) ||
                (epds.events & ~EPOLLEXCLUSIVE_OK_BITS)))
            goto error_tgt_fput;
    }

  Nothing serious will happen even if epds.events has this bit set. Still, it is better to be on the safe side and make sure that we are meant to test this bit at all.

  Link: http://lkml.kernel.org/r/20170214154935.GG1850@uranus.lan
  Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
  Cc: Al Viro <viro@zeniv.linux.org.uk>
  Cc: Andrey Vagin <avagin@virtuozzo.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
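  One way to picture the guard described above is the standalone sketch below. It is not the kernel code: the CTL_* constants and EXCLUSIVE_BIT are local stand-ins that mirror the Linux uapi values, and op_has_event() and should_check_exclusive() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the uapi values of EPOLL_CTL_ADD,
 * EPOLL_CTL_DEL, EPOLL_CTL_MOD and EPOLLEXCLUSIVE. */
enum { CTL_ADD = 1, CTL_DEL = 2, CTL_MOD = 3 };
#define EXCLUSIVE_BIT (1U << 28)

/* Hypothetical helper: only ADD and MOD copy an event from user
 * space, so only then is the events field initialised. */
static bool op_has_event(int op)
{
	return op != CTL_DEL;
}

/* The exclusive checks run only when the events field is meaningful
 * for this operation AND the exclusive bit is set in it. */
static bool should_check_exclusive(int op, uint32_t events)
{
	return op_has_event(op) && (events & EXCLUSIVE_BIT) != 0;
}
```

  With this ordering, whatever garbage the stack variable holds during a delete is never inspected.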
* /proc/kcore: update physical address for kcore ram and text (Pratyush Anand, 2017-02-27, 1 file, -1/+4)

  Currently the p_paddr of every PT_LOAD header is set to 0, which is not true and could be misleading, since 0 is a valid physical address.

  User space tools like makedumpfile need to know the physical address for PT_LOAD segments of direct mapped regions. Therefore this patch updates paddr for such regions. It also sets an invalid paddr (-1) for other regions, so that a user space tool can know whether a physical address provided in PT_LOAD is correct or not.

  I do not know why it was 0, which is a valid physical address. This change might certainly break some user space tools, and those need to be fixed. For example, see the following code from kexec-tools kexec/kexec-elf.c:build_mem_phdrs():

    if ((phdr->p_paddr + phdr->p_memsz) < phdr->p_paddr) {
        /* The memory address wraps */
        if (probe_debug) {
            fprintf(stderr, "ELF address wrap around\n");
        }
        return -1;
    }

  We do not need to perform the above check for an invalid physical address.

  I think kexec-tools and makedumpfile will need fixups. I already have those fixups, which will be sent upstream once this patch makes it through. The advantage of this approach is that it helps to calculate variables like page_offset and phys_base from PT_LOAD even when they are randomized, and therefore reduces the number of variable and version specific values in user space tools.

  Having ASLR offset information can help to translate an identity mapped virtual address to a physical address. But that would be an additional field in the PT_LOAD header structure and an arch dependent value. Moreover, sending a valid physical address like 0 does not seem right. So, IMHO it is better to fix that and send a valid physical address when available (identity mapped).

  Link: http://lkml.kernel.org/r/f951340d2917cdd2a329fae9837a83f2059dc3b2.1485318868.git.panand@redhat.com
  Signed-off-by: Pratyush Anand <panand@redhat.com>
  Cc: Baoquan He <bhe@redhat.com>
  Cc: Dave Young <dyoung@redhat.com>
  Cc: Dave Anderson <anderson@redhat.com>
  Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
  Cc: Simon Horman <simon.horman@netronome.com>
  Cc: Kees Cook <keescook@chromium.org>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
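  A user space consumer might handle the new convention roughly as follows. This is only a sketch under stated assumptions: struct load_segment is a two-field stand-in for an ELF64 program header, and PADDR_INVALID, paddr_is_valid() and classify_segment() are hypothetical names, not part of kexec-tools or makedumpfile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the ELF64 program header fields used here. */
struct load_segment {
	uint64_t p_paddr;	/* physical address, or all ones if unknown */
	uint64_t p_memsz;	/* size of the segment in memory */
};

/* The commit message describes -1 as the "no valid physical address"
 * marker; as an unsigned 64-bit value that is UINT64_MAX. */
#define PADDR_INVALID UINT64_MAX

static bool paddr_is_valid(const struct load_segment *seg)
{
	return seg->p_paddr != PADDR_INVALID;
}

/* Returns 0 if the segment is usable, 1 if it has no physical
 * address and should be skipped, -1 if the address range wraps. */
static int classify_segment(const struct load_segment *seg)
{
	if (!paddr_is_valid(seg))
		return 1;	/* skip the wrap check entirely */
	if (seg->p_paddr + seg->p_memsz < seg->p_paddr)
		return -1;	/* the memory address wraps */
	return 0;
}
```

  Checking validity first keeps the wrap-around test from firing on the marker value, which would otherwise always look like an overflow.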
* fs/reiserfs: atomically read inode size (Fabian Frederick, 2017-02-27, 1 file, -1/+1)

  See i_size_read() comments in include/linux/fs.h

  Link: http://lkml.kernel.org/r/20170123174701.30394-1-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hfsplus: atomically read inode size (Fabian Frederick, 2017-02-27, 1 file, -1/+1)

  See i_size_read() comments in include/linux/fs.h

  Link: http://lkml.kernel.org/r/20170123175338.3840-1-fabf@skynet.be
  Signed-off-by: Fabian Frederick <fabf@skynet.be>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs: take more care to not update last_used on path walk (Ian Kent, 2017-02-27, 1 file, -6/+11)

  GUI environments seem to be becoming more aggressive at scanning filesystems, to the point where autofs cannot expire mounts at all.

  This is one key reason the update of the autofs dentry info last_used field is done in the expire system when the dentry is seen to be in use.

  But somewhere along the way instances of the update have crept back into the autofs path walk functions which, with the more aggressive file access patterns, is preventing expiration.

  Changing the update in the path walk functions allows autofs to at least make progress in spite of frequent immediate re-mounts from file accesses.

  Link: http://lkml.kernel.org/r/148577167169.9801.1377050092212016834.stgit@pluto.themaw.net
  Signed-off-by: Ian Kent <raven@themaw.net>
  Cc: Tomohiro Kusumi <tkusumi@tuxera.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs: remove duplicated AUTOFS_DEV_IOCTL_SIZE definition (Tomohiro Kusumi, 2017-02-27, 1 file, -2/+0)

  This macro is already defined in uapi header. Also use this macro where possible.

  Link: http://lkml.kernel.org/r/148577166656.9801.10322423666945951186.stgit@pluto.themaw.net
  Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
  Signed-off-by: Ian Kent <raven@themaw.net>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm,fs,dax: mark dax_iomap_pmd_fault as const (Arnd Bergmann, 2017-02-27, 1 file, -1/+2)

  The two alternative implementations of dax_iomap_fault have different prototypes, and one of them is obviously wrong as seen from this build warning:

    fs/dax.c: In function 'dax_iomap_fault':
    fs/dax.c:1462:35: error: passing argument 2 of 'dax_iomap_pmd_fault' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]

  This marks the argument 'const' as in all the related functions.

  Fixes: a2d581675d48 ("mm,fs,dax: change ->pmd_fault to ->huge_fault")
  Link: http://lkml.kernel.org/r/20170227203349.3318733-1-arnd@arndb.de
  Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'for-linus-4.11-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux (Linus Torvalds, 2017-02-25, 9 files, -31/+49)
|\
  Pull orangefs updates from Mike Marshall:
   "Orangefs: cleanups, a protocol fix and an added configuration button.

    Cleanups:
     - silence harmless integer overflow warning (from dan.carpenter@oracle.com)
     - Dan Carpenter influenced debugfs cleanups.
     - remove orangefs_backing_dev_info (from jack@suse.cz)

    Protocol fix:
     - fix buffer size mis-match between kernel space and user space

    New configuration button:
     - support readahead_readcnt parameter"

  * tag 'for-linus-4.11-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
    orangefs: fix buffer size mis-match between kernel space and user space.
    orangefs: Dan Carpenter influenced cleanups...
    orangefs: Remove orangefs_backing_dev_info
    orangefs: Support readahead_readcnt parameter.
    orangefs: silence harmless integer overflow warning
| * Merge tag 'v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into for-next (Mike Marshall, 2017-02-25, 538 files, -21240/+16359)
| |\
      Linux 4.10
| * | orangefs: fix buffer size mis-match between kernel space and user space. (Mike Marshall, 2017-02-09, 1 file, -2/+1)

      The daemon through which the kernel module communicates with the userspace part of Orangefs, the "client-core", sends initialization data to the kernel module with ioctl. The initialization data was built by the client-core in a 2k buffer and copy_from_user'd into a 1k buffer in the kernel module. When more than 1k of initialization data needed to be sent, some was lost, reducing the usability of the control by which debug levels are set. This patch sets the kernel side buffer to 2K to match the userspace side...

      Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | orangefs: Dan Carpenter influenced cleanups... (Mike Marshall, 2017-02-09, 2 files, -9/+11)

      This patch is similar to one Dan Carpenter sent me; it cleans up some return codes and whitespace errors. There was one place where he thought inserting an error message into the ring buffer might be too chatty. I hope I convinced him otherwise. As a consolation <g> I changed a truly chatty error message in another location into a debug message; system admins had already yelled at me about that one...

      Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | orangefs: Remove orangefs_backing_dev_info (Jan Kara, 2017-02-03, 3 files, -18/+1)

      It is not used anywhere.

      CC: Mike Marshall <hubcap@omnibond.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | orangefs: Support readahead_readcnt parameter. (Martin Brandenburg, 2017-02-03, 2 files, -2/+31)

      Signed-off-by: Martin Brandenburg <martin@omnibond.com>
      Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | orangefs: silence harmless integer overflow warning (Dan Carpenter, 2017-02-03, 1 file, -0/+5)

      The issue here is that in orangefs_bufmap_alloc() we do:

        bufmap->buffer_index_array =
            kzalloc(DIV_ROUND_UP(bufmap->desc_count, BITS_PER_LONG), GFP_KERNEL);

      If we choose a bufmap->desc_count like -31 then it means the DIV_ROUND_UP ends up having an integer overflow. The result is that kzalloc() returns the ZERO_SIZE_PTR and there is a static checker warning.

      But this bug is harmless because on the next lines we use ->desc_count to do a kcalloc(). That has integer overflow checking built in so the kcalloc() fails and we return an error code.

      Anyway, it doesn't make sense to talk about negative sizes and blocking them silences the static checker warning.

      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Mike Marshall <hubcap@omnibond.com>
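      The arithmetic behind the warning can be reproduced in a few lines of userspace C. This is a sketch, not the orangefs code: BITS_PER_LONG_SKETCH assumes a 64-bit long, the macro is redefined locally, and validate_desc_count() is a hypothetical stand-in for the kind of "reject negative sizes" check the message describes.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: 64-bit longs, as on most 64-bit Linux targets. */
#define BITS_PER_LONG_SKETCH 64

/* Same shape as the kernel macro, redefined so the sketch stands alone. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Size handed to the allocator for a given (signed) count. */
static int index_array_size(int desc_count)
{
	return DIV_ROUND_UP(desc_count, BITS_PER_LONG_SKETCH);
}

/* Hypothetical validation: refuse negative counts up front so the
 * size computation never sees them. */
static int validate_desc_count(int desc_count)
{
	if (desc_count < 0)
		return -EINVAL;
	return 0;
}
```

      With plain int arithmetic as assumed here, a count of -31 rounds to a size of 0, which corresponds to the zero-size allocation the message mentions.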
| * | Merge tag 'v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into for-next (Mike Marshall, 2017-01-27, 87 files, -1036/+1218)
| |\ \
      Linux 4.9
* | \ \ Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs (Linus Torvalds, 2017-02-25, 41 files, -1242/+1211)
|\ \ \ \
  Pull btrfs updates from Chris Mason:
   "This has a series of fixes and cleanups that Dave Sterba has been collecting.

    There is a pretty big variety here, cleaning up internal APIs and fixing corner cases"

  * 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (124 commits)
    Btrfs: use the correct type when creating cow dio extent
    Btrfs: fix deadlock between dedup on same file and starting writeback
    btrfs: use btrfs_debug instead of pr_debug in transaction abort
    btrfs: btrfs_truncate_free_space_cache always allocates path
    btrfs: free-space-cache, clean up unnecessary root arguments
    btrfs: convert btrfs_inc_block_group_ro to accept fs_info
    btrfs: flush_space always takes fs_info->fs_root
    btrfs: pass fs_info to (more) routines that are only called with extent_root
    btrfs: qgroup: Move half of the qgroup accounting time out of commit trans
    btrfs: remove unused parameter from adjust_slots_upwards
    btrfs: remove unused parameters from __btrfs_write_out_cache
    btrfs: remove unused parameter from cleanup_write_cache_enospc
    btrfs: remove unused parameter from __add_inode_ref
    btrfs: remove unused parameter from clone_copy_inline_extent
    btrfs: remove unused parameters from btrfs_cmp_data
    btrfs: remove unused parameter from __add_inline_refs
    btrfs: remove unused parameters from scrub_setup_wr_ctx
    btrfs: remove unused parameter from create_snapshot
    btrfs: remove unused parameter from init_first_rw_device
    btrfs: remove unused parameter from __btrfs_alloc_chunk
    ...
| * | | | Btrfs: use the correct type when creating cow dio extent (Liu Bo, 2017-02-22, 1 file, -1/+1)

          'BTRFS_ORDERED_REGULAR' was introduced for the cow case in patch 'Btrfs: specify a new ordered extent type for create_io_em', but it missed the directIO cow case.

          Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
          Signed-off-by: Chris Mason <clm@fb.com>
| * | | | Btrfs: fix deadlock between dedup on same file and starting writeback (Filipe Manana, 2017-02-22, 1 file, -2/+12)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we are deduping two ranges of the same file we need to make sure that we lock all pages in ascending order, that is, lock first the pages from the range with lower offset and then the pages from the other range, as otherwise we can deadlock with a concurrent task that is starting delalloc (writeback). Example trace: [74073.052218] INFO: task kworker/u32:10:17997 blocked for more than 120 seconds. [74073.053889] Tainted: G W 4.9.0-rc7-btrfs-next-36+ #1 [74073.055071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[74073.056696] kworker/u32:10 D 0 17997 2 0x00000000 [74073.058606] Workqueue: writeback wb_workfn (flush-btrfs-53176) [74073.061370] ffff880031e79858 ffff8802159d2580 ffff880237004580 ffff880031e79240 [74073.064784] ffff88023f4978c0 ffffc9000817b638 ffffffff814c15e1 0000000000000000 [74073.068386] ffff88023f4978d8 ffff88023f4978c0 000000000017b620 ffff880031e79240 [74073.071712] Call Trace: [74073.072884] [<ffffffff814c15e1>] ? __schedule+0x48f/0x6f4 [74073.075395] [<ffffffff814c1c8b>] ? bit_wait+0x2f/0x2f [74073.077511] [<ffffffff814c18d2>] schedule+0x8c/0xa0 [74073.079440] [<ffffffff814c4b36>] schedule_timeout+0x43/0xff [74073.081637] [<ffffffff8110953e>] ? time_hardirqs_on+0x9/0x14 [74073.083809] [<ffffffff81095c67>] ? trace_hardirqs_on_caller+0x16/0x197 [74073.086314] [<ffffffff810bde98>] ? timekeeping_get_ns+0x1e/0x32 [74073.100654] [<ffffffff810be048>] ? ktime_get+0x41/0x52 [74073.102619] [<ffffffff814c10f0>] io_schedule_timeout+0xa0/0x102 [74073.104771] [<ffffffff814c10f0>] ? io_schedule_timeout+0xa0/0x102 [74073.106969] [<ffffffff814c1ca6>] bit_wait_io+0x1b/0x39 [74073.108954] [<ffffffff814c1fb8>] __wait_on_bit_lock+0x4f/0x99 [74073.110981] [<ffffffff8112b692>] __lock_page+0x6b/0x6d [74073.112833] [<ffffffff8108ceb4>] ? autoremove_wake_function+0x3a/0x3a [74073.115010] [<ffffffffa031178b>] lock_page+0x2f/0x32 [btrfs] [74073.116999] [<ffffffffa0311d9f>] lock_delalloc_pages+0xc7/0x1a0 [btrfs] [74073.119243] [<ffffffffa0313d15>] find_lock_delalloc_range+0xc3/0x1a4 [btrfs] [74073.121636] [<ffffffffa0313e81>] writepage_delalloc.isra.31+0x8b/0x134 [btrfs] [74073.124229] [<ffffffffa0315d69>] __extent_writepage+0x1c1/0x2bf [btrfs] [74073.126372] [<ffffffffa03160f2>] extent_write_cache_pages.isra.30.constprop.49+0x28b/0x36c [btrfs] [74073.129371] [<ffffffffa03165b9>] extent_writepages+0x4b/0x5c [btrfs] [74073.131440] [<ffffffffa02fcb59>] ? insert_reserved_file_extent.constprop.42+0x261/0x261 [btrfs] [74073.134303] [<ffffffff811b4ce4>] ? 
writeback_sb_inodes+0xe0/0x4a1 [74073.136298] [<ffffffffa02fab7f>] btrfs_writepages+0x28/0x2a [btrfs] [74073.138248] [<ffffffff81138200>] do_writepages+0x23/0x2c [74073.139910] [<ffffffff811b3cab>] __writeback_single_inode+0x105/0x6d2 [74073.142003] [<ffffffff811b4e96>] writeback_sb_inodes+0x292/0x4a1 [74073.136298] [<ffffffffa02fab7f>] btrfs_writepages+0x28/0x2a [btrfs] [74073.138248] [<ffffffff81138200>] do_writepages+0x23/0x2c [74073.139910] [<ffffffff811b3cab>] __writeback_single_inode+0x105/0x6d2 [74073.142003] [<ffffffff811b4e96>] writeback_sb_inodes+0x292/0x4a1 [74073.143911] [<ffffffff811b511b>] __writeback_inodes_wb+0x76/0xae [74073.145787] [<ffffffff811b53ca>] wb_writeback+0x1cc/0x4d7 [74073.147452] [<ffffffff811b60cd>] wb_workfn+0x194/0x37d [74073.149084] [<ffffffff811b60cd>] ? wb_workfn+0x194/0x37d [74073.150726] [<ffffffff8106ce77>] ? process_one_work+0x154/0x4e4 [74073.152694] [<ffffffff8106cf96>] process_one_work+0x273/0x4e4 [74073.154452] [<ffffffff8106d6db>] worker_thread+0x1eb/0x2ca [74073.156138] [<ffffffff8106d4f0>] ? rescuer_thread+0x2b6/0x2b6 [74073.157837] [<ffffffff81072a81>] kthread+0xd5/0xdd [74073.159339] [<ffffffff810729ac>] ? __kthread_unpark+0x5a/0x5a [74073.161088] [<ffffffff814c6257>] ret_from_fork+0x27/0x40 [74073.162680] INFO: lockdep is turned off. [74073.163855] INFO: task do-dedup:30264 blocked for more than 120 seconds. [74073.181180] Tainted: G W 4.9.0-rc7-btrfs-next-36+ #1 [74073.181180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [74073.185296] fdm-stress D 0 30264 29974 0x00000000 [74073.186810] ffff880089595118 ffff880211b8eac0 ffff880237030380 ffff880089594b00 [74073.188998] ffff88023f2978c0 ffffc900063abb68 ffffffff814c15e1 0000000000000000 [74073.191070] ffff88023f2978d8 ffff88023f2978c0 00000000003abb50 ffff880089594b00 [74073.193286] Call Trace: [74073.193990] [<ffffffff814c15e1>] ? __schedule+0x48f/0x6f4 [74073.195418] [<ffffffff814c1c8b>] ? 
bit_wait+0x2f/0x2f [74073.196796] [<ffffffff814c18d2>] schedule+0x8c/0xa0 [74073.198163] [<ffffffff814c4b36>] schedule_timeout+0x43/0xff [74073.199621] [<ffffffff81095df5>] ? trace_hardirqs_on+0xd/0xf [74073.201100] [<ffffffff810bde98>] ? timekeeping_get_ns+0x1e/0x32 [74073.202686] [<ffffffff810be048>] ? ktime_get+0x41/0x52 [74073.204051] [<ffffffff814c10f0>] io_schedule_timeout+0xa0/0x102 [74073.205585] [<ffffffff814c10f0>] ? io_schedule_timeout+0xa0/0x102 [74073.207123] [<ffffffff814c1ca6>] bit_wait_io+0x1b/0x39 [74073.208238] [<ffffffff814c1fb8>] __wait_on_bit_lock+0x4f/0x99 [74073.208871] [<ffffffff8112b692>] __lock_page+0x6b/0x6d [74073.209430] [<ffffffff8108ceb4>] ? autoremove_wake_function+0x3a/0x3a [74073.210101] [<ffffffff8112b800>] lock_page+0x2f/0x32 [74073.210636] [<ffffffff8112c502>] pagecache_get_page+0x5e/0x153 [74073.211270] [<ffffffffa03257eb>] gather_extent_pages+0x4e/0x109 [btrfs] [74073.212166] [<ffffffffa032a04c>] btrfs_dedupe_file_range+0x1e1/0x4dd [btrfs] [74073.213257] [<ffffffff8118d9b5>] vfs_dedupe_file_range+0x1c1/0x221 [74073.214086] [<ffffffff8119e0c4>] do_vfs_ioctl+0x442/0x600 [74073.214767] [<ffffffff811a7874>] ? rcu_read_unlock+0x5b/0x5d [74073.215619] [<ffffffff811a7953>] ? __fget+0x6b/0x77 [74073.216338] [<ffffffff8119e2d9>] SyS_ioctl+0x57/0x79 [74073.217149] [<ffffffff814c5fea>] entry_SYSCALL_64_fastpath+0x18/0xad [74073.218102] [<ffffffff81109552>] ? time_hardirqs_off+0x9/0x14 [74073.218968] [<ffffffff810938ce>] ? trace_hardirqs_off_caller+0x1f/0xaa [74073.219938] INFO: lockdep is turned off. 
          What happened was the following:

              CPU 1                                   CPU 2

          btrfs_dedupe_file_range()
            --> using same inode as source and target
            --> src range is [768K, 1Mb[
            --> dst range is [0, 256K[

            btrfs_cmp_data_prepare()
              --> calls gather_extent_pages()
                  for range [768K, 1Mb[ and locks
                  all pages in that range

                                                  do_writepages()
                                                    btrfs_writepages()
                                                      extent_writepages()
                                                        extent_write_cache_pages()
                                                          __extent_writepage()
                                                            writepage_delalloc()
                                                              find_lock_delalloc_range()
                                                                --> finds range [0, 1Mb[
                                                                lock_delalloc_pages()
                                                                  --> locks all pages in the
                                                                      range [0, 768K[
                                                                  --> tries to lock page at
                                                                      offset 768K
                                                                      --> deadlock

              --> calls gather_extent_pages()
                  to lock pages in the range
                  [0, 256K[
                  --> deadlock, task at CPU 1
                      already locked that range
                      and it's trying to lock the
                      range we locked previously

          So fix this by making sure that during a dedup we always lock first the pages from the range with lower offset.

          Signed-off-by: Filipe Manana <fdmanana@suse.com>
          Signed-off-by: Chris Mason <clm@fb.com>
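          The ordering rule in the last paragraph above can be sketched in isolation. This is not the btrfs implementation: struct range, struct lock_log, lock_range() and lock_two_ranges() are hypothetical, and "locking" is reduced to recording the order in which ranges would be locked.

```c
#include <assert.h>
#include <stdint.h>

/* A byte range within one file (hypothetical type for this sketch). */
struct range {
	uint64_t start;
	uint64_t len;
};

/* Records the order in which ranges were "locked". */
struct lock_log {
	uint64_t order[2];
	int count;
};

/* Stand-in for locking all pages of a range: just log its start. */
static void lock_range(struct lock_log *log, const struct range *r)
{
	assert(log->count < 2);
	log->order[log->count++] = r->start;
}

/* Always lock the range with the lower offset first, regardless of
 * which one the caller calls source and which destination. */
static void lock_two_ranges(struct lock_log *log,
			    const struct range *a, const struct range *b)
{
	const struct range *first = a;
	const struct range *second = b;

	if (b->start < a->start) {
		first = b;
		second = a;
	}
	lock_range(log, first);
	lock_range(log, second);
}
```

          Because writeback also walks pages in ascending order, two tasks that both follow this rule cannot each hold the page the other is waiting for.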
| * | | | btrfs: use btrfs_debug instead of pr_debug in transaction abort (Jeff Mahoney, 2017-02-17, 1 file, -1/+2)

          Commit e5d6b12fe14 (Btrfs: don't WARN() in btrfs_transaction_abort() for IO errors) added a pr_debug call to be printed when a transaction is aborted with -EIO instead of WARN.

          btrfs_debug prints which file system the message is associated with so let's use that instead.

          Signed-off-by: Jeff Mahoney <jeffm@suse.com>
          Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: btrfs_truncate_free_space_cache always allocates path (Jeff Mahoney, 2017-02-17, 1 file, -7/+7)

          btrfs_truncate_free_space_cache always allocates a btrfs_path structure but only uses it when the caller passes a block group. Let's move the allocation and free into the conditional.

          Signed-off-by: Jeff Mahoney <jeffm@suse.com>
          Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: free-space-cache, clean up unnecessary root arguments (Jeff Mahoney, 2017-02-17, 5 files, -26/+22)

          The free space cache APIs accept a root but always use the tree root.

          Also, btrfs_truncate_free_space_cache accepts a root AND an inode but the inode always points to the root anyway, so let's just pass the inode.

          Signed-off-by: Jeff Mahoney <jeffm@suse.com>
          Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: convert btrfs_inc_block_group_ro to accept fs_info    Jeff Mahoney    2017-02-17    4    -6/+5
| | | | | btrfs_inc_block_group_ro is either passed the extent root or the dev
| | | | | root, but it doesn't do anything with the dev tree. Let's convert to
| | | | | passing an fs_info and using the extent root.
| | | | |
| | | | | Signed-off-by: Jeff Mahoney <jeffm@suse.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: flush_space always takes fs_info->fs_root    Jeff Mahoney    2017-02-17    1    -10/+10
| | | | | We don't need to pass a root to flush_space since it always uses the
| | | | | fs_root.
| | | | |
| | | | | Signed-off-by: Jeff Mahoney <jeffm@suse.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: pass fs_info to (more) routines that are only called with extent_root    Jeff Mahoney    2017-02-17    1    -50/+53
| | | | | Outside of interactions with qgroups, the roots passed in
| | | | | extent-tree.c are usually passed to ensure that we don't do refcounts
| | | | | on log trees or to get the allocation profile for an allocation
| | | | | request. Otherwise, it operates on the extent root.
| | | | |
| | | | | This patch converts some more routines in extent-tree.c that are
| | | | | always called with the extent root to accept an fs_info instead.
| | | | |
| | | | | Signed-off-by: Jeff Mahoney <jeffm@suse.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: qgroup: Move half of the qgroup accounting time out of commit trans    Qu Wenruo    2017-02-17    3    -11/+75
| | | | | Just as Filipe pointed out, the most time-consuming parts of qgroup
| | | | | are btrfs_qgroup_account_extents() and
| | | | | btrfs_qgroup_prepare_account_extents(), both of which call
| | | | | btrfs_find_all_roots() to get the old_roots and new_roots ulists.
| | | | |
| | | | | What makes things worse is that we're calling the expensive
| | | | | btrfs_find_all_roots() at transaction commit time with
| | | | | TRANS_STATE_COMMIT_DOING, which blocks all incoming transactions.
| | | | |
| | | | | Such behavior is necessary for the @new_roots search, as the current
| | | | | btrfs_find_all_roots() can't do it correctly, so we call it just
| | | | | before switching commit roots.
| | | | |
| | | | | However, for the @old_roots search it's not necessary, as that search
| | | | | is based on commit_root, so it will always be correct and we can move
| | | | | it out of transaction commit.
| | | | |
| | | | | This patch moves the @old_roots search out of commit_transaction(),
| | | | | so in theory we can halve the time qgroup consumes in
| | | | | commit_transaction().
| | | | |
| | | | | Note that this won't speed up qgroup overall: the total time
| | | | | consumption is still the same, and only the performance stall during
| | | | | commit is reduced.
| | | | |
| | | | | Cc: Filipe Manana <fdmanana@suse.com>
| | | | | Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
| | | | | Reviewed-by: Filipe Manana <fdmanana@suse.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameter from adjust_slots_upwards    David Sterba    2017-02-17    1    -3/+2
| | | | | Never used.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameters from __btrfs_write_out_cache    David Sterba    2017-02-17    1    -8/+3
| | | | | Both unused after the call to update_cache_item has been moved to
| | | | | __btrfs_wait_cache_io.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameter from cleanup_write_cache_enospc    David Sterba    2017-02-17    1    -3/+2
| | | | | bitmap_list is unused since the io_ctl framework.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameter from __add_inode_ref    David Sterba    2017-02-17    1    -2/+1
| | | | | Unused since the helper has been split, eb used in the caller.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameter from clone_copy_inline_extent    David Sterba    2017-02-17    1    -3/+2
| | | | | Never used.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameters from btrfs_cmp_data    David Sterba    2017-02-17    1    -3/+2
| | | | | After the page locking has been reworked, we get all pages prepared
| | | | | via cmp_pages.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameter from __add_inline_refs    David Sterba    2017-02-17    1    -3/+2
| | | | | Never used.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameters from scrub_setup_wr_ctx    David Sterba    2017-02-17    1    -7/+3
| | | | | Never used.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameter from create_snapshot    David Sterba    2017-02-17    1    -2/+2
| | | | | The name parameters have never been used, as the name is passed via
| | | | | the dentry.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | btrfs: remove unused parameter from init_first_rw_device    David Sterba    2017-02-17    1    -5/+3
| | | | | The 'device' used to be added in that function, but now it's done by
| | | | | the caller.
| | | | |
| | | | | Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | | | | Signed-off-by: David Sterba <dsterba@suse.com>