summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds2017-07-1510-85/+218
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull UBIFS updates from Richard Weinberger: - Updates and fixes for the file encryption mode - Minor improvements - Random fixes * tag 'upstream-4.13-rc1' of git://git.infradead.org/linux-ubifs: ubifs: Set double hash cookie also for RENAME_EXCHANGE ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrs ubifs: Don't leak kernel memory to the MTD ubifs: Change gfp flags in page allocation for bulk read ubifs: Fix oops when remounting with no_bulk_read. ubifs: Fail commit if TNC is obviously inconsistent ubifs: allow userspace to map mounts to volumes ubifs: Wire-up statx() support ubifs: Remove dead code from ubifs_get_link() ubifs: Massage debug prints wrt. fscrypt ubifs: Add assert to dent_key_init() ubifs: Fix unlink code wrt. double hash lookups ubifs: Fix data node size for truncating uncompressed nodes ubifs: Don't encrypt special files on creation ubifs: Fix memory leak in RENAME_WHITEOUT error path in do_rename ubifs: Fix inode data budget in ubifs_mknod ubifs: Correctly evict xattr inodes ubifs: Unexport ubifs_inode_slab ubifs: don't bother checking for encryption key in ->mmap() ubifs: require key for truncate(2) of encrypted file
| * ubifs: Set double hash cookie also for RENAME_EXCHANGERichard Weinberger2017-07-141-0/+2
| | | | | | | | | | | | | | | | | | We developed RENAME_EXCHANGE and UBIFS_FLG_DOUBLE_HASH more or less in parallel and this case was forgotten. :-( Cc: stable@vger.kernel.org Fixes: d63d61c16972 ("ubifs: Implement UBIFS_FLG_DOUBLE_HASH") Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Massage assert in ubifs_xattr_set() wrt. init_xattrsXiaolei Li2017-07-143-11/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode is not locked in init_xattrs when creating a new inode. Without this patch, there will occurs assert when booting or creating a new file, if the kernel config CONFIG_SECURITY_SMACK is enabled. Log likes: UBIFS assert failed in ubifs_xattr_set at 298 (pid 1156) CPU: 1 PID: 1156 Comm: ldconfig Tainted: G S 4.12.0-rc1-207440-g1e70b02 #2 Hardware name: MediaTek MT2712 evaluation board (DT) Call trace: [<ffff000008088538>] dump_backtrace+0x0/0x238 [<ffff000008088834>] show_stack+0x14/0x20 [<ffff0000083d98d4>] dump_stack+0x9c/0xc0 [<ffff00000835d524>] ubifs_xattr_set+0x374/0x5e0 [<ffff00000835d7ec>] init_xattrs+0x5c/0xb8 [<ffff000008385788>] security_inode_init_security+0x110/0x190 [<ffff00000835e058>] ubifs_init_security+0x30/0x68 [<ffff00000833ada0>] ubifs_mkdir+0x100/0x200 [<ffff00000820669c>] vfs_mkdir+0x11c/0x1b8 [<ffff00000820b73c>] SyS_mkdirat+0x74/0xd0 [<ffff000008082f8c>] __sys_trace_return+0x0/0x4 Signed-off-by: Xiaolei Li <xiaolei.li@mediatek.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Don't leak kernel memory to the MTDRichard Weinberger2017-07-141-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When UBIFS prepares data structures which will be written to the MTD it ensues that their lengths are multiple of 8. Since it uses kmalloc() the padded bytes are left uninitialized and we leak a few bytes of kernel memory to the MTD. To make sure that all bytes are initialized, let's switch to kzalloc(). Kzalloc() is fine in this case because the buffers are not huge and in the IO path the performance bottleneck is anyway the MTD. Cc: stable@vger.kernel.org Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Change gfp flags in page allocation for bulk readHyunchul Lee2017-07-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In low memory situations, page allocations for bulk read can kill applications for reclaiming memory, and print an failure message when allocations are failed. Because bulk read is just an optimization, we don't have to do these and can stop page allocations. Though this siutation happens rarely, add __GFP_NORETRY to prevent from excessive memory reclaim and killing applications, and __GFP_WARN to suppress this failure message. For this, Use readahead_gfp_mask for gfp flags when allocating pages. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Fix oops when remounting with no_bulk_read.karam.lee2017-07-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When remounting with the no_bulk_read option, there is a problem accessing the "bulk_read buffer(bu.buf)" which has already been freed. If the bulk_read option is enabled, ubifs_tnc_bulk_read uses the pre-allocated bu.buf. While bu.buf is being used by ubifs_tnc_bulk_read, remounting with no_bulk_read frees bu.buf. So I added code to check the use of "bu.buf" to avoid this situation. ------ I tested as follows(kernel v3.18) : Use the script to repeat "no_bulk_read <-> bulk_read" remount.sh #!/bin/sh while true do; mount -o remount,no_bulk_read ${MOUNT_POINT}; sleep 1; mount -o remount,bulk_read ${MOUNT_POINT}; sleep 1; done Perform read operation cat ${MOUNT_POINT}/* > /dev/null The problem is reproduced immediately. [ 234.256845][kernel.0]Internal error: Oops: 17 [#1] PREEMPT ARM [ 234.258557][kernel.0]CPU: 0 PID: 2752 Comm: cat Tainted: G W O 3.18.31+ #51 [ 234.259531][kernel.0]task: cbff8580 ti: cbd66000 task.ti: cbd66000 [ 234.260306][kernel.0]PC is at validate_data_node+0x10/0x264 [ 234.260994][kernel.0]LR is at ubifs_tnc_bulk_read+0x388/0x3ec [ 234.261712][kernel.0]pc : [<c01d98fc>] lr : [<c01dc300>] psr: 80000013 [ 234.261712][kernel.0]sp : cbd67ba0 ip : 00000001 fp : 00000000 [ 234.263337][kernel.0]r10: cd3e0260 r9 : c0df2008 r8 : 00000000 [ 234.264087][kernel.0]r7 : cd3e0000 r6 : 00000000 r5 : cd3e0278 r4 : cd3e0000 [ 234.264999][kernel.0]r3 : 00000003 r2 : cd3e0280 r1 : 00000000 r0 : cd3e0000 [ 234.265910][kernel.0]Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 234.266896][kernel.0]Control: 10c53c7d Table: 8c40c059 DAC: 00000015 [ 234.267711][kernel.0]Process cat (pid: 2752, stack limit = 0xcbd66400) [ 234.268525][kernel.0]Stack: (0xcbd67ba0 to 0xcbd68000) [ 234.269169][kernel.0]7ba0: cd7c3940 c03d8650 0001bfe0 00002ab2 00000000 cbd67c5c cbd67c58 0001bfe0 [ 234.270287][kernel.0]7bc0: cd3e0000 00002ab2 0001bfe0 00000014 cbd66000 cd3e0260 00000000 c01d6660 [ 234.271403][kernel.0]7be0: 00002ab2 00000000 c82a5800 ffffffff cd3e0298 cd3e0278 00000000 cd3e0000 [ 234.272520][kernel.0]7c00: 00000000 00000000 cd3e0260 c01dc300 00002ab2 00000000 60000013 d663affa [ 234.273639][kernel.0]7c20: cd3e01f0 cd3e01f0 60000013 c09397ec 00000000 cd3e0278 00002ab2 00000000 [ 234.274755][kernel.0]7c40: cd3e0000 c01dbf48 00000014 00000003 00000160 00000015 00000004 d663affa [ 234.275874][kernel.0]7c60: ccdaa978 cd3e0278 cd3e0000 cf32a5f4 ccdaa820 00000044 cbd66000 cd3e0260 [ 234.276992][kernel.0]7c80: 00000003 c01cec84 ccdaa8dc cbd67cc4 cbd67ec0 00000010 ccdaa978 00000000 [ 234.278108][kernel.0]7ca0: 0000015e ccdaa8dc 00000000 00000000 cf32a5d0 00000000 0000015f ccdaa8dc [ 234.279228][kernel.0]7cc0: 00000000 c8488300 0009e5a4 0000000e cbd66000 0000015e cf32a5f4 c0113c04 [ 234.280346][kernel.0]7ce0: 0000009f 0000003c c00098c4 ffffffff 00001000 00000000 000000ad 00000010 [ 234.281463][kernel.0]7d00: 00000038 cd68f580 00000150 c8488360 00000000 cbd67d30 cbd67d70 0000000e [ 234.282579][kernel.0]7d20: 00000010 00000000 c0951874 c0112a9c cf379b60 cf379b84 cf379890 cf3798b4 [ 234.283699][kernel.0]7d40: cf379578 cf37959c cf379380 cf3793a4 cf3790b0 cf3790d4 cf378fd8 cf378ffc [ 234.284814][kernel.0]7d60: cf378f48 cf378f6c cf32a5f4 cf32a5d0 00000000 00001000 00000018 00000000 [ 234.285932][kernel.0]7d80: 00001000 c0050da4 00000000 00001000 cec04c00 00000000 00001000 c0e11328 [ 234.287049][kernel.0]7da0: 00000000 00001000 cbd66000 00000000 00001000 c0012a60 00000000 00001000 [ 234.288166][kernel.0]7dc0: cbd67dd4 00000000 00001000 80000013 00000000 00001000 cd68f580 00000000 [ 234.289285][kernel.0]7de0: 00001000 c915d600 00000000 00001000 cbd67e48 00000000 00001000 00000018 [ 234.290402][kernel.0]7e00: 00000000 00001000 00000000 00000000 00001000 c915d768 c915d768 c0113550 [ 234.291522][kernel.0]7e20: cd68f580 cbd67e48 cd68f580 cb6713c0 00010000 000ac5a4 00000000 001fc5a4 [ 234.292637][kernel.0]7e40: 00000000 c8488300 cbd67ec0 00eb0000 cd68f580 c0113ee4 00000000 cbd67ec0 [ 234.293754][kernel.0]7e60: cd68f580 c8488300 cbd67ec0 00eb0000 cd68f580 00150000 c8488300 00eb0000 [ 234.294874][kernel.0]7e80: 00010000 c0112fd0 00000000 cbd67ec0 cd68f580 00150000 00000000 cd68f580 [ 234.295991][kernel.0]7ea0: cbd67ef0 c011308c 00000000 00000002 cd768850 00010000 00000000 c01133fc [ 234.297110][kernel.0]7ec0: 00150000 00000000 cbd67f50 00000000 00000000 cb6713c0 01000000 cbd67f48 [ 234.298226][kernel.0]7ee0: cbd67f50 c8488300 00000000 c0113204 00010000 01000000 00000000 cb6713c0 [ 234.299342][kernel.0]7f00: 00150000 00000000 cbd67f50 00000000 00000000 00000000 00000000 00000000 [ 234.300462][kernel.0]7f20: cbd67f50 01000000 01000000 cb6713c0 c8488300 c00ebba8 01000000 00000000 [ 234.301577][kernel.0]7f40: c8488300 cb6713c0 00000000 00000000 00000000 00000000 ccdaa820 00000000 [ 234.302697][kernel.0]7f60: 00000000 01000000 00000003 00000001 cbd66000 00000000 00000001 c00ec678 [ 234.303813][kernel.0]7f80: 00000000 00000200 00000000 01000000 01000000 00000000 00000000 000000ef [ 234.304933][kernel.0]7fa0: c000e904 c000e780 01000000 00000000 00000001 00000003 00000000 01000000 [ 234.306049][kernel.0]7fc0: 01000000 00000000 00000000 000000ef 00000001 00000003 01000000 00000001 [ 234.307165][kernel.0]7fe0: 00000000 beafb78c 0000ad08 00128d1c 60000010 00000001 00000000 00000000 [ 234.308292][kernel.0][<c01d98fc>] (validate_data_node) from [<c01dc300>] (ubifs_tnc_bulk_read+0x388/0x3ec) [ 234.309493][kernel.0][<c01dc300>] (ubifs_tnc_bulk_read) from [<c01cec84>] (ubifs_readpage+0x1dc/0x46c) [ 234.310656][kernel.0][<c01cec84>] (ubifs_readpage) from [<c0113c04>] (__generic_file_splice_read+0x29c/0x4cc) [ 234.311890][kernel.0][<c0113c04>] (__generic_file_splice_read) from [<c0113ee4>] (generic_file_splice_read+0xb0/0xf4) [ 234.313214][kernel.0][<c0113ee4>] (generic_file_splice_read) from [<c0112fd0>] (do_splice_to+0x68/0x7c) [ 234.314386][kernel.0][<c0112fd0>] (do_splice_to) from [<c011308c>] (splice_direct_to_actor+0xa8/0x190) [ 234.315544][kernel.0][<c011308c>] (splice_direct_to_actor) from [<c0113204>] (do_splice_direct+0x90/0xb8) [ 234.316741][kernel.0][<c0113204>] (do_splice_direct) from [<c00ebba8>] (do_sendfile+0x17c/0x2b8) [ 234.317838][kernel.0][<c00ebba8>] (do_sendfile) from [<c00ec678>] (SyS_sendfile64+0xc4/0xcc) [ 234.318890][kernel.0][<c00ec678>] (SyS_sendfile64) from [<c000e780>] (ret_fast_syscall+0x0/0x38) [ 234.319983][kernel.0]Code: e92d47f0 e24dd050 e59f9228 e1a04000 (e5d18014) Signed-off-by: karam.lee <karam.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Fail commit if TNC is obviously inconsistentRichard Weinberger2017-07-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | A reference to LEB 0 or with length 0 in the TNC is never correct and could be caused by a memory corruption. Don't write such a bad index node to the MTD. Instead fail the commit which will turn UBIFS into read-only mode. This is less painful than having the bad reference on the MTD from where UBFIS has no chance to recover. Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: allow userspace to map mounts to volumesRabin Vincent2017-07-141-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There currently appears to be no way for userspace to find out the underlying volume number for a mounted ubifs file system, since ubifs uses anonymous block devices. The volume name is present in /proc/mounts but UBI volumes can be renamed after the volume has been mounted. To remedy this, show the UBI number and UBI volume number as part of the options visible under /proc/mounts. Also, accept and ignore the ubi= vol= options if they are used mounting (patch from Richard Weinberger). # mount -t ubifs ubi:baz x # mount ubi:baz on /root/x type ubifs (rw,relatime,ubi=0,vol=2) # ubirename /dev/ubi0 baz bazz # mount ubi:baz on /root/x type ubifs (rw,relatime,ubi=0,vol=2) # ubinfo -d 0 -n 2 Volume ID: 2 (on ubi0) Type: dynamic Alignment: 1 Size: 67 LEBs (1063424 bytes, 1.0 MiB) State: OK Name: bazz Character device major/minor: 254:3 Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Wire-up statx() supportRichard Weinberger2017-07-141-0/+15
| | | | | | | | | | | | | | | | statx() can report what flags a file has, expose flags that UBIFS supports. Especially STATX_ATTR_COMPRESSED and STATX_ATTR_ENCRYPTED can be interesting for userspace. Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Remove dead code from ubifs_get_link()Richard Weinberger2017-07-141-6/+0
| | | | | | | | | | | | | | We check the length already, no need to check later again for an empty string. Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Massage debug prints wrt. fscryptRichard Weinberger2017-07-142-15/+4
| | | | | | | | | | | | | | If file names are encrypted we can no longer print them. That's why we have to change these prints or remove them completely. Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Add assert to dent_key_init()Richard Weinberger2017-07-141-0/+1
| | | | | | | | | | | | | | ...to make sure that we don't use it for double hashed lookups instead of dent_key_init_hash(). Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Fix unlink code wrt. double hash lookupsRichard Weinberger2017-07-143-24/+117
| | | | | | | | | | | | | | | | | | | | When removing an encrypted file with a long name and without having the key we have to be able to locate and remove the directory entry via a double hash. This corner case was simply forgotten. Fixes: 528e3d178f25 ("ubifs: Add full hash lookup support") Reported-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Fix data node size for truncating uncompressed nodesDavid Oberhollenzer2017-07-141-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the function truncate_data_node only updates the destination data node size if compression is used. For uncompressed nodes, the old length is incorrectly retained. This patch makes sure that the length is correctly set when compression is disabled. Fixes: 7799953b34d1 ("ubifs: Implement encrypt/decrypt for all IO") Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Don't encrypt special files on creationDavid Gstir2017-07-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | When a new inode is created, we check if the containing folder has a encryption policy set and inherit that. This should however only be done for regular files, links and subdirectories. Not for sockes fifos etc. Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto") Cc: stable@vger.kernel.org Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Fix memory leak in RENAME_WHITEOUT error path in do_renameHyunchul Lee2017-07-141-9/+5
| | | | | | | | | | | | | | in RENAME_WHITEOUT error path, fscrypt_name should be freed. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Fix inode data budget in ubifs_mknodHyunchul Lee2017-07-141-1/+1
| | | | | | | | | | | | | | Assign inode data budget to budget request correctly. Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Correctly evict xattr inodesRichard Weinberger2017-07-143-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UBIFS handles extended attributes just like files, as consequence of that, they also have inodes. Therefore UBIFS does all the inode machinery also for xattrs. Since new inodes have i_nlink of 1, a file or xattr inode will be evicted if i_nlink goes down to 0 after an unlink. UBIFS assumes this model also for xattrs, which is not correct. One can create a file "foo" with xattr "user.test". By reading "user.test" an inode will be created, and by deleting "user.test" it will get evicted later. The assumption breaks if the file "foo", which hosts the xattrs, will be removed. VFS nor UBIFS does not remove each xattr via ubifs_xattr_remove(), it just removes the host inode from the TNC and all underlying xattr nodes too and the inode will remain in the cache and wastes memory. To solve this problem, remove xattr inodes from the VFS inode cache in ubifs_xattr_remove() to make sure that they get evicted. Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system") Cc: <stable@vger.kernel.org> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: Unexport ubifs_inode_slabRichard Weinberger2017-07-142-2/+1
| | | | | | | | | | | | | | This SLAB is only being used in super.c, there is no need to expose it into the global namespace. Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: don't bother checking for encryption key in ->mmap()Eric Biggers2017-07-051-9/+0
| | | | | | | | | | | | | | | | | | | | Since only an open file can be mmap'ed, and we only allow open()ing an encrypted file when its key is available, there is no need to check for the key again before permitting each mmap(). Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Richard Weinberger <richard@nod.at> Signed-off-by: Richard Weinberger <richard@nod.at>
| * ubifs: require key for truncate(2) of encrypted fileEric Biggers2017-07-051-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, filesystems allow truncate(2) on an encrypted file without the encryption key. However, it's impossible to correctly handle the case where the size being truncated to is not a multiple of the filesystem block size, because that would require decrypting the final block, zeroing the part beyond i_size, then encrypting the block. As other modifications to encrypted file contents are prohibited without the key, just prohibit truncate(2) as well, making it fail with ENOKEY. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* | Merge tag 'xfs-4.13-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2017-07-146-17/+15
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull XFS fixes from Darrick Wong: "Largely debugging and regression fixes. - Add some locking assertions for the _ilock helpers. - Revert the XFS_QMOPT_NOLOCK patch; after discussion with hch the online fsck patch that would have needed it has been redesigned and no longer needs it. - Fix behavioral regression of SEEK_HOLE/DATA with negative offsets to match 4.12-era XFS behavior" * tag 'xfs-4.13-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: vfs: in iomap seek_{hole,data}, return -ENXIO for negative offsets Revert "xfs: grab dquots without taking the ilock" xfs: assert locking precondition in xfs_readlink_bmap_ilocked xfs: assert locking precondіtion in xfs_attr_list_int_ilocked xfs: fixup xfs_attr_get_ilocked
| * | vfs: in iomap seek_{hole,data}, return -ENXIO for negative offsetsDarrick J. Wong2017-07-131-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | In the iomap implementations of SEEK_HOLE and SEEK_DATA, make sure we return -ENXIO for negative offsets. Inspired-by: Mateusz S <muttdini@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | Revert "xfs: grab dquots without taking the ilock"Christoph Hellwig2017-07-132-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 50e0bdbe9f48f98bb02eac7030d682f4716884ae. The new XFS_QMOPT_NOLOCK isn't used at all, and conditional locking based on a flag is always the wrong thing to do - we should be having helpers that can be called without the lock instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: assert locking precondition in xfs_readlink_bmap_ilockedChristoph Hellwig2017-07-131-0/+2
| | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: assert locking precondіtion in xfs_attr_list_int_ilockedChristoph Hellwig2017-07-131-0/+2
| | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: fixup xfs_attr_get_ilockedChristoph Hellwig2017-07-131-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | The comment mentioned the wrong lock. Also add an ASSERT to assert this locking precondition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* | | Merge branch 'for-4.13-part2' of ↵Linus Torvalds2017-07-147-57/+88
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "We've identified and fixed a silent corruption (introduced by code in the first pull), a fixup after the blk_status_t merge and two fixes to incremental send that Filipe has been hunting for some time" * 'for-4.13-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix unexpected return value of bio_readpage_error btrfs: btrfs_create_repair_bio never fails, skip error handling btrfs: cloned bios must not be iterated by bio_for_each_segment_all Btrfs: fix write corruption due to bio cloning on raid5/6 Btrfs: incremental send, fix invalid memory access Btrfs: incremental send, fix invalid path for link commands
| * | | Btrfs: fix unexpected return value of bio_readpage_errorLiu Bo2017-07-142-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With blk_status_t conversion (that are now present in master), bio_readpage_error() may return 1 as now ->submit_bio_hook() may not set %ret if it runs without problems. This fixes that unexpected return value by changing btrfs_check_repairable() to return a bool instead of updating %ret, and patch is applicable to both codebases with and without blk_status_t. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | btrfs: btrfs_create_repair_bio never fails, skip error handlingDavid Sterba2017-07-142-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the function uses the non-failing bio allocation, we can remove error handling from the callers as well. Signed-off-by: David Sterba <dsterba@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | btrfs: cloned bios must not be iterated by bio_for_each_segment_allDavid Sterba2017-07-144-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've started using cloned bios more in 4.13, there are some specifics regarding the iteration. Filipe found [1] that the raid56 iterated a cloned bio using bio_for_each_segment_all, which is incorrect. The cloned bios have wrong bi_vcnt and this could lead to silent corruptions. This patch adds assertions to all remaining bio_for_each_segment_all cases. [1] https://patchwork.kernel.org/patch/9838535/ Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | Btrfs: fix write corruption due to bio cloning on raid5/6Filipe Manana2017-07-131-8/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent changes to make bio cloning faster (added in the 4.13 merge window) by using the bio_clone_fast() API introduced a regression on raid5/6 modes, because cloned bios have an invalid bi_vcnt field (therefore it can not be used) and the raid5/6 code uses the bio_for_each_segment_all() API to iterate the segments of a bio, and this API uses a bio's bi_vcnt field. The issue is very simple to trigger by doing for example a direct IO write against a raid5 or raid6 filesystem and then attempting to read what we wrote before: $ mkfs.btrfs -m raid5 -d raid5 -f /dev/sdc /dev/sdd /dev/sde /dev/sdf $ mount /dev/sdc /mnt $ xfs_io -f -d -c "pwrite -S 0xab 0 1M" /mnt/foobar $ od -t x1 /mnt/foobar od: /mnt/foobar: read error: Input/output error For that example, the following is also reported in dmesg/syslog: [18274.985557] btrfs_print_data_csum_error: 18 callbacks suppressed [18274.995277] BTRFS warning (device sdf): csum failed root 5 ino 257 off 0 csum 0x98f94189 expected csum 0x94374193 mirror 1 [18274.997205] BTRFS warning (device sdf): csum failed root 5 ino 257 off 4096 csum 0x98f94189 expected csum 0x94374193 mirror 1 [18275.025221] BTRFS warning (device sdf): csum failed root 5 ino 257 off 8192 csum 0x98f94189 expected csum 0x94374193 mirror 1 [18275.047422] BTRFS warning (device sdf): csum failed root 5 ino 257 off 12288 csum 0x98f94189 expected csum 0x94374193 mirror 1 [18275.054818] BTRFS warning (device sdf): csum failed root 5 ino 257 off 4096 csum 0x98f94189 expected csum 0x94374193 mirror 1 [18275.054834] BTRFS warning (device sdf): csum failed root 5 ino 257 off 8192 csum 0x98f94189 expected csum 0x94374193 mirror 1 [18275.054943] BTRFS warning (device sdf): csum failed root 5 ino 257 off 8192 csum 0x98f94189 expected csum 0x94374193 mirror 2 [18275.055207] BTRFS warning (device sdf): csum failed root 5 ino 257 off 8192 csum 0x98f94189 expected csum 0x94374193 mirror 3 [18275.055571] BTRFS warning (device sdf): csum failed root 5 ino 257 off 0 csum 0x98f94189 expected csum 0x94374193 mirror 1 [18275.062171] BTRFS warning (device sdf): csum failed root 5 ino 257 off 12288 csum 0x98f94189 expected csum 0x94374193 mirror 1 A scrub will also fail correcting bad copies, mentioning the following in dmesg/syslog: [18276.128696] scrub_handle_errored_block: 498 callbacks suppressed [18276.129617] BTRFS warning (device sdf): checksum error at logical 2186346496 on dev /dev/sde, sector 2116608, root 5, inode 257, offset 65536, length 4096, links $ [18276.149235] btrfs_dev_stat_print_on_error: 498 callbacks suppressed [18276.157897] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 [18276.206059] BTRFS warning (device sdf): checksum error at logical 2186477568 on dev /dev/sdd, sector 2116736, root 5, inode 257, offset 196608, length 4096, links$ [18276.206059] BTRFS error (device sdf): bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 [18276.306552] BTRFS warning (device sdf): checksum error at logical 2186543104 on dev /dev/sdd, sector 2116864, root 5, inode 257, offset 262144, length 4096, links$ [18276.319152] BTRFS error (device sdf): bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 2, gen 0 [18276.394316] BTRFS warning (device sdf): checksum error at logical 2186739712 on dev /dev/sdf, sector 2116992, root 5, inode 257, offset 458752, length 4096, links$ [18276.396348] BTRFS error (device sdf): bdev /dev/sdf errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 [18276.434127] BTRFS warning (device sdf): checksum error at logical 2186870784 on dev /dev/sde, sector 2117120, root 5, inode 257, offset 589824, length 4096, links$ [18276.434127] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 2, gen 0 [18276.500504] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186477568 on dev /dev/sdd [18276.538400] BTRFS warning (device sdf): checksum error at logical 2186481664 on dev /dev/sdd, sector 2116744, root 5, inode 257, offset 200704, length 4096, links$ [18276.540452] BTRFS error (device sdf): bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 3, gen 0 [18276.542012] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186481664 on dev /dev/sdd [18276.585030] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186346496 on dev /dev/sde [18276.598306] BTRFS warning (device sdf): checksum error at logical 2186412032 on dev /dev/sde, sector 2116736, root 5, inode 257, offset 131072, length 4096, links$ [18276.598310] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 3, gen 0 [18276.598582] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186350592 on dev /dev/sde [18276.603455] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 4, gen 0 [18276.638362] BTRFS warning (device sdf): checksum error at logical 2186354688 on dev /dev/sde, sector 2116624, root 5, inode 257, offset 73728, length 4096, links $ [18276.640445] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 5, gen 0 [18276.645942] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186354688 on dev /dev/sde [18276.657204] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186412032 on dev /dev/sde [18276.660563] BTRFS warning (device sdf): checksum error at logical 2186416128 on dev /dev/sde, sector 2116744, root 5, inode 257, offset 135168, length 4096, links$ [18276.664609] BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 6, gen 0 [18276.664609] BTRFS error (device sdf): unable to fixup (regular) error at logical 2186358784 on dev /dev/sde So fix this by using the bio_for_each_segment() API and setting before the bio's bi_iter field to the value of the corresponding btrfs bio container's saved iterator if we are processing a cloned bio in the raid5/6 code (the same code processes both cloned and non-cloned bios). This incorrect iteration of cloned bios was also causing some occasional BUG_ONs when running fstest btrfs/064, which have a trace like the following: [ 6674.416156] ------------[ cut here ]------------ [ 6674.416157] kernel BUG at fs/btrfs/raid56.c:1897! [ 6674.416159] invalid opcode: 0000 [#1] PREEMPT SMP [ 6674.416160] Modules linked in: dm_flakey dm_mod dax ppdev tpm_tis parport_pc tpm_tis_core evdev tpm psmouse sg i2c_piix4 pcspkr parport i2c_core serio_raw button s [ 6674.416184] CPU: 3 PID: 19236 Comm: kworker/u32:10 Not tainted 4.12.0-rc6-btrfs-next-44+ #1 [ 6674.416185] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 6674.416210] Workqueue: btrfs-endio btrfs_endio_helper [btrfs] [ 6674.416211] task: ffff880147f6c740 task.stack: ffffc90001fb8000 [ 6674.416229] RIP: 0010:__raid_recover_end_io+0x1ac/0x370 [btrfs] [ 6674.416230] RSP: 0018:ffffc90001fbbb90 EFLAGS: 00010217 [ 6674.416231] RAX: ffff8801ff4b4f00 RBX: 0000000000000002 RCX: 0000000000000001 [ 6674.416232] RDX: ffff880099b045d8 RSI: ffffffff81a5f6e0 RDI: 0000000000000004 [ 6674.416232] RBP: ffffc90001fbbbc8 R08: 0000000000000001 R09: 0000000000000001 [ 6674.416233] R10: ffffc90001fbbac8 R11: 0000000000001000 R12: 0000000000000002 [ 6674.416234] R13: ffff880099b045c0 R14: 0000000000000004 R15: ffff88012bff2000 [ 6674.416235] FS: 0000000000000000(0000) GS:ffff88023f2c0000(0000) knlGS:0000000000000000 [ 6674.416235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6674.416236] CR2: 00007f28cf282000 CR3: 00000001000c6000 CR4: 00000000000006e0 [ 6674.416239] Call Trace: [ 6674.416259] __raid56_parity_recover+0xfc/0x16e [btrfs] [ 6674.416276] raid56_parity_recover+0x157/0x16b [btrfs] [ 6674.416293] btrfs_map_bio+0xe0/0x259 [btrfs] [ 6674.416310] btrfs_submit_bio_hook+0xbf/0x147 [btrfs] [ 6674.416327] end_bio_extent_readpage+0x27b/0x4a0 [btrfs] [ 6674.416331] bio_endio+0x17d/0x1b3 [ 6674.416346] end_workqueue_fn+0x3c/0x3f [btrfs] [ 6674.416362] btrfs_scrubparity_helper+0x1aa/0x3b8 [btrfs] [ 6674.416379] btrfs_endio_helper+0xe/0x10 [btrfs] [ 6674.416381] process_one_work+0x276/0x4b6 [ 6674.416384] worker_thread+0x1ac/0x266 [ 6674.416386] ? rescuer_thread+0x278/0x278 [ 6674.416387] kthread+0x106/0x10e [ 6674.416389] ? __list_del_entry+0x22/0x22 [ 6674.416391] ret_from_fork+0x27/0x40 [ 6674.416395] Code: 44 89 e2 be 00 10 00 00 ff 15 b0 ab ef ff eb 72 4d 89 e8 89 d9 44 89 e2 be 00 10 00 00 ff 15 a3 ab ef ff eb 5d 41 83 fc ff 74 02 <0f> 0b 49 63 97 [ 6674.416432] RIP: __raid_recover_end_io+0x1ac/0x370 [btrfs] RSP: ffffc90001fbbb90 [ 6674.416434] ---[ end trace 74d56ebe7489dd6a ]--- Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| * | | Btrfs: incremental send, fix invalid memory accessFilipe Manana2017-07-061-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing an incremental send, while processing an extent that changed between the parent and send snapshots and that extent was an inline extent in the parent snapshot, it's possible to access a memory region beyond the end of leaf if the inline extent is very small and it is the first item in a leaf. An example scenario is described below. The send snapshot has the following leaf: leaf 33865728 items 33 free space 773 generation 46 owner 5 fs uuid ab7090d8-dafd-4fb9-9246-723b6d2e2fb7 chunk uuid 2d16478c-c704-4ab9-b574-68bff2281b1f (...) item 14 key (335 EXTENT_DATA 0) itemoff 3052 itemsize 53 generation 36 type 1 (regular) extent data disk byte 12791808 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) item 15 key (335 EXTENT_DATA 8192) itemoff 2999 itemsize 53 generation 36 type 1 (regular) extent data disk byte 138170368 nr 225280 extent data offset 0 nr 225280 ram 225280 extent compression 0 (none) (...) And the parent snapshot has the following leaf: leaf 31272960 items 17 free space 17 generation 31 owner 5 fs uuid ab7090d8-dafd-4fb9-9246-723b6d2e2fb7 chunk uuid 2d16478c-c704-4ab9-b574-68bff2281b1f item 0 key (335 EXTENT_DATA 0) itemoff 3951 itemsize 44 generation 31 type 0 (inline) inline extent data size 23 ram_bytes 613 compression 1 (zlib) (...) When computing the send stream, it is detected that the extent of inode 335, at file offset 0, and at fs/btrfs/send.c:is_extent_unchanged() we grab the leaf from the parent snapshot and access the inline extent item. However, before jumping to the 'out' label, we access the 'offset' and 'disk_bytenr' fields of the extent item, which should not be done for inline extents since the inlined data starts at the offset of the 'disk_bytenr' field and can be very small. For example accessing the 'offset' field of the file extent item results in the following trace: [ 599.705368] general protection fault: 0000 [#1] PREEMPT SMP [ 599.706296] Modules linked in: btrfs psmouse i2c_piix4 ppdev acpi_cpufreq serio_raw parport_pc i2c_core evdev tpm_tis tpm_tis_core sg pcspkr parport tpm button su$ [ 599.709340] CPU: 7 PID: 5283 Comm: btrfs Not tainted 4.10.0-rc8-btrfs-next-46+ #1 [ 599.709340] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 599.709340] task: ffff88023eedd040 task.stack: ffffc90006658000 [ 599.709340] RIP: 0010:read_extent_buffer+0xdb/0xf4 [btrfs] [ 599.709340] RSP: 0018:ffffc9000665ba00 EFLAGS: 00010286 [ 599.709340] RAX: db73880000000000 RBX: 0000000000000000 RCX: 0000000000000001 [ 599.709340] RDX: ffffc9000665ba60 RSI: db73880000000000 RDI: ffffc9000665ba5f [ 599.709340] RBP: ffffc9000665ba30 R08: 0000000000000001 R09: ffff88020dc5e098 [ 599.709340] R10: 0000000000001000 R11: 0000160000000000 R12: 6db6db6db6db6db7 [ 599.709340] R13: ffff880000000000 R14: 0000000000000000 R15: ffff88020dc5e088 [ 599.709340] FS: 00007f519555a8c0(0000) GS:ffff88023f3c0000(0000) knlGS:0000000000000000 [ 599.709340] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 599.709340] CR2: 00007f1411afd000 CR3: 0000000235f8e000 CR4: 00000000000006e0 [ 599.709340] Call Trace: [ 599.709340] btrfs_get_token_64+0x93/0xce [btrfs] [ 599.709340] ? printk+0x48/0x50 [ 599.709340] btrfs_get_64+0xb/0xd [btrfs] [ 599.709340] process_extent+0x3a1/0x1106 [btrfs] [ 599.709340] ? btree_read_extent_buffer_pages+0x5/0xef [btrfs] [ 599.709340] changed_cb+0xb03/0xb3d [btrfs] [ 599.709340] ? btrfs_get_token_32+0x7a/0xcc [btrfs] [ 599.709340] btrfs_compare_trees+0x432/0x53d [btrfs] [ 599.709340] ? process_extent+0x1106/0x1106 [btrfs] [ 599.709340] btrfs_ioctl_send+0x960/0xe26 [btrfs] [ 599.709340] btrfs_ioctl+0x181b/0x1fed [btrfs] [ 599.709340] ? trace_hardirqs_on_caller+0x150/0x1ac [ 599.709340] vfs_ioctl+0x21/0x38 [ 599.709340] ? vfs_ioctl+0x21/0x38 [ 599.709340] do_vfs_ioctl+0x611/0x645 [ 599.709340] ? rcu_read_unlock+0x5b/0x5d [ 599.709340] ? __fget+0x6d/0x79 [ 599.709340] SyS_ioctl+0x57/0x7b [ 599.709340] entry_SYSCALL_64_fastpath+0x18/0xad [ 599.709340] RIP: 0033:0x7f51945eec47 [ 599.709340] RSP: 002b:00007ffc21c13e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 599.709340] RAX: ffffffffffffffda RBX: ffffffff81096459 RCX: 00007f51945eec47 [ 599.709340] RDX: 00007ffc21c13f20 RSI: 0000000040489426 RDI: 0000000000000004 [ 599.709340] RBP: ffffc9000665bf98 R08: 00007f519450d700 R09: 00007f519450d700 [ 599.709340] R10: 00007f519450d9d0 R11: 0000000000000202 R12: 0000000000000046 [ 599.709340] R13: ffffc9000665bf78 R14: 0000000000000000 R15: 00007f5195574040 [ 599.709340] ? trace_hardirqs_off_caller+0x43/0xb1 [ 599.709340] Code: 29 f0 49 39 d8 4c 0f 47 c3 49 03 81 58 01 00 00 44 89 c1 4c 01 c2 4c 29 c3 48 c1 f8 03 49 0f af c4 48 c1 e0 0c 4c 01 e8 48 01 c6 <f3> a4 31 f6 4$ [ 599.709340] RIP: read_extent_buffer+0xdb/0xf4 [btrfs] RSP: ffffc9000665ba00 [ 599.762057] ---[ end trace fe00d7af61b9f49e ]--- This is because the 'offset' field starts at an offset of 37 bytes (offsetof(struct btrfs_file_extent_item, offset)), has a length of 8 bytes and therefore attemping to read it causes a 1 byte access beyond the end of the leaf, as the first item's content in a leaf is located at the tail of the leaf, the item size is 44 bytes and the offset of that field plus its length (37 + 8 = 45) goes beyond the item's size by 1 byte. So fix this by accessing the 'offset' and 'disk_bytenr' fields after jumping to the 'out' label if we are processing an inline extent. We move the reading operation of the 'disk_bytenr' field too because we have the same problem as for the 'offset' field explained above when the inline data is less then 8 bytes. The access to the 'generation' field is also moved but just for the sake of grouping access to all the fields. Fixes: e1cbfd7bf6da ("Btrfs: send, fix file hole not being preserved due to inline extent") Cc: <stable@vger.kernel.org> # v4.12+ Signed-off-by: Filipe Manana <fdmanana@suse.com>
| * | | Btrfs: incremental send, fix invalid path for link commandsFilipe Manana2017-07-061-30/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some scenarios an incremental send stream can contain link commands with an invalid target path. Such scenarios happen after moving some directory inode A, renaming a regular file inode B into the old name of inode A and finally creating a new hard link for inode B at directory inode A. Consider the following example scenario where this issue happens. Parent snapshot: . (ino 256) | |--- dir1/ (ino 257) | |--- dir2/ (ino 258) | |--- dir3/ (ino 259) | |--- file1 (ino 261) | |--- dir4/ (ino 262) | |--- dir5/ (ino 260) Send snapshot: . (ino 256) | |--- dir1/ (ino 257) |--- dir2/ (ino 258) | |--- dir3/ (ino 259) | |--- dir4 (ino 261) | |--- dir6/ (ino 263) |--- dir44/ (ino 262) |--- file11 (ino 261) |--- dir55/ (ino 260) When attempting to apply the corresponding incremental send stream, a link command contains an invalid target path which makes the receiver fail. The following is the verbose output of the btrfs receive command: receiving snapshot mysnap2 uuid=90076fe6-5ba6-e64a-9321-9279670ed16b (...) utimes utimes dir1 utimes dir1/dir2/dir3 utimes rename dir1/dir2/dir3/dir4 -> o262-7-0 link dir1/dir2/dir3/dir4 -> dir1/dir2/dir3/file1 link dir1/dir2/dir3/dir4/file11 -> dir1/dir2/dir3/file1 ERROR: link dir1/dir2/dir3/dir4/file11 -> dir1/dir2/dir3/file1 failed: Not a directory The following steps happen during the computation of the incremental send stream the lead to this issue: 1) When processing inode 261, we orphanize inode 262 due to a name/location collision with one of the new hard links for inode 261 (created in the second step below). 2) We create one of the 2 new hard links for inode 261, the one whose location is at "dir1/dir2/dir3/dir4". 3) We then attempt to create the other new hard link for inode 261, which has inode 262 as its parent directory. Because the path for this new hard link was computed before we started processing the new references (hard links), it reflects the old name/location of inode 262, that is, it does not account for the orphanization step that happened when we started processing the new references for inode 261, whence it is no longer valid, causing the receiver to fail. So fix this issue by recomputing the full path of new references if we ended up orphanizing other inodes which are directories. A test case for fstests follows soon. Signed-off-by: Filipe Manana <fdmanana@suse.com>
* | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2017-07-141-24/+17
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge even more updates from Andrew Morton: - a few leftovers - fault-injector rework - add a module loader test driver * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kmod: throttle kmod thread limit kmod: add test driver to stress test the module loader MAINTAINERS: give kmod some maintainer love xtensa: use generic fb.h fault-inject: add /proc/<pid>/fail-nth fault-inject: simplify access check for fail-nth fault-inject: make fail-nth read/write interface symmetric fault-inject: parse as natural 1-based value for fail-nth write interface fault-inject: automatically detect the number base for fail-nth write interface kernel/watchdog.c: use better pr_fmt prefix MAINTAINERS: move the befs tree to kernel.org lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int mm: fix overflow check in expand_upwards()
| * | | | fault-inject: add /proc/<pid>/fail-nthAkinobu Mita2017-07-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fail-nth interface is only created in /proc/self/task/<current-tid>/. This change also adds it in /proc/<pid>/. This makes shell based tool a bit simpler. $ bash -c "builtin echo 100 > /proc/self/fail-nth && exec ls /" Link: http://lkml.kernel.org/r/1491490561-10485-6-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | fault-inject: simplify access check for fail-nthAkinobu Mita2017-07-141-15/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fail-nth file is created with 0666 and the access is permitted if and only if the task is current. This file is owned by the currnet user. So we can create it with 0644 and allow the owner to write it. This enables to watch the status of task->fail_nth from another processes. [akinobu.mita@gmail.com: don't convert unsigned type value as signed int] Link: http://lkml.kernel.org/r/1492444483-9239-1-git-send-email-akinobu.mita@gmail.com [akinobu.mita@gmail.com: avoid unwanted data race to task->fail_nth] Link: http://lkml.kernel.org/r/1499962492-8931-1-git-send-email-akinobu.mita@gmail.com Link: http://lkml.kernel.org/r/1491490561-10485-5-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | fault-inject: make fail-nth read/write interface symmetricAkinobu Mita2017-07-141-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The read interface for fail-nth looks a bit odd. Read from this file returns "NYYYY..." or "YYYYY..." (this makes me surprise when cat this file). Because there is no EOF condition. The first character indicates current->fail_nth is zero or not, and then current->fail_nth is reset to zero. Just returning task->fail_nth value is more natural to understand. Link: http://lkml.kernel.org/r/1491490561-10485-4-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | fault-inject: parse as natural 1-based value for fail-nth write interfaceAkinobu Mita2017-07-141-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value written to fail-nth file is parsed as 0-based. Parsing as one-based is more natural to understand and it enables to cancel the previous setup by simply writing '0'. This change also converts task->fail_nth from signed to unsigned int. Link: http://lkml.kernel.org/r/1491490561-10485-3-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | fault-inject: automatically detect the number base for fail-nth write interfaceAkinobu Mita2017-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Automatically detect the number base to use when writing to fail-nth file instead of always parsing as a decimal number. Link: http://lkml.kernel.org/r/1491490561-10485-2-git-send-email-akinobu.mita@gmail.com Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'befs-v4.13-rc1' of ↵Linus Torvalds2017-07-141-9/+6
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/luisbg/linux-befs Pull single befs fix from Luis de Bethencourt: "Very little activity in the befs file system this time since I'm busy settling into a new job" * tag 'befs-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/luisbg/linux-befs: befs: add kernel-doc formatting for befs_bt_read_super()
| * | | | befs: add kernel-doc formatting for befs_bt_read_super()Tommy Nguyen2017-07-091-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fs/befs/TODO mentions some comments needing conversion to Kernel-Doc formatting. This patch changes the comment describing befs_bt_read_super(). Signed-off-by: Tommy Nguyen <remyabel@gmail.com> Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
* | | | | Merge tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2017-07-1322-132/+545
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - Fix -EACCESS on commit to DS handling - Fix initialization of nfs_page_array->npages - Only invalidate dentries that are actually invalid Features: - Enable NFSoRDMA transparent state migration - Add support for lookup-by-filehandle - Add support for nfs re-exporting Other bugfixes and cleanups: - Christoph cleaned up the way we declare NFS operations - Clean up various internal structures - Various cleanups to commits - Various improvements to error handling - Set the dt_type of . and .. entries in NFS v4 - Make slot allocation more reliable - Fix fscache stat printing - Fix uninitialized variable warnings - Fix potential list overrun in nfs_atomic_open() - Fix a race in NFSoRDMA RPC reply handler - Fix return size for nfs42_proc_copy() - Fix against MAC forgery timing attacks" * tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (68 commits) NFS: Don't run wake_up_bit() when nobody is waiting... nfs: add export operations nfs4: add NFSv4 LOOKUPP handlers nfs: add a nfs_ilookup helper nfs: replace d_add with d_splice_alias in atomic_open sunrpc: use constant time memory comparison for mac NFSv4.2 fix size storage for nfs42_proc_copy xprtrdma: Fix documenting comments in frwr_ops.c xprtrdma: Replace PAGE_MASK with offset_in_page() xprtrdma: FMR does not need list_del_init() xprtrdma: Demote "connect" log messages NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration xprtrdma: Don't defer MR recovery if ro_map fails xprtrdma: Fix FRWR invalidation error recovery xprtrdma: Fix client lock-up after application signal fires xprtrdma: Rename rpcrdma_req::rl_free xprtrdma: Pass only the list of registered MRs to ro_unmap_sync xprtrdma: Pre-mark remotely invalidated MRs xprtrdma: On invalidation failure, remove MWs from rl_registered ...
| * | | | | NFS: Don't run wake_up_bit() when nobody is waiting...Trond Myklebust2017-07-131-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "perf lock" shows fairly heavy contention for the bit waitqueue locks when doing an I/O heavy workload. Use a bit to tell whether or not there has been contention for a lock so that we can optimise away the bit waitqueue options in those cases. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | nfs: add export operationsPeng Tao2017-07-134-1/+182
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This support for opening files on NFS by file handle, both through the open_by_handle syscall, and for re-exporting NFS (for example using a different version). The support is very basic for now, as each open by handle will have to do an NFSv4 open operation on the wire. In the future this will hopefully be mitigated by an open file cache, as well as various optimizations in NFS for this specific case. Signed-off-by: Peng Tao <tao.peng@primarydata.com> [hch: incorporated various changes, resplit the patches, new changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | nfs4: add NFSv4 LOOKUPP handlersJeff Layton2017-07-133-0/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will be needed in order to implement the get_parent export op for nfsd. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | nfs: add a nfs_ilookup helperPeng Tao2017-07-131-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This helper will allow to find an existing NFS inode by the file handle and fattr. Signed-off-by: Peng Tao <tao.peng@primarydata.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | nfs: replace d_add with d_splice_alias in atomic_openPeng Tao2017-07-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's a trival change but follows knfsd export document that asks for d_splice_alias during lookup. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | NFSv4.2 fix size storage for nfs42_proc_copyOlga Kornievskaia2017-07-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return size of COPY is u64 but it was assigned to an "int" status. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | NFSv4.1: Use seqid returned by EXCHANGE_ID after state migrationChuck Lever2017-07-131-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Transparent State Migration copies a client's lease state from the server where a filesystem used to reside to the server where it now resides. When an NFSv4.1 client first contacts that destination server, it uses EXCHANGE_ID to detect trunking relationships. The lease that was copied there is returned to that client, but the destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to the client. This is because the lease was confirmed on the source server (before it was copied). When CONFIRMED_R is set, the client throws away the sequence ID returned by the server. During a Transparent State Migration, however there's no other way for the client to know what sequence ID to use with a lease that's been migrated. Therefore, the client must save and use the contrived slot sequence value returned by the destination server even when CONFIRMED_R is set. Note that some servers always return a seqid of 1 after a migration. Reported-by: Xuan Qi <xuan.qi@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Xuan Qi <xuan.qi@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
OpenPOWER on IntegriCloud