From 67e753ca41782913d805ff4a8a2b0f60b26b7915 Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Thu, 14 Mar 2013 10:49:23 +0200 Subject: UBIFS: make space fixup work in the remount case The UBIFS space fixup is a useful feature which allows to fixup the "broken" flash space at the time of the first mount. The "broken" space is usually the result of using a "dumb" industrial flasher which is not able to skip empty NAND pages and just writes all 0xFFs to the empty space, which has grave side-effects for UBIFS when UBIFS trise to write useful data to those empty pages. The fix-up feature works roughly like this: 1. mkfs.ubifs sets the fixup flag in UBIFS superblock when creating the image (see -F option) 2. when the file-system is mounted for the first time, UBIFS notices the fixup flag and re-writes the entire media atomically, which may take really a lot of time. 3. UBIFS clears the fixup flag in the superblock. This works fine when the file system is mounted R/W for the very first time. But it did not really work in the case when we first mount the file-system R/O, and then re-mount R/W. The reason was that we started the fixup procedure too late, which we cannot really do because we have to fixup the space before it starts being used. Signed-off-by: Artem Bityutskiy Reported-by: Mark Jackson Cc: stable@vger.kernel.org # 3.0+ --- fs/ubifs/super.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c index ac838b8..f21acf0 100644 --- a/fs/ubifs/super.c +++ b/fs/ubifs/super.c @@ -1568,6 +1568,12 @@ static int ubifs_remount_rw(struct ubifs_info *c) c->remounting_rw = 1; c->ro_mount = 0; + if (c->space_fixup) { + err = ubifs_fixup_free_space(c); + if (err) + return err; + } + err = check_free_space(c); if (err) goto out; @@ -1684,12 +1690,6 @@ static int ubifs_remount_rw(struct ubifs_info *c) err = dbg_check_space_info(c); } - if (c->space_fixup) { - err = ubifs_fixup_free_space(c); - if (err) - goto out; - } - mutex_unlock(&c->umount_mutex); return err; -- cgit v1.1 From 64a817cfbded8674f345d1117b117f942a351a69 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 26 Mar 2013 14:11:13 -0400 Subject: nfsd4: reject "negative" acl lengths MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since we only enforce an upper bound, not a lower bound, a "negative" length can get through here. The symptom seen was a warning when we attempt to a kmalloc with an excessive size. Reported-by: Toralf Förster Cc: stable@kernel.org Signed-off-by: J. Bruce Fields --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 0116886..a272007 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -264,7 +264,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_SIZE; } if (bmval[0] & FATTR4_WORD0_ACL) { - int nace; + u32 nace; struct nfs4_ace *ace; READ_BUF(4); len += 4; -- cgit v1.1 From 35e5cbc0af240778e61113286c019837e06aeec6 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 29 Mar 2013 15:39:16 +0100 Subject: reiserfs: Fix warning and inode leak when deleting inode with xattrs After commit 21d8a15a (lookup_one_len: don't accept . and ..) reiserfs started failing to delete xattrs from inode. This was due to a buggy test for '.' and '..' in fill_with_dentries() which resulted in passing '.' and '..' entries to lookup_one_len() in some cases. That returned error and so we failed to iterate over all xattrs of and inode. Fix the test in fill_with_dentries() along the lines of the one in lookup_one_len(). Reported-by: Pawel Zawora CC: stable@vger.kernel.org Signed-off-by: Jan Kara --- fs/reiserfs/xattr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c index c196369..4cce1d9 100644 --- a/fs/reiserfs/xattr.c +++ b/fs/reiserfs/xattr.c @@ -187,8 +187,8 @@ fill_with_dentries(void *buf, const char *name, int namelen, loff_t offset, if (dbuf->count == ARRAY_SIZE(dbuf->dentries)) return -ENOSPC; - if (name[0] == '.' && (name[1] == '\0' || - (name[1] == '.' && name[2] == '\0'))) + if (name[0] == '.' && (namelen < 2 || + (namelen == 2 && name[1] == '.'))) return 0; dentry = lookup_one_len(name, dbuf->xadir, namelen); -- cgit v1.1 From c1681bf8a7b1b98edee8b862a42c19c4e53205fd Mon Sep 17 00:00:00 2001 From: Anatol Pomozov Date: Mon, 1 Apr 2013 09:47:56 -0700 Subject: loop: prevent bdev freeing while device in use struct block_device lifecycle is defined by its inode (see fs/block_dev.c) - block_device allocated first time we access /dev/loopXX and deallocated on bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile" we want that block_device stay alive until we destroy the loop device with "losetup -d". But because we do not hold /dev/loopXX inode its counter goes 0, and inode/bdev can be destroyed at any moment. Usually it happens at memory pressure or when user drops inode cache (like in the test below). When later in loop_clr_fd() we want to use bdev we have use-after-free error with following stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000280 bd_set_size+0x10/0xa0 loop_clr_fd+0x1f8/0x420 [loop] lo_ioctl+0x200/0x7e0 [loop] lo_compat_ioctl+0x47/0xe0 [loop] compat_blkdev_ioctl+0x341/0x1290 do_filp_open+0x42/0xa0 compat_sys_ioctl+0xc1/0xf20 do_sys_open+0x16e/0x1d0 sysenter_dispatch+0x7/0x1a To prevent use-after-free we need to grab the device in loop_set_fd() and put it later in loop_clr_fd(). The issue is reprodusible on current Linus head and v3.3. Here is the test: dd if=/dev/zero of=loop.file bs=1M count=1 while [ true ]; do losetup /dev/loop0 loop.file echo 2 > /proc/sys/vm/drop_caches losetup -d /dev/loop0 done [ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every time we call loop_set_fd() we check that loop_device->lo_state is Lo_unbound and set it to Lo_bound If somebody will try to set_fd again it will get EBUSY. And if we try to loop_clr_fd() on unbound loop device we'll get ENXIO. loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under loop_device->lo_ctl_mutex. ] Signed-off-by: Anatol Pomozov Cc: Al Viro Signed-off-by: Linus Torvalds --- fs/block_dev.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/block_dev.c b/fs/block_dev.c index aea605c..aae187a 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -551,6 +551,7 @@ struct block_device *bdgrab(struct block_device *bdev) ihold(bdev->bd_inode); return bdev; } +EXPORT_SYMBOL(bdgrab); long nr_blockdev_pages(void) { -- cgit v1.1 From 8cde7ad17e4f4ff8d12ff60dd09c0a291cb0b61c Mon Sep 17 00:00:00 2001 From: Zheng Liu Date: Wed, 3 Apr 2013 12:27:18 -0400 Subject: ext4: fix big-endian bugs which could cause fs corruptions When an extent was zeroed out, we forgot to do convert from cpu to le16. It could make us hit a BUG_ON when we try to write dirty pages out. So fix it. [ Also fix a bug found by Dmitry Monakhov where we were missing le32_to_cpu() calls in the new indirect punch hole code. There are a number of other big endian warnings found by static code analyzers, but we'll wait for the next merge window to fix them all up. These fixes are designed to be Obviously Correct by code inspection, and easy to demonstrate that it won't make any difference (and hence, won't introduce any bugs) on little endian architectures such as x86. --tytso ] Signed-off-by: Zheng Liu Signed-off-by: "Theodore Ts'o" Reported-by: CAI Qian Reported-by: Christian Kujau Cc: Dmitry Monakhov --- fs/ext4/extents.c | 11 +++++++---- fs/ext4/indirect.c | 4 ++-- 2 files changed, 9 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 56efcaa..9c6d06d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2999,20 +2999,23 @@ static int ext4_split_extent_at(handle_t *handle, if (split_flag & EXT4_EXT_DATA_VALID1) { err = ext4_ext_zeroout(inode, ex2); zero_ex.ee_block = ex2->ee_block; - zero_ex.ee_len = ext4_ext_get_actual_len(ex2); + zero_ex.ee_len = cpu_to_le16( + ext4_ext_get_actual_len(ex2)); ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex2)); } else { err = ext4_ext_zeroout(inode, ex); zero_ex.ee_block = ex->ee_block; - zero_ex.ee_len = ext4_ext_get_actual_len(ex); + zero_ex.ee_len = cpu_to_le16( + ext4_ext_get_actual_len(ex)); ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex)); } } else { err = ext4_ext_zeroout(inode, &orig_ex); zero_ex.ee_block = orig_ex.ee_block; - zero_ex.ee_len = ext4_ext_get_actual_len(&orig_ex); + zero_ex.ee_len = cpu_to_le16( + ext4_ext_get_actual_len(&orig_ex)); ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(&orig_ex)); } @@ -3272,7 +3275,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle, if (err) goto out; zero_ex.ee_block = ex->ee_block; - zero_ex.ee_len = ext4_ext_get_actual_len(ex); + zero_ex.ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)); ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex)); err = ext4_ext_get_access(handle, inode, path + depth); diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c index b505a14..a041831 100644 --- a/fs/ext4/indirect.c +++ b/fs/ext4/indirect.c @@ -1539,9 +1539,9 @@ static int free_hole_blocks(handle_t *handle, struct inode *inode, blk = *i_data; if (level > 0) { ext4_lblk_t first2; - bh = sb_bread(inode->i_sb, blk); + bh = sb_bread(inode->i_sb, le32_to_cpu(blk)); if (!bh) { - EXT4_ERROR_INODE_BLOCK(inode, blk, + EXT4_ERROR_INODE_BLOCK(inode, le32_to_cpu(blk), "Read failure"); return -EIO; } -- cgit v1.1 From 57c7310b8eb96b0fe3b0aaa8dc194adbae03bef3 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Tue, 5 Mar 2013 16:01:47 -0500 Subject: GFS2: use kmalloc for lvb bitmap The temp lvb bitmap was on the stack, which could be an alignment problem for __set_bit_le. Use kmalloc for it instead. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/gfs2/incore.h | 1 + fs/gfs2/lock_dlm.c | 31 ++++++++++++++++++------------- 2 files changed, 19 insertions(+), 13 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 156e42e..5c29216 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -588,6 +588,7 @@ struct lm_lockstruct { struct dlm_lksb ls_control_lksb; /* control_lock */ char ls_control_lvb[GDLM_LVB_SIZE]; /* control_lock lvb */ struct completion ls_sync_wait; /* {control,mounted}_{lock,unlock} */ + char *ls_lvb_bits; spinlock_t ls_recover_spin; /* protects following fields */ unsigned long ls_recover_flags; /* DFL_ */ diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c index 9802de0..b15bb45 100644 --- a/fs/gfs2/lock_dlm.c +++ b/fs/gfs2/lock_dlm.c @@ -580,7 +580,6 @@ static void gfs2_control_func(struct work_struct *work) { struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd, sd_control_work.work); struct lm_lockstruct *ls = &sdp->sd_lockstruct; - char lvb_bits[GDLM_LVB_SIZE]; uint32_t block_gen, start_gen, lvb_gen, flags; int recover_set = 0; int write_lvb = 0; @@ -634,7 +633,7 @@ static void gfs2_control_func(struct work_struct *work) return; } - control_lvb_read(ls, &lvb_gen, lvb_bits); + control_lvb_read(ls, &lvb_gen, ls->ls_lvb_bits); spin_lock(&ls->ls_recover_spin); if (block_gen != ls->ls_recover_block || @@ -664,10 +663,10 @@ static void gfs2_control_func(struct work_struct *work) ls->ls_recover_result[i] = 0; - if (!test_bit_le(i, lvb_bits + JID_BITMAP_OFFSET)) + if (!test_bit_le(i, ls->ls_lvb_bits + JID_BITMAP_OFFSET)) continue; - __clear_bit_le(i, lvb_bits + JID_BITMAP_OFFSET); + __clear_bit_le(i, ls->ls_lvb_bits + JID_BITMAP_OFFSET); write_lvb = 1; } } @@ -691,7 +690,7 @@ static void gfs2_control_func(struct work_struct *work) continue; if (ls->ls_recover_submit[i] < start_gen) { ls->ls_recover_submit[i] = 0; - __set_bit_le(i, lvb_bits + JID_BITMAP_OFFSET); + __set_bit_le(i, ls->ls_lvb_bits + JID_BITMAP_OFFSET); } } /* even if there are no bits to set, we need to write the @@ -705,7 +704,7 @@ static void gfs2_control_func(struct work_struct *work) spin_unlock(&ls->ls_recover_spin); if (write_lvb) { - control_lvb_write(ls, start_gen, lvb_bits); + control_lvb_write(ls, start_gen, ls->ls_lvb_bits); flags = DLM_LKF_CONVERT | DLM_LKF_VALBLK; } else { flags = DLM_LKF_CONVERT; @@ -725,7 +724,7 @@ static void gfs2_control_func(struct work_struct *work) */ for (i = 0; i < recover_size; i++) { - if (test_bit_le(i, lvb_bits + JID_BITMAP_OFFSET)) { + if (test_bit_le(i, ls->ls_lvb_bits + JID_BITMAP_OFFSET)) { fs_info(sdp, "recover generation %u jid %d\n", start_gen, i); gfs2_recover_set(sdp, i); @@ -758,7 +757,6 @@ static void gfs2_control_func(struct work_struct *work) static int control_mount(struct gfs2_sbd *sdp) { struct lm_lockstruct *ls = &sdp->sd_lockstruct; - char lvb_bits[GDLM_LVB_SIZE]; uint32_t start_gen, block_gen, mount_gen, lvb_gen; int mounted_mode; int retries = 0; @@ -857,7 +855,7 @@ locks_done: * lvb_gen will be non-zero. */ - control_lvb_read(ls, &lvb_gen, lvb_bits); + control_lvb_read(ls, &lvb_gen, ls->ls_lvb_bits); if (lvb_gen == 0xFFFFFFFF) { /* special value to force mount attempts to fail */ @@ -887,7 +885,7 @@ locks_done: * and all lvb bits to be clear (no pending journal recoveries.) */ - if (!all_jid_bits_clear(lvb_bits)) { + if (!all_jid_bits_clear(ls->ls_lvb_bits)) { /* journals need recovery, wait until all are clear */ fs_info(sdp, "control_mount wait for journal recovery\n"); goto restart; @@ -949,7 +947,6 @@ static int dlm_recovery_wait(void *word) static int control_first_done(struct gfs2_sbd *sdp) { struct lm_lockstruct *ls = &sdp->sd_lockstruct; - char lvb_bits[GDLM_LVB_SIZE]; uint32_t start_gen, block_gen; int error; @@ -991,8 +988,8 @@ restart: memset(ls->ls_recover_result, 0, ls->ls_recover_size*sizeof(uint32_t)); spin_unlock(&ls->ls_recover_spin); - memset(lvb_bits, 0, sizeof(lvb_bits)); - control_lvb_write(ls, start_gen, lvb_bits); + memset(ls->ls_lvb_bits, 0, GDLM_LVB_SIZE); + control_lvb_write(ls, start_gen, ls->ls_lvb_bits); error = mounted_lock(sdp, DLM_LOCK_PR, DLM_LKF_CONVERT); if (error) @@ -1022,6 +1019,12 @@ static int set_recover_size(struct gfs2_sbd *sdp, struct dlm_slot *slots, uint32_t old_size, new_size; int i, max_jid; + if (!ls->ls_lvb_bits) { + ls->ls_lvb_bits = kzalloc(GDLM_LVB_SIZE, GFP_NOFS); + if (!ls->ls_lvb_bits) + return -ENOMEM; + } + max_jid = 0; for (i = 0; i < num_slots; i++) { if (max_jid < slots[i].slot - 1) @@ -1057,6 +1060,7 @@ static int set_recover_size(struct gfs2_sbd *sdp, struct dlm_slot *slots, static void free_recover_size(struct lm_lockstruct *ls) { + kfree(ls->ls_lvb_bits); kfree(ls->ls_recover_submit); kfree(ls->ls_recover_result); ls->ls_recover_submit = NULL; @@ -1205,6 +1209,7 @@ static int gdlm_mount(struct gfs2_sbd *sdp, const char *table) ls->ls_recover_size = 0; ls->ls_recover_submit = NULL; ls->ls_recover_result = NULL; + ls->ls_lvb_bits = NULL; error = set_recover_size(sdp, NULL, 0); if (error) -- cgit v1.1 From 4146c3d469dc400eefa253fb37aa1e74fb5e41f8 Mon Sep 17 00:00:00 2001 From: Akinobu Mita Date: Thu, 7 Mar 2013 23:42:52 +0900 Subject: GFS2: use memchr_inv Use memchr_inv to verify that the specified memory range is cleared. Signed-off-by: Akinobu Mita Cc: Steven Whitehouse Cc: cluster-devel@redhat.com Cc: Christine Caulfield Cc: David Teigland --- fs/gfs2/lock_dlm.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c index b15bb45..c8423d6 100644 --- a/fs/gfs2/lock_dlm.c +++ b/fs/gfs2/lock_dlm.c @@ -483,12 +483,8 @@ static void control_lvb_write(struct lm_lockstruct *ls, uint32_t lvb_gen, static int all_jid_bits_clear(char *lvb) { - int i; - for (i = JID_BITMAP_OFFSET; i < GDLM_LVB_SIZE; i++) { - if (lvb[i]) - return 0; - } - return 1; + return !memchr_inv(lvb + JID_BITMAP_OFFSET, 0, + GDLM_LVB_SIZE - JID_BITMAP_OFFSET); } static void sync_wait_cb(void *arg) -- cgit v1.1 From 441362d06be349430d06e37286adce4b90e6ce96 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Mon, 11 Mar 2013 23:01:37 +0800 Subject: GFS2: return error if malloc failed in gfs2_rs_alloc() The error code in gfs2_rs_alloc() is set to ENOMEM when error but never be used, instead, gfs2_rs_alloc() always return 0. Fix to return 'error'. Signed-off-by: Wei Yongjun Signed-off-by: Steven Whitehouse --- fs/gfs2/rgrp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index d1f51fd..70d1cd0 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -576,7 +576,7 @@ int gfs2_rs_alloc(struct gfs2_inode *ip) RB_CLEAR_NODE(&ip->i_res->rs_node); out: up_write(&ip->i_rw_mutex); - return 0; + return error; } static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs) -- cgit v1.1 From c2952d202f710d326ac36a8ea6bd216b20615ec8 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Thu, 14 Mar 2013 15:49:59 +0000 Subject: GFS2: Fix unlock of fcntl locks during withdrawn state When withdraw occurs, we need to continue to allow unlocks of fcntl locks to occur, however these will only be local, since the node has withdrawn from the cluster. This prevents triggering a VFS level bug trap due to locks remaining when a file is closed. Signed-off-by: Steven Whitehouse --- fs/gfs2/file.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 019f45e..d79c2da 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -923,8 +923,11 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl) cmd = F_SETLK; fl->fl_type = F_UNLCK; } - if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) + if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) { + if (fl->fl_type == F_UNLCK) + posix_lock_file_wait(file, fl); return -EIO; + } if (IS_GETLK(cmd)) return dlm_posix_get(ls->ls_dlm, ip->i_no_addr, file, fl); else if (fl->fl_type == F_UNLCK) -- cgit v1.1 From b2c87cae0edb1a99f7dd2751d5beb2cb97926514 Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Fri, 22 Mar 2013 10:07:24 -0400 Subject: GFS2: Issue discards in 512b sectors This patch changes GFS2's discard issuing code so that it calls function sb_issue_discard rather than blkdev_issue_discard. The code was calling blkdev_issue_discard and specifying the correct sector offset and sector size, but blkdev_issue_discard expects these values to be in terms of 512 byte sectors, even if the native sector size for the device is different. Calling sb_issue_discard with the BLOCK size instead ensures the correct block-to-512b-sector translation. I verified that "minlen" is specified in blocks, so comparing it to a number of blocks is correct. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/rgrp.c | 30 +++++++++++++----------------- 1 file changed, 13 insertions(+), 17 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 70d1cd0..5a51265 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -1181,12 +1181,9 @@ int gfs2_rgrp_send_discards(struct gfs2_sbd *sdp, u64 offset, const struct gfs2_bitmap *bi, unsigned minlen, u64 *ptrimmed) { struct super_block *sb = sdp->sd_vfs; - struct block_device *bdev = sb->s_bdev; - const unsigned int sects_per_blk = sdp->sd_sb.sb_bsize / - bdev_logical_block_size(sb->s_bdev); u64 blk; sector_t start = 0; - sector_t nr_sects = 0; + sector_t nr_blks = 0; int rv; unsigned int x; u32 trimmed = 0; @@ -1206,35 +1203,34 @@ int gfs2_rgrp_send_discards(struct gfs2_sbd *sdp, u64 offset, if (diff == 0) continue; blk = offset + ((bi->bi_start + x) * GFS2_NBBY); - blk *= sects_per_blk; /* convert to sectors */ while(diff) { if (diff & 1) { - if (nr_sects == 0) + if (nr_blks == 0) goto start_new_extent; - if ((start + nr_sects) != blk) { - if (nr_sects >= minlen) { - rv = blkdev_issue_discard(bdev, - start, nr_sects, + if ((start + nr_blks) != blk) { + if (nr_blks >= minlen) { + rv = sb_issue_discard(sb, + start, nr_blks, GFP_NOFS, 0); if (rv) goto fail; - trimmed += nr_sects; + trimmed += nr_blks; } - nr_sects = 0; + nr_blks = 0; start_new_extent: start = blk; } - nr_sects += sects_per_blk; + nr_blks++; } diff >>= 2; - blk += sects_per_blk; + blk++; } } - if (nr_sects >= minlen) { - rv = blkdev_issue_discard(bdev, start, nr_sects, GFP_NOFS, 0); + if (nr_blks >= minlen) { + rv = sb_issue_discard(sb, start, nr_blks, GFP_NOFS, 0); if (rv) goto fail; - trimmed += nr_sects; + trimmed += nr_blks; } if (ptrimmed) *ptrimmed = trimmed; -- cgit v1.1 From b193d59a4863ea670872d76dc99231ddeb598625 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 4 Apr 2013 15:55:00 -0400 Subject: NFSv4: Fix a memory leak in nfs4_discover_server_trunking When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy the old one. Signed-off-by: Trond Myklebust Cc: Chuck Lever Cc: stable@vger.kernel.org [>=3.7] --- fs/nfs/nfs4state.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 6ace365..d41a351 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1886,7 +1886,13 @@ again: status = PTR_ERR(clnt); break; } - clp->cl_rpcclient = clnt; + /* Note: this is safe because we haven't yet marked the + * client as ready, so we are the only user of + * clp->cl_rpcclient + */ + clnt = xchg(&clp->cl_rpcclient, clnt); + rpc_shutdown_client(clnt); + clnt = clp->cl_rpcclient; goto again; case -NFS4ERR_MINOR_VERS_MISMATCH: -- cgit v1.1 From 7b1f1fd1842e6ede25183c267ae733a7f67f00bc Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 5 Apr 2013 16:11:11 -0400 Subject: NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list It is unsafe to use list_for_each_entry_safe() here, because when we drop the nn->nfs_client_lock, we pin the _current_ list entry and ensure that it stays in the list, but we don't do the same for the _next_ list entry. Use of list_for_each_entry() is therefore the correct thing to do. Also fix the refcounting in nfs41_walk_client_list(). Finally, ensure that the nfs_client has finished being initialised and, in the case of NFSv4.1, that the session is set up. Signed-off-by: Trond Myklebust Cc: Chuck Lever Cc: Bryan Schumaker Cc: stable@vger.kernel.org [>= 3.7] --- fs/nfs/nfs4client.c | 44 ++++++++++++++++++++++++++++---------------- 1 file changed, 28 insertions(+), 16 deletions(-) (limited to 'fs') diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index ac4fc9a..c7b346f 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -300,7 +300,7 @@ int nfs40_walk_client_list(struct nfs_client *new, struct rpc_cred *cred) { struct nfs_net *nn = net_generic(new->cl_net, nfs_net_id); - struct nfs_client *pos, *n, *prev = NULL; + struct nfs_client *pos, *prev = NULL; struct nfs4_setclientid_res clid = { .clientid = new->cl_clientid, .confirm = new->cl_confirm, @@ -308,10 +308,23 @@ int nfs40_walk_client_list(struct nfs_client *new, int status = -NFS4ERR_STALE_CLIENTID; spin_lock(&nn->nfs_client_lock); - list_for_each_entry_safe(pos, n, &nn->nfs_client_list, cl_share_link) { + list_for_each_entry(pos, &nn->nfs_client_list, cl_share_link) { /* If "pos" isn't marked ready, we can't trust the * remaining fields in "pos" */ - if (pos->cl_cons_state < NFS_CS_READY) + if (pos->cl_cons_state > NFS_CS_READY) { + atomic_inc(&pos->cl_count); + spin_unlock(&nn->nfs_client_lock); + + if (prev) + nfs_put_client(prev); + prev = pos; + + status = nfs_wait_client_init_complete(pos); + spin_lock(&nn->nfs_client_lock); + if (status < 0) + continue; + } + if (pos->cl_cons_state != NFS_CS_READY) continue; if (pos->rpc_ops != new->rpc_ops) @@ -423,16 +436,16 @@ int nfs41_walk_client_list(struct nfs_client *new, struct rpc_cred *cred) { struct nfs_net *nn = net_generic(new->cl_net, nfs_net_id); - struct nfs_client *pos, *n, *prev = NULL; + struct nfs_client *pos, *prev = NULL; int status = -NFS4ERR_STALE_CLIENTID; spin_lock(&nn->nfs_client_lock); - list_for_each_entry_safe(pos, n, &nn->nfs_client_list, cl_share_link) { + list_for_each_entry(pos, &nn->nfs_client_list, cl_share_link) { /* If "pos" isn't marked ready, we can't trust the * remaining fields in "pos", especially the client * ID and serverowner fields. Wait for CREATE_SESSION * to finish. */ - if (pos->cl_cons_state < NFS_CS_READY) { + if (pos->cl_cons_state > NFS_CS_READY) { atomic_inc(&pos->cl_count); spin_unlock(&nn->nfs_client_lock); @@ -440,18 +453,17 @@ int nfs41_walk_client_list(struct nfs_client *new, nfs_put_client(prev); prev = pos; - nfs4_schedule_lease_recovery(pos); status = nfs_wait_client_init_complete(pos); - if (status < 0) { - nfs_put_client(pos); - spin_lock(&nn->nfs_client_lock); - continue; + if (status == 0) { + nfs4_schedule_lease_recovery(pos); + status = nfs4_wait_clnt_recover(pos); } - status = pos->cl_cons_state; spin_lock(&nn->nfs_client_lock); if (status < 0) continue; } + if (pos->cl_cons_state != NFS_CS_READY) + continue; if (pos->rpc_ops != new->rpc_ops) continue; @@ -469,17 +481,17 @@ int nfs41_walk_client_list(struct nfs_client *new, continue; atomic_inc(&pos->cl_count); - spin_unlock(&nn->nfs_client_lock); + *result = pos; dprintk("NFS: <-- %s using nfs_client = %p ({%d})\n", __func__, pos, atomic_read(&pos->cl_count)); - - *result = pos; - return 0; + break; } /* No matching nfs_client found. */ spin_unlock(&nn->nfs_client_lock); dprintk("NFS: <-- %s status = %d\n", __func__, status); + if (prev) + nfs_put_client(prev); return status; } #endif /* CONFIG_NFS_V4_1 */ -- cgit v1.1 From fa332941c0c7c00e3420078268b7558d0ef792b5 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Tue, 9 Apr 2013 12:56:52 -0400 Subject: NFSv4: Fix another potential state manager deadlock Don't hold the NFSv4 sequence id while we check for open permission. The call to ACCESS may block due to reboot recovery. Signed-off-by: Trond Myklebust --- fs/nfs/nfs4proc.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 26431cf..0ad025e 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -1046,6 +1046,7 @@ static struct nfs4_state *nfs4_try_open_cached(struct nfs4_opendata *opendata) /* Save the delegation */ nfs4_stateid_copy(&stateid, &delegation->stateid); rcu_read_unlock(); + nfs_release_seqid(opendata->o_arg.seqid); ret = nfs_may_open(state->inode, state->owner->so_cred, open_mode); if (ret != 0) goto out; -- cgit v1.1 From 52f21999c7b921a0390708b66ed286282c2e4bee Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 28 Mar 2013 13:30:23 -0400 Subject: ecryptfs: close rmmod race Signed-off-by: Al Viro --- fs/ecryptfs/miscdev.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) (limited to 'fs') diff --git a/fs/ecryptfs/miscdev.c b/fs/ecryptfs/miscdev.c index 412e6ed..e4141f2 100644 --- a/fs/ecryptfs/miscdev.c +++ b/fs/ecryptfs/miscdev.c @@ -80,13 +80,6 @@ ecryptfs_miscdev_open(struct inode *inode, struct file *file) int rc; mutex_lock(&ecryptfs_daemon_hash_mux); - rc = try_module_get(THIS_MODULE); - if (rc == 0) { - rc = -EIO; - printk(KERN_ERR "%s: Error attempting to increment module use " - "count; rc = [%d]\n", __func__, rc); - goto out_unlock_daemon_list; - } rc = ecryptfs_find_daemon_by_euid(&daemon); if (!rc) { rc = -EINVAL; @@ -96,7 +89,7 @@ ecryptfs_miscdev_open(struct inode *inode, struct file *file) if (rc) { printk(KERN_ERR "%s: Error attempting to spawn daemon; " "rc = [%d]\n", __func__, rc); - goto out_module_put_unlock_daemon_list; + goto out_unlock_daemon_list; } mutex_lock(&daemon->mux); if (daemon->flags & ECRYPTFS_DAEMON_MISCDEV_OPEN) { @@ -108,9 +101,6 @@ ecryptfs_miscdev_open(struct inode *inode, struct file *file) atomic_inc(&ecryptfs_num_miscdev_opens); out_unlock_daemon: mutex_unlock(&daemon->mux); -out_module_put_unlock_daemon_list: - if (rc) - module_put(THIS_MODULE); out_unlock_daemon_list: mutex_unlock(&ecryptfs_daemon_hash_mux); return rc; @@ -147,7 +137,6 @@ ecryptfs_miscdev_release(struct inode *inode, struct file *file) "bug.\n", __func__, rc); BUG(); } - module_put(THIS_MODULE); return rc; } @@ -471,6 +460,7 @@ out_free: static const struct file_operations ecryptfs_miscdev_fops = { + .owner = THIS_MODULE, .open = ecryptfs_miscdev_open, .poll = ecryptfs_miscdev_poll, .read = ecryptfs_miscdev_read, -- cgit v1.1 From 8ce584c7416d8a85a6f3edc17d1cddefe331e87e Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 30 Mar 2013 20:13:46 -0400 Subject: procfs: add proc_remove_subtree() just what it sounds like; do that only to procfs subtrees you've created - doing that to something shared with another driver is not only antisocial, but might cause interesting races with proc_create() and its ilk. Signed-off-by: Al Viro --- fs/proc/generic.c | 119 ++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 89 insertions(+), 30 deletions(-) (limited to 'fs') diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 4b3b3ff..21e1a8f 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -755,37 +755,8 @@ void pde_put(struct proc_dir_entry *pde) free_proc_entry(pde); } -/* - * Remove a /proc entry and free it if it's not currently in use. - */ -void remove_proc_entry(const char *name, struct proc_dir_entry *parent) +static void entry_rundown(struct proc_dir_entry *de) { - struct proc_dir_entry **p; - struct proc_dir_entry *de = NULL; - const char *fn = name; - unsigned int len; - - spin_lock(&proc_subdir_lock); - if (__xlate_proc_name(name, &parent, &fn) != 0) { - spin_unlock(&proc_subdir_lock); - return; - } - len = strlen(fn); - - for (p = &parent->subdir; *p; p=&(*p)->next ) { - if (proc_match(len, fn, *p)) { - de = *p; - *p = de->next; - de->next = NULL; - break; - } - } - spin_unlock(&proc_subdir_lock); - if (!de) { - WARN(1, "name '%s'\n", name); - return; - } - spin_lock(&de->pde_unload_lock); /* * Stop accepting new callers into module. If you're @@ -817,6 +788,40 @@ void remove_proc_entry(const char *name, struct proc_dir_entry *parent) spin_lock(&de->pde_unload_lock); } spin_unlock(&de->pde_unload_lock); +} + +/* + * Remove a /proc entry and free it if it's not currently in use. + */ +void remove_proc_entry(const char *name, struct proc_dir_entry *parent) +{ + struct proc_dir_entry **p; + struct proc_dir_entry *de = NULL; + const char *fn = name; + unsigned int len; + + spin_lock(&proc_subdir_lock); + if (__xlate_proc_name(name, &parent, &fn) != 0) { + spin_unlock(&proc_subdir_lock); + return; + } + len = strlen(fn); + + for (p = &parent->subdir; *p; p=&(*p)->next ) { + if (proc_match(len, fn, *p)) { + de = *p; + *p = de->next; + de->next = NULL; + break; + } + } + spin_unlock(&proc_subdir_lock); + if (!de) { + WARN(1, "name '%s'\n", name); + return; + } + + entry_rundown(de); if (S_ISDIR(de->mode)) parent->nlink--; @@ -827,3 +832,57 @@ void remove_proc_entry(const char *name, struct proc_dir_entry *parent) pde_put(de); } EXPORT_SYMBOL(remove_proc_entry); + +int remove_proc_subtree(const char *name, struct proc_dir_entry *parent) +{ + struct proc_dir_entry **p; + struct proc_dir_entry *root = NULL, *de, *next; + const char *fn = name; + unsigned int len; + + spin_lock(&proc_subdir_lock); + if (__xlate_proc_name(name, &parent, &fn) != 0) { + spin_unlock(&proc_subdir_lock); + return -ENOENT; + } + len = strlen(fn); + + for (p = &parent->subdir; *p; p=&(*p)->next ) { + if (proc_match(len, fn, *p)) { + root = *p; + *p = root->next; + root->next = NULL; + break; + } + } + if (!root) { + spin_unlock(&proc_subdir_lock); + return -ENOENT; + } + de = root; + while (1) { + next = de->subdir; + if (next) { + de->subdir = next->next; + next->next = NULL; + de = next; + continue; + } + spin_unlock(&proc_subdir_lock); + + entry_rundown(de); + next = de->parent; + if (S_ISDIR(de->mode)) + next->nlink--; + de->nlink = 0; + if (de == root) + break; + pde_put(de); + + spin_lock(&proc_subdir_lock); + de = next; + } + pde_put(root); + return 0; +} +EXPORT_SYMBOL(remove_proc_subtree); -- cgit v1.1 From e9c5d8a562f01b211926d70443378eb14b29a676 Mon Sep 17 00:00:00 2001 From: Andrey Vagin Date: Tue, 9 Apr 2013 17:33:29 +0400 Subject: mnt: release locks on error path in do_loopback do_loopback calls lock_mount(path) and forget to unlock_mount if clone_mnt or copy_mnt fails. [ 77.661566] ================================================ [ 77.662939] [ BUG: lock held when returning to user space! ] [ 77.664104] 3.9.0-rc5+ #17 Not tainted [ 77.664982] ------------------------------------------------ [ 77.666488] mount/514 is leaving the kernel with locks still held! [ 77.668027] 2 locks held by mount/514: [ 77.668817] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [] lock_mount+0x32/0xe0 [ 77.671755] #1: (&namespace_sem){+++++.}, at: [] lock_mount+0x4a/0xe0 Signed-off-by: Andrey Vagin Signed-off-by: Al Viro --- fs/namespace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/namespace.c b/fs/namespace.c index 50ca17d..6c7d31e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1686,7 +1686,7 @@ static int do_loopback(struct path *path, const char *old_name, if (IS_ERR(mnt)) { err = PTR_ERR(mnt); - goto out; + goto out2; } err = graft_tree(mnt, path); -- cgit v1.1 From eb04e0ac198cec3bab407ad220438dfa65c19c67 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 10 Apr 2013 12:44:18 -0400 Subject: NFSv4: Doh! Typo in the fix to nfs41_walk_client_list Make sure that we set the status to 0 on success. Missed in testing because it never appears when doing multiple mounts to _different_ servers. Signed-off-by: Trond Myklebust Cc: # 3.7.x: 7b1f1fd: NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list --- fs/nfs/nfs4client.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index c7b346f..66b6664 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -482,6 +482,7 @@ int nfs41_walk_client_list(struct nfs_client *new, atomic_inc(&pos->cl_count); *result = pos; + status = 0; dprintk("NFS: <-- %s using nfs_client = %p ({%d})\n", __func__, pos, atomic_read(&pos->cl_count)); break; -- cgit v1.1 From c369c9a4a7c82d33329d869cbaf93304cc7a0c40 Mon Sep 17 00:00:00 2001 From: Sachin Prabhu Date: Tue, 9 Apr 2013 18:17:41 +0100 Subject: cifs: Allow passwords which begin with a delimitor Fixes a regression in cifs_parse_mount_options where a password which begins with a delimitor is parsed incorrectly as being a blank password. Signed-off-by: Sachin Prabhu Acked-by: Jeff Layton Cc: Signed-off-by: Steve French --- fs/cifs/connect.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 991c63c..21b3a29 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -1575,14 +1575,24 @@ cifs_parse_mount_options(const char *mountdata, const char *devname, } break; case Opt_blank_pass: - vol->password = NULL; - break; - case Opt_pass: /* passwords have to be handled differently * to allow the character used for deliminator * to be passed within them */ + /* + * Check if this is a case where the password + * starts with a delimiter + */ + tmp_end = strchr(data, '='); + tmp_end++; + if (!(tmp_end < end && tmp_end[1] == delim)) { + /* No it is not. Set the password to NULL */ + vol->password = NULL; + break; + } + /* Yes it is. Drop down to Opt_pass below.*/ + case Opt_pass: /* Obtain the value string */ value = strchr(data, '='); value++; -- cgit v1.1 From f2530dc71cf0822f90bb63ea4600caaef33a66bb Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 9 Apr 2013 09:33:34 +0200 Subject: kthread: Prevent unpark race which puts threads on the wrong cpu The smpboot threads rely on the park/unpark mechanism which binds per cpu threads on a particular core. Though the functionality is racy: CPU0 CPU1 CPU2 unpark(T) wake_up_process(T) clear(SHOULD_PARK) T runs leave parkme() due to !SHOULD_PARK bind_to(CPU2) BUG_ON(wrong CPU) We cannot let the tasks move themself to the target CPU as one of those tasks is actually the migration thread itself, which requires that it starts running on the target cpu right away. The solution to this problem is to prevent wakeups in park mode which are not from unpark(). That way we can guarantee that the association of the task to the target cpu is working correctly. Add a new task state (TASK_PARKED) which prevents other wakeups and use this state explicitly for the unpark wakeup. Peter noticed: Also, since the task state is visible to userspace and all the parked tasks are still in the PID space, its a good hint in ps and friends that these tasks aren't really there for the moment. The migration thread has another related issue. CPU0 CPU1 Bring up CPU2 create_thread(T) park(T) wait_for_completion() parkme() complete() sched_set_stop_task() schedule(TASK_PARKED) The sched_set_stop_task() call is issued while the task is on the runqueue of CPU1 and that confuses the hell out of the stop_task class on that cpu. So we need the same synchronizaion before sched_set_stop_task(). Reported-by: Dave Jones Reported-and-tested-by: Dave Hansen Reported-and-tested-by: Borislav Petkov Acked-by: Peter Ziljstra Cc: Srivatsa S. Bhat Cc: dhillf@gmail.com Cc: Ingo Molnar Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos Signed-off-by: Thomas Gleixner --- fs/proc/array.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/proc/array.c b/fs/proc/array.c index f7ed9ee..cbd0f1b 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -143,6 +143,7 @@ static const char * const task_state_array[] = { "x (dead)", /* 64 */ "K (wakekill)", /* 128 */ "W (waking)", /* 256 */ + "P (parked)", /* 512 */ }; static inline const char *get_task_state(struct task_struct *tsk) -- cgit v1.1 From 4bc4bee4595662d8bff92180d5c32e3313a704b0 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Fri, 5 Apr 2013 20:50:09 +0000 Subject: Btrfs: make sure nbytes are right after log replay While trying to track down a tree log replay bug I noticed that fsck was always complaining about nbytes not being right for our fsynced file. That is because the new fsync stuff doesn't wait for ordered extents to complete, so the inodes nbytes are not necessarily updated properly when we log it. So to fix this we need to set nbytes to whatever it is on the inode that is on disk, so when we replay the extents we can just add the bytes that are being added as we replay the extent. This makes it work for the case that we have the wrong nbytes or the case that we logged everything and nbytes is actually correct. With this I'm no longer getting nbytes errors out of btrfsck. Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik Signed-off-by: Chris Mason --- fs/btrfs/tree-log.c | 48 ++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 42 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 451fad9..ef96381 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -317,6 +317,7 @@ static noinline int overwrite_item(struct btrfs_trans_handle *trans, unsigned long src_ptr; unsigned long dst_ptr; int overwrite_root = 0; + bool inode_item = key->type == BTRFS_INODE_ITEM_KEY; if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) overwrite_root = 1; @@ -326,6 +327,9 @@ static noinline int overwrite_item(struct btrfs_trans_handle *trans, /* look for the key in the destination tree */ ret = btrfs_search_slot(NULL, root, key, path, 0, 0); + if (ret < 0) + return ret; + if (ret == 0) { char *src_copy; char *dst_copy; @@ -367,6 +371,30 @@ static noinline int overwrite_item(struct btrfs_trans_handle *trans, return 0; } + /* + * We need to load the old nbytes into the inode so when we + * replay the extents we've logged we get the right nbytes. + */ + if (inode_item) { + struct btrfs_inode_item *item; + u64 nbytes; + + item = btrfs_item_ptr(path->nodes[0], path->slots[0], + struct btrfs_inode_item); + nbytes = btrfs_inode_nbytes(path->nodes[0], item); + item = btrfs_item_ptr(eb, slot, + struct btrfs_inode_item); + btrfs_set_inode_nbytes(eb, item, nbytes); + } + } else if (inode_item) { + struct btrfs_inode_item *item; + + /* + * New inode, set nbytes to 0 so that the nbytes comes out + * properly when we replay the extents. + */ + item = btrfs_item_ptr(eb, slot, struct btrfs_inode_item); + btrfs_set_inode_nbytes(eb, item, 0); } insert: btrfs_release_path(path); @@ -486,7 +514,7 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans, int found_type; u64 extent_end; u64 start = key->offset; - u64 saved_nbytes; + u64 nbytes = 0; struct btrfs_file_extent_item *item; struct inode *inode = NULL; unsigned long size; @@ -496,10 +524,19 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans, found_type = btrfs_file_extent_type(eb, item); if (found_type == BTRFS_FILE_EXTENT_REG || - found_type == BTRFS_FILE_EXTENT_PREALLOC) - extent_end = start + btrfs_file_extent_num_bytes(eb, item); - else if (found_type == BTRFS_FILE_EXTENT_INLINE) { + found_type == BTRFS_FILE_EXTENT_PREALLOC) { + nbytes = btrfs_file_extent_num_bytes(eb, item); + extent_end = start + nbytes; + + /* + * We don't add to the inodes nbytes if we are prealloc or a + * hole. + */ + if (btrfs_file_extent_disk_bytenr(eb, item) == 0) + nbytes = 0; + } else if (found_type == BTRFS_FILE_EXTENT_INLINE) { size = btrfs_file_extent_inline_len(eb, item); + nbytes = btrfs_file_extent_ram_bytes(eb, item); extent_end = ALIGN(start + size, root->sectorsize); } else { ret = 0; @@ -548,7 +585,6 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans, } btrfs_release_path(path); - saved_nbytes = inode_get_bytes(inode); /* drop any overlapping extents */ ret = btrfs_drop_extents(trans, root, inode, start, extent_end, 1); BUG_ON(ret); @@ -635,7 +671,7 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans, BUG_ON(ret); } - inode_set_bytes(inode, saved_nbytes); + inode_add_bytes(inode, nbytes); ret = btrfs_update_inode(trans, root, inode); out: if (inode) -- cgit v1.1 From 5b55d708335a9e3e4f61f2dadf7511502205ccd1 Mon Sep 17 00:00:00 2001 From: Suleiman Souhlal Date: Sat, 13 Apr 2013 16:03:06 -0700 Subject: vfs: Revert spurious fix to spinning prevention in prune_icache_sb Revert commit 62a3ddef6181 ("vfs: fix spinning prevention in prune_icache_sb"). This commit doesn't look right: since we are looking at the tail of the list (sb->s_inode_lru.prev) if we want to skip an inode, we should put it back at the head of the list instead of the tail, otherwise we will keep spinning on it. Discovered when investigating why prune_icache_sb came top in perf reports of a swapping load. Signed-off-by: Suleiman Souhlal Signed-off-by: Hugh Dickins Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Linus Torvalds --- fs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/inode.c b/fs/inode.c index f5f7c06..a898b3d 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -725,7 +725,7 @@ void prune_icache_sb(struct super_block *sb, int nr_to_scan) * inode to the back of the list so we don't spin on it. */ if (!spin_trylock(&inode->i_lock)) { - list_move_tail(&inode->i_lru, &sb->s_inode_lru); + list_move(&inode->i_lru, &sb->s_inode_lru); continue; } -- cgit v1.1 From a2fce9143057f4eb7675a21cca1b6beabe585c8b Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Wed, 17 Apr 2013 15:58:27 -0700 Subject: hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) Currently we fail to include any data on hugepages into coredump, because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently introduced by commit 314e51b9851b ("mm: kill vma flag VM_RESERVED and mm->reserved_vm counter"). This looks to me a serious regression, so let's fix it. Signed-off-by: Naoya Horiguchi Acked-by: Konstantin Khlebnikov Acked-by: Michal Hocko Reviewed-by: Rik van Riel Acked-by: KOSAKI Motohiro Acked-by: David Rientjes Cc: [3.7+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/hugetlbfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 84e3d85..523464e 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) * way when do_mmap_pgoff unwinds (may be important on powerpc * and ia64). */ - vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP; + vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND; vma->vm_ops = &hugetlb_vm_ops; if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT)) -- cgit v1.1 From 23d9e482136e31c9d287633a6e473daa172767c4 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Wed, 17 Apr 2013 15:58:28 -0700 Subject: fs/binfmt_elf.c: fix hugetlb memory check in vma_dump_size() Documentation/filesystems/proc.txt says about coredump_filter bitmask, Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only effected by bit 5-6. However current code can go into the subsequent flag checks of bit 0-4 for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work as written in the document. Signed-off-by: Naoya Horiguchi Reviewed-by: Rik van Riel Acked-by: Michal Hocko Reviewed-by: HATAYAMA Daisuke Acked-by: KOSAKI Motohiro Acked-by: David Rientjes Cc: [3.7+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/binfmt_elf.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 3939829..86af964 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1137,6 +1137,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma, goto whole; if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE)) goto whole; + return 0; } /* Do not dump I/O mapped devices or special mappings */ -- cgit v1.1 From 12f267a20aecf8b84a2a9069b9011f1661c779b4 Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Wed, 17 Apr 2013 15:58:33 -0700 Subject: hfsplus: fix potential overflow in hfsplus_file_truncate() Change a u32 to loff_t hfsplus_file_truncate(). Signed-off-by: Vyacheslav Dubeyko Cc: Christoph Hellwig Cc: Al Viro Cc: Hin-Tak Leung Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/hfsplus/extents.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/hfsplus/extents.c b/fs/hfsplus/extents.c index a94f0f7..fe0a762 100644 --- a/fs/hfsplus/extents.c +++ b/fs/hfsplus/extents.c @@ -533,7 +533,7 @@ void hfsplus_file_truncate(struct inode *inode) struct address_space *mapping = inode->i_mapping; struct page *page; void *fsdata; - u32 size = inode->i_size; + loff_t size = inode->i_size; res = pagecache_write_begin(NULL, mapping, size, 0, AOP_FLAG_UNINTERRUPTIBLE, -- cgit v1.1 From 0a82a8d132b26d438eb90b3ab35a7016e7227a1d Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 18 Apr 2013 09:00:26 -0700 Subject: Revert "block: add missing block_bio_complete() tracepoint" This reverts commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f. Wanlong Gao reports that it causes a kernel panic on his machine several minutes after boot. Reverting it removes the panic. Jens says: "It's not quite clear why that is yet, so I think we should just revert the commit for 3.9 final (which I'm assuming is pretty close). The wifi is crap at the LSF hotel, so sending this email instead of queueing up a revert and pull request." Reported-by: Wanlong Gao Requested-by: Jens Axboe Cc: Tejun Heo Cc: Steven Rostedt Signed-off-by: Linus Torvalds --- fs/bio.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'fs') diff --git a/fs/bio.c b/fs/bio.c index bb5768f..b96fc6c 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -1428,8 +1428,6 @@ void bio_endio(struct bio *bio, int error) else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) error = -EIO; - trace_block_bio_complete(bio, error); - if (bio->bi_end_io) bio->bi_end_io(bio, error); } -- cgit v1.1 From 91d80a84bbc8f28375cca7e65ec666577b4209ad Mon Sep 17 00:00:00 2001 From: Zhao Hongjiang Date: Fri, 26 Apr 2013 11:03:53 +0800 Subject: aio: fix possible invalid memory access when DEBUG is enabled dprintk() shouldn't access @ring after it's unmapped. Signed-off-by: Zhao Hongjiang Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds --- fs/aio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/aio.c b/fs/aio.c index 3f941f2..1dc8786 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1029,9 +1029,9 @@ static int aio_read_evt(struct kioctx *ioctx, struct io_event *ent) spin_unlock(&info->ring_lock); out: - kunmap_atomic(ring); dprintk("leaving aio_read_evt: %d h%lu t%lu\n", ret, (unsigned long)ring->head, (unsigned long)ring->tail); + kunmap_atomic(ring); return ret; } -- cgit v1.1