summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | NFSv4: Ensure that we don't reap a delegation that is being returnedTrond Myklebust2015-03-021-5/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | NFS: Fix stateid used for NFS v4 closesAnna Schumaker2015-03-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After 566fcec60 the client uses the "current stateid" from the nfs4_state structure to close a file. This could potentially contain a delegation stateid, which is disallowed by the protocol and causes servers to return NFS4ERR_BAD_STATEID. This patch restores the (correct) behavior of sending the open stateid to close a file. Reported-by: Olga Kornievskaia <kolga@netapp.com> Fixes: 566fcec60 (NFSv4: Fix an atomicity problem in CLOSE) Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | NFSv4: Don't call put_rpccred() under the rcu_read_lock()Trond Myklebust2015-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | put_rpccred() can sleep. Fixes: 8f649c3762547 ("NFSv4: Fix the locking in nfs_inode_reclaim_delegation()") Cc: stable@vger.kernel.org # 2.6.35+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | NFS: Don't require a filehandle to refresh the inode in nfs_prime_dcache()Trond Myklebust2015-03-011-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the server does not return a valid set of attributes that we can use to either create a file or refresh the inode, then there is no value in calling nfs_prime_dcache(). However if we're just refreshing the inode using the attributes that the server returned, then it shouldn't matter whether or not we have a filehandle, as long as we check the fsid+fileid combination. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | NFSv3: Use the readdir fileid as the mounted-on-fileidTrond Myklebust2015-03-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we call readdirplus, set the fileid normally returned by readdir as the mounted-on-fileid, since that is commonly the case if there is a mountpoint. To ensure that we get it right, we only set the flag if the readdir fileid differs from the one returned in the readdirplus attributes. This again means that we can avoid the issues described in commit 2ef47eb1aee17 ("NFS: Fix use of nfs_attr_use_mounted_on_fileid()"), which only fixed NFSv4. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | NFS: Don't invalidate a submounted dentry in nfs_prime_dcache()Trond Myklebust2015-03-011-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we're traversing a directory which contains a submounted filesystem, or one that has a referral, the NFS server that is processing the READDIR request will often return information for the underlying (mounted-on) directory. It may, or may not, also return filehandle information. If this happens, and the lookup in nfs_prime_dcache() returns the dentry for the submounted directory, the filehandle comparison will fail, and we call d_invalidate(). Post-commit 8ed936b5671bf ("vfs: Lazily remove mounts on unlinked files and directories."), this means the entire subtree is unmounted. The following minimal patch addresses this problem by punting on the invalidation if there is a submount. Kudos to Neil Brown <neilb@suse.de> for having tracked down this issue (see link). Reported-by: Nix <nix@esperi.org.uk> Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix Cc: stable@vger.kernel.org # 3.18+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | NFSv4: Set a barrier in the update_changeattr() helperTrond Myklebust2015-03-012-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that we don't regress the changes that were made to the directory. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFS: Fix nfs_post_op_update_inode() to set an attribute barrierTrond Myklebust2015-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfs_post_op_update_inode() is called after a self-induced attribute update. Ensure that it also sets the barrier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFS: Remove size hack in nfs_inode_attrs_need_update()Trond Myklebust2015-03-011-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this patch, we used to always OK attribute updates that extended the file size on the assumption that we might be performing writeback. Now that we have attribute barriers to protect the writeback related updates, we should remove this hack, as it can cause truncate() operations to apparently be reverted if/when a readahead or getattr RPC call races with our on-the-wire SETATTR. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFSv4: Add attribute update barriers to delegreturn and pNFS layoutcommitTrond Myklebust2015-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that other operations that race with delegreturn and layoutcommit cannot revert the attribute updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFS: Add attribute update barriers to NFS writebacksTrond Myklebust2015-03-016-8/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that other operations that race with our write RPC calls cannot revert the file size updates that were made on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFS: Set an attribute barrier on all updatesTrond Myklebust2015-03-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that we update the attribute barrier even if there were no invalidations, provided that this value is newer than the old one. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFS: Add attribute update barriers to nfs_setattr_update_inode()Trond Myklebust2015-03-014-10/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that other operations which raced with our setattr RPC call cannot revert the file attribute changes that were made on the server. To do so, we artificially bump the attribute generation counter on the inode so that all calls to nfs_fattr_init() that precede ours will be dropped. The motivation for the patch came from Chuck Lever's reports of readaheads racing with truncate operations and causing the file size to be reverted. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFS: Add a helper to set attribute barriersTrond Myklebust2015-03-011-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFS: Ensure that buffered writes wait for O_DIRECT writes to completeTrond Myklebust2015-03-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The O_DIRECT code will grab the inode->i_mutex and flush out buffered writes, before scheduling a read or a write. However there is no equivalent in the buffered write code to wait for O_DIRECT to complete. Fixes a reported issue in xfstests generic/133, when first performing an O_DIRECT write followed by a buffered write. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | NFSv4: nfs4_open_recover_helper() must set share accessTrond Myklebust2015-02-271-0/+3
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The share access mode is now specified as an argument in the nfs4_opendata, and so nfs4_open_recover_helper() needs to call nfs4_map_atomic_open_share() in order to set it. Fixes: 6ae373394c42 ("NFSv4.1: Ask for no delegation on OPEN if using O_DIRECT") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | | | Merge tag 'ecryptfs-4.0-rc3-fixes' of ↵Linus Torvalds2015-03-044-8/+34
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull eCryptfs fixes from Tyler Hicks: "Fixes for proper ioctl handling and an untriggerable buffer overflow - The eCryptfs ioctl handling functions should only pass known-good ioctl commands to the lower filesystem - A static checker found a potential buffer overflow. Upon inspection, it is not triggerable due to input validation performed on the mount parameters" * tag 'ecryptfs-4.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: don't pass fs-specific ioctl commands through eCryptfs: ensure copy to crypt_stat->cipher does not overrun
| * | | eCryptfs: don't pass fs-specific ioctl commands throughTyler Hicks2015-03-031-4/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | eCryptfs can't be aware of what to expect when after passing an arbitrary ioctl command through to the lower filesystem. The ioctl command may trigger an action in the lower filesystem that is incompatible with eCryptfs. One specific example is when one attempts to use the Btrfs clone ioctl command when the source file is in the Btrfs filesystem that eCryptfs is mounted on top of and the destination fd is from a new file created in the eCryptfs mount. The ioctl syscall incorrectly returns success because the command is passed down to Btrfs which thinks that it was able to do the clone operation. However, the result is an empty eCryptfs file. This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl commands through and then copies up the inode metadata from the lower inode to the eCryptfs inode to catch any changes made to the lower inode's metadata. Those five ioctl commands are mostly common across all filesystems but the whitelist may need to be further pruned in the future. https://bugzilla.kernel.org/show_bug.cgi?id=93691 https://launchpad.net/bugs/1305335 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: Rocko <rockorequin@hotmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: stable@vger.kernel.org # v2.6.36+: c43f7b8 eCryptfs: Handle ioctl calls with unlocked and compat functions
| * | | eCryptfs: ensure copy to crypt_stat->cipher does not overrunColin Ian King2015-02-243-4/+4
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch 237fead61998: "[PATCH] ecryptfs: fs/Makefile and fs/Kconfig" from Oct 4, 2006, leads to the following static checker warning: fs/ecryptfs/crypto.c:846 ecryptfs_new_file_context() error: off-by-one overflow 'crypt_stat->cipher' size 32. rl = '0-32' There is a mismatch between the size of ecryptfs_crypt_stat.cipher and ecryptfs_mount_crypt_stat.global_default_cipher_name causing the copy of the cipher name to cause a off-by-one string copy error. This fix ensures the space reserved for this string is the same size including the trailing zero at the end throughout ecryptfs. This fix avoids increasing the size of ecryptfs_crypt_stat.cipher and also ecryptfs_parse_tag_70_packet_silly_stack.cipher_string and instead reduces the of ECRYPTFS_MAX_CIPHER_NAME_SIZE to 31 and includes the + 1 for the end of string terminator. NOTE: An overflow is not possible in practice since the value copied into global_default_cipher_name is validated by ecryptfs_code_for_cipher_string() at mount time. None of the allowed cipher strings are long enough to cause the potential buffer overflow fixed by this patch. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> [tyhicks: Added the NOTE about the overflow not being triggerable] Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
* | | Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2015-03-031-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd fixes from Bruce Fields: "Three miscellaneous bugfixes, most importantly the clp->cl_revoked bug, which we've seen several reports of people hitting" * 'for-4.0' of git://linux-nfs.org/~bfields/linux: sunrpc: integer underflow in rsc_parse() nfsd: fix clp->cl_revoked list deletion causing softlock in nfsd svcrpc: fix memory leak in gssp_accept_sec_context_upcall
| * | | nfsd: fix clp->cl_revoked list deletion causing softlock in nfsdAndrew Elble2015-02-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") removed the use of the reaplist to clean out clp->cl_revoked. It failed to change list_entry() to walk clp->cl_revoked.next instead of reaplist.next Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") Cc: stable@vger.kernel.org Reported-by: Eric Meddaugh <etmsys@rit.edu> Tested-by: Eric Meddaugh <etmsys@rit.edu> Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | Merge tag 'xfs-for-linus-4.0-rc2' of ↵Linus Torvalds2015-02-286-31/+41
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "These are fixes for regressions/bugs introduced in the 4.0 merge cycle and problems discovered during the merge window that need to be pushed back to stable kernels ASAP. This contains: - ensure quota type is reset in on-disk dquots - fix missing partial EOF block data flush on truncate extension - fix transaction leak in error handling for new pnfs block layout support - add missing target_ip check to RENAME_EXCHANGE" * tag 'xfs-for-linus-4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: cancel failed transaction in xfs_fs_commit_blocks() xfs: Ensure we have target_ip for RENAME_EXCHANGE xfs: ensure truncate forces zeroed blocks to disk xfs: Fix quota type in quota structures when reusing quota file
| * | | | xfs: cancel failed transaction in xfs_fs_commit_blocks()Eric Sandeen2015-02-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If xfs_trans_reserve fails we don't cancel the transaction, and we'll leak the allocated transaction pointer. Spotted by Coverity. Signed-off-by: Eric Sandeen <ssandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | | | xfs: Ensure we have target_ip for RENAME_EXCHANGEEric Sandeen2015-02-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We shouldn't get here with RENAME_EXCHANGE set and no target_ip, but let's be defensive, because xfs_cross_rename() will dereference it. Spotted by Coverity. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | | | xfs: ensure truncate forces zeroed blocks to diskDave Chinner2015-02-233-30/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new fsync vs power fail test in xfstests indicated that XFS can have unreliable data consistency when doing extending truncates that require block zeroing. The blocks beyond EOF get zeroed in memory, but we never force those changes to disk before we run the transaction that extends the file size and exposes those blocks to userspace. This can result in the blocks not being correctly zeroed after a crash. Because in-memory behaviour is correct, tools like fsx don't pick up any coherency problems - it's not until the filesystem is shutdown or the system crashes after writing the truncate transaction to the journal but before the zeroed data in the page cache is flushed that the issue is exposed. Fix this by also flushing the dirty data in memory region between the old size and new size when we've found blocks that need zeroing in the truncate process. Reported-by: Liu Bo <bo.li.liu@oracle.com> cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | | | xfs: Fix quota type in quota structures when reusing quota fileJan Kara2015-02-231-0/+5
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For filesystems without separate project quota inode field in the superblock we just reuse project quota file for group quotas (and vice versa) if project quota file is allocated and we need group quota file. When we reuse the file, quota structures on disk suddenly have wrong type stored in d_flags though. Nobody really cares about this (although structure type reported to userspace was wrong as well) except that after commit 14bf61ffe6ac (quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units) assertion in xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I was testing without XFS_DEBUG so I didn't notice when submitting the above commit). Fix the problem by properly resetting ddq->d_flags when running quotacheck for a quota file. CC: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | | | nilfs2: fix potential memory overrun on inodeRyusuke Konishi2015-02-281-3/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each inode of nilfs2 stores a root node of a b-tree, and it turned out to have a memory overrun issue: Each b-tree node of nilfs2 stores a set of key-value pairs and the number of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as a few other "bn_*" members. Since the value of "bn_nchildren" is used for operations on the key-values within the b-tree node, it can cause memory access overrun if a large number is incorrectly set to "bn_nchildren". For instance, nilfs_btree_node_lookup() function determines the range of binary search with it, and too large "bn_nchildren" leads nilfs_btree_node_get_key() in that function to overrun. As for intermediate b-tree nodes, this is prevented by a sanity check performed when each node is read from a drive, however, no sanity check has been done for root nodes stored in inodes. This patch fixes the issue by adding missing sanity check against b-tree root nodes so that it's called when on-memory inodes are read from ifile, inode metadata file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2015-02-261-1/+8
|\ \ \ \ | |/ / / |/| | / | | |/ | |/| | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "I'm still testing more fixes, but I wanted to get out the fix for the btrfs raid5/6 memory corruption I mentioned in my merge window pull" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix allocation size calculations in alloc_btrfs_bio
| * | Btrfs: fix allocation size calculations in alloc_btrfs_bioChris Mason2015-02-201-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 8e5cfb55d3f (Btrfs: Make raid_map array be inlined in btrfs_bio structure), the raid map array is allocated along with the btrfs bio in alloc_btrfs_bio. The calculation used to decide how much we need to allocate was using the wrong parameter passed into the allocation function. The passed in real_stripes will be zero if a target replace operation is not currently running. We want to use total_stripes instead. Signed-off-by: Chris Mason <clm@fb.com> Reported-by: David Sterba <dsterba@suse.cz> Tested-by: David Sterba <dsterba@suse.cz>
* | | Merge tag 'ext4_for_linus' of ↵Linus Torvalds2015-02-225-56/+108
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Ext4 bug fixes. We also reserved code points for encryption and read-only images (for which the implementation is mostly just the reserved code point for a read-only feature :-)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix indirect punch hole corruption ext4: ignore journal checksum on remount; don't fail ext4: remove duplicate remount check for JOURNAL_CHECKSUM change ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesize ext4: support read-only images ext4: change to use setup_timer() instead of init_timer() ext4: reserve codepoints used by the ext4 encryption feature jbd2: complain about descriptor block checksum errors
| * | | ext4: fix indirect punch hole corruptionOmar Sandoval2015-02-141-34/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect mapping. However, there are bugs in several corner cases. This fixes 5 distinct bugs: 1. When there is at least one entire level of indirection between the start and end of the punch range and the end of the punch range is the first block of its level, we can't return early; we have to free the intervening levels. 2. When the end is at a higher level of indirection than the start and ext4_find_shared returns a top branch for the end, we still need to free the rest of the shared branch it returns; we can't decrement partial2. 3. When a punch happens within one level of indirection, we need to converge on an indirect block that contains the start and end. However, because the branches returned from ext4_find_shared do not necessarily start at the same level (e.g., the partial2 chain will be shallower if the last block occurs at the beginning of an indirect group), the walk of the two chains can end up "missing" each other and freeing a bunch of extra blocks in the process. This mismatch can be handled by first making sure that the chains are at the same level, then walking them together until they converge. 4. When the punch happens within one level of indirection and ext4_find_shared returns a top branch for the start, we must free it, but only if the end does not occur within that branch. 5. When the punch happens within one level of indirection and ext4_find_shared returns a top branch for the end, then we shouldn't free the block referenced by the end of the returned chain (this mirrors the different levels case). Signed-off-by: Omar Sandoval <osandov@osandov.com>
| * | | ext4: ignore journal checksum on remount; don't failEric Sandeen2015-02-121-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of v3.18, ext4 started rejecting a remount which changes the journal_checksum option. Prior to that, it was simply ignored; the problem here is that if someone has this in their fstab for the root fs, now the box fails to boot properly, because remount of root with the new options will fail, and the box proceeds with a readonly root. I think it is a little nicer behavior to accept the option, but warn that it's being ignored, rather than failing the mount, but that might be a subjective matter... Reported-by: Cónräd <conradsand.arma@gmail.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: remove duplicate remount check for JOURNAL_CHECKSUM changeEric Sandeen2015-02-121-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rejection of, changing journal_checksum during remount. One suffices. While we're at it, remove old comment about the "check" option which has been deprecated for some time now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesizeXiaoguang Wang2015-02-121-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 90a8020 and d6320cb, Jan Kara has fixed this issue partially. This mmap data corruption still exists in nodelalloc mode, fix this. Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | | ext4: support read-only imagesDarrick J. Wong2015-02-122-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a rocompat feature, "readonly" to mark a FS image as read-only. The feature prevents the kernel and e2fsprogs from changing the image; the flag can be toggled by tune2fs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: change to use setup_timer() instead of init_timer()Jan Mrazek2015-01-261-3/+2
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jan Mrazek <email@honzamrazek.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: reserve codepoints used by the ext4 encryption featureTheodore Ts'o2015-01-191-4/+13
| | | | | | | | | | | | | | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | jbd2: complain about descriptor block checksum errorsDarrick J. Wong2015-01-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should complain in dmesg when journal recovery fails on account of the descriptor block being corrupt, so that the diagnostic data can be recovered. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | | | Merge branch 'for-linus-2' of ↵Linus Torvalds2015-02-2252-676/+715
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted stuff from this cycle. The big ones here are multilayer overlayfs from Miklos and beginning of sorting ->d_inode accesses out from David" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits) autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation procfs: fix race between symlink removals and traversals debugfs: leave freeing a symlink body until inode eviction Documentation/filesystems/Locking: ->get_sb() is long gone trylock_super(): replacement for grab_super_passive() fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) SELinux: Use d_is_positive() rather than testing dentry->d_inode Smack: Use d_is_positive() rather than testing dentry->d_inode TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR() Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb VFS: Split DCACHE_FILE_TYPE into regular and special types VFS: Add a fallthrough flag for marking virtual dentries VFS: Add a whiteout dentry type VFS: Introduce inode-getting helpers for layered/unioned fs environments Infiniband: Fix potential NULL d_inode dereference posix_acl: fix reference leaks in posix_acl_create autofs4: Wrong format for printing dentry ...
| * | | | autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocationAl Viro2015-02-221-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X-Coverup: just ask spender Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | procfs: fix race between symlink removals and traversalsAl Viro2015-02-223-12/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | use_pde()/unuse_pde() in ->follow_link()/->put_link() resp. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | debugfs: leave freeing a symlink body until inode evictionAl Viro2015-02-221-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As it is, we have debugfs_remove() racing with symlink traversals. Supply ->evict_inode() and do freeing there - inode will remain pinned until we are done with the symlink body. And rip the idiocy with checking if dentry is positive right after we'd verified debugfs_positive(), which is a stronger check... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | trylock_super(): replacement for grab_super_passive()Konstantin Khlebnikov2015-02-223-26/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've noticed significant locking contention in memory reclaimer around sb_lock inside grab_super_passive(). Grab_super_passive() is called from two places: in icache/dcache shrinkers (function super_cache_scan) and from writeback (function __writeback_inodes_wb). Both are required for progress in memory allocator. Grab_super_passive() acquires sb_lock to increment sb->s_count and check sb->s_instances. It seems sb->s_umount locked for read is enough here: super-block deactivation always runs under sb->s_umount locked for write. Protecting super-block itself isn't a problem: in super_cache_scan() sb is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers are still active. Inside writeback super-block comes from inode from bdi writeback list under wb->list_lock. This patch removes locking sb_lock and checks s_instances under s_umount: generic_shutdown_super() unlinks it under sb->s_umount locked for write. New variant is called trylock_super() and since it only locks semaphore, callers must call up_read(&sb->s_umount) instead of drop_super(sb) when they're done. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversionsDavid Howells2015-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup() rather than d_is_dir() when checking a dir watch and give an error on fake directories. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversionsDavid Howells2015-02-224-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack thereof) in cachefiles: (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as it doesn't want to deal with automounts in its cache. (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in cachefiles. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)David Howells2015-02-2230-65/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided we the code isn't in a filesystem that expects d_inode to be NULL if the dirent really *is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') || die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") || die $coccifile; print($fd "$_\n") || die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 || die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | VFS: Split DCACHE_FILE_TYPE into regular and special typesDavid Howells2015-02-221-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and socket files). d_is_reg() and d_is_special() are added to detect these subtypes and d_is_file() is left as the union of the two. This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to use d_is_reg(dentry) instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | VFS: Add a fallthrough flag for marking virtual dentriesDavid Howells2015-02-221-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is a virtual dentry that covers another one in a lower layer that should be used instead. This may be recorded on medium if directory integration is stored there. The flag can be set with d_set_fallthru() and tested with d_is_fallthru(). Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | Merge branch 'overlayfs-next' of ↵Al Viro2015-02-206-319/+489
| |\ \ \ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into for-next
| | * | | | ovl: discard independent cursor in readdir()hujianyang2015-01-091-24/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the ovl_dir_cache is stable during a directory reading, the cursor of struct ovl_dir_file don't need to be an independent entry in the list of a merged directory. This patch changes *cursor* to a pointer which points to the entry in the ovl_dir_cache. After this, we don't need to check *is_cursor* either. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
OpenPOWER on IntegriCloud