| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Centralize the checks for inodes_per_block and be more strict to make
sure the inodes_per_block_group can't end up being zero.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix a large number of problems with how we handle mount options in the
superblock. For one, if the string in the superblock is long enough
that it is not null terminated, we could run off the end of the string
and try to interpret superblocks fields as characters. It's unlikely
this will cause a security problem, but it could result in an invalid
parse. Also, parse_options is destructive to the string, so in some
cases if there is a comma-separated string, it would be modified in
the superblock. (Fortunately it only happens on file systems with a
1k block size.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the block size or cluster size is insane, reject the mount. This
is important for security reasons (although we shouldn't be just
depending on this check).
Ref: http://www.securityfocus.com/archive/1/539661
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe.
current_time() will be transitioned to be y2038 safe
along with vfs.
current_time() returns timestamps according to the
granularities set in the super_block.
The granularity check in ext4_current_time() to call
current_time() or CURRENT_TIME_SEC is not required.
Use current_time() directly to obtain timestamps
unconditionally, and remove ext4_current_time().
Quota files are assumed to be on the same filesystem.
Hence, use current_time() for these files as well.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If there is an error reported in mballoc via ext4_grp_locked_error(),
the code is holding a spinlock, so ext4_commit_super() must not try to
lock the buffer head, or else it will trigger a BUG:
BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358
in_atomic(): 1, irqs_disabled(): 0, pid: 993, name: mount
CPU: 0 PID: 993 Comm: mount Not tainted 4.9.0-rc1-clouder1 #62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
ffff880006423548 ffffffff81318c89 ffffffff819ecdd0 0000000000000166
ffff880006423558 ffffffff810810b0 ffff880006423580 ffffffff81081153
ffff880006e5a1a0 ffff88000690e400 0000000000000000 ffff8800064235c0
Call Trace:
[<ffffffff81318c89>] dump_stack+0x67/0x9e
[<ffffffff810810b0>] ___might_sleep+0xf0/0x140
[<ffffffff81081153>] __might_sleep+0x53/0xb0
[<ffffffff8126c1dc>] ext4_commit_super+0x19c/0x290
[<ffffffff8126e61a>] __ext4_grp_locked_error+0x14a/0x230
[<ffffffff81081153>] ? __might_sleep+0x53/0xb0
[<ffffffff812822be>] ext4_mb_generate_buddy+0x1de/0x320
Since ext4_grp_locked_error() calls ext4_commit_super with sync == 0
(and it is the only caller which does so), avoid locking and unlocking
the buffer in this case.
This can result in races with ext4_commit_super() if there are other
problems (which is what commit 4743f83990614 was trying to address),
but a Warning is better than BUG.
Fixes: 4743f83990614
Cc: stable@vger.kernel.org # 4.9
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This allows us to properly propagate errors back up to
ext4_truncate()'s callers. This also means we no longer have to
silently ignore some errors (e.g., when trying to add the inode to the
orphan inode list).
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|\ \ \
| |_|/
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull block layer updates from Jens Axboe:
"This is the main block pull request this series. Contrary to previous
release, I've kept the core and driver changes in the same branch. We
always ended up having dependencies between the two for obvious
reasons, so makes more sense to keep them together. That said, I'll
probably try and keep more topical branches going forward, especially
for cycles that end up being as busy as this one.
The major parts of this pull request is:
- Improved support for O_DIRECT on block devices, with a small
private implementation instead of using the pig that is
fs/direct-io.c. From Christoph.
- Request completion tracking in a scalable fashion. This is utilized
by two components in this pull, the new hybrid polling and the
writeback queue throttling code.
- Improved support for polling with O_DIRECT, adding a hybrid mode
that combines pure polling with an initial sleep. From me.
- Support for automatic throttling of writeback queues on the block
side. This uses feedback from the device completion latencies to
scale the queue on the block side up or down. From me.
- Support from SMR drives in the block layer and for SD. From Hannes
and Shaun.
- Multi-connection support for nbd. From Josef.
- Cleanup of request and bio flags, so we have a clear split between
which are bio (or rq) private, and which ones are shared. From
Christoph.
- A set of patches from Bart, that improve how we handle queue
stopping and starting in blk-mq.
- Support for WRITE_ZEROES from Chaitanya.
- Lightnvm updates from Javier/Matias.
- Supoort for FC for the nvme-over-fabrics code. From James Smart.
- A bunch of fixes from a whole slew of people, too many to name
here"
* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
blk-stat: fix a few cases of missing batch flushing
blk-flush: run the queue when inserting blk-mq flush
elevator: make the rqhash helpers exported
blk-mq: abstract out blk_mq_dispatch_rq_list() helper
blk-mq: add blk_mq_start_stopped_hw_queue()
block: improve handling of the magic discard payload
blk-wbt: don't throttle discard or write zeroes
nbd: use dev_err_ratelimited in io path
nbd: reset the setup task for NBD_CLEAR_SOCK
nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
nvme-fabrics: Add target support for FC transport
nvme-fabrics: Add host support for FC transport
nvme-fabrics: Add FC transport LLDD api definitions
nvme-fabrics: Add FC transport FC-NVME definitions
nvme-fabrics: Add FC transport error codes to nvme.h
Add type 0x28 NVME type code to scsi fc headers
nvme-fabrics: patch target code in prep for FC transport support
nvme-fabrics: set sqe.command_id in core not transports
parser: add u64 number parser
nvme-rdma: align to generic ib_event logging helper
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly. Where applicable this also drops usage of the
bio_set_op_attrs wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If the block size or cluster size is insane, reject the mount. This
is important for security reasons (although we shouldn't be just
depending on this check).
Ref: http://www.securityfocus.com/archive/1/539661
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|/
|
|
|
|
|
|
|
|
|
| |
Recent commit require line continuing printks to use PR_CONT.
Update super.c to use KERN_CONT and use vsprintf extension %pV to
avoid a printk/vprintk/printk("\n") sequence as well.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
|
|
|
|
|
|
|
|
|
| |
When a file system contains an internal journal that has not been
loaded, use the journal inode's i_size field to determine its
contribution to the file system's overhead. (The journal's j_maxlen
field is normally used to determine its size, but it's unavailable when
the journal has not been loaded.)
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
|
|
|
|
| |
Factor out the code used in ext4_get_journal() to read a valid journal
inode from storage, enabling its reuse in other functions.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ext4lazyinit is a global thread. This thread performs itable
initalization under li_list_mtx mutex.
It basically does the following:
ext4_lazyinit_thread
->mutex_lock(&eli->li_list_mtx);
->ext4_run_li_request(elr)
->ext4_init_inode_table-> Do a lot of IO if the list is large
And when new mount/umount arrive they have to block on ->li_list_mtx
because lazy_thread holds it during full walk procedure.
ext4_fill_super
->ext4_register_li_request
->mutex_lock(&ext4_li_info->li_list_mtx);
->list_add(&elr->lr_request, &ext4_li_info >li_request_list);
In my case mount takes 40minutes on server with 36 * 4Tb HDD.
Common user may face this in case of very slow dev ( /dev/mmcblkXXX)
Even more. If one of filesystems was frozen lazyinit_thread will simply
block on sb_start_write() so other mount/umount will be stuck forever.
This patch changes logic like follows:
- grab ->s_umount read sem before processing new li_request.
After that it is safe to drop li_list_mtx because all callers of
li_remove_request are holding ->s_umount for write.
- li_thread skips frozen SB's
Locking order:
Mh KOrder is asserted by umount path like follows: s_umount ->li_list_mtx so
the only way to to grab ->s_mount inside li_thread is via down_read_trylock
xfstests:ext4/023
#PSBM-49658
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
|
|
|
|
|
|
|
| |
When quota information is stored in quota files, we enable only quota
accounting on mount and enforcement is enabled only in response to
Q_QUOTAON quotactl. To make ext4 behavior consistent with XFS, we add a
possibility to enable quota enforcement on mount by specifying
corresponding quota mount option (usrquota, grpquota, prjquota).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix bugs that could cause kernel deadlocks or file system corruption
while moving xattrs to expand the extended inode.
Also add some sanity checks to the block group descriptors to make
sure we don't end up overwriting the superblock"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: avoid deadlock when expanding inode size
ext4: properly align shifted xattrs when expanding inodes
ext4: fix xattr shifting when expanding inodes part 2
ext4: fix xattr shifting when expanding inodes
ext4: validate that metadata blocks do not overlap superblock
ext4: reserve xattr index for the Hurd
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A number of fuzzing failures seem to be caused by allocation bitmaps
or other metadata blocks being pointed at the superblock.
This can cause kernel BUG or WARNings once the superblock is
overwritten, so validate the group descriptor blocks to make sure this
doesn't happen.
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|\ \
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"The major change this cycle is deleting ext4's copy of the file system
encryption code and switching things over to using the copies in
fs/crypto. I've updated the MAINTAINERS file to add an entry for
fs/crypto listing Jaeguk Kim and myself as the maintainers.
There are also a number of bug fixes, most notably for some problems
found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also
fixed is a writeback deadlock detected by generic/130, and some
potential races in the metadata checksum code"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
ext4: verify extent header depth
ext4: short-cut orphan cleanup on error
ext4: fix reference counting bug on block allocation error
MAINTAINRES: fs-crypto maintainers update
ext4 crypto: migrate into vfs's crypto engine
ext2: fix filesystem deadlock while reading corrupted xattr block
ext4: fix project quota accounting without quota limits enabled
ext4: validate s_reserved_gdt_blocks on mount
ext4: remove unused page_idx
ext4: don't call ext4_should_journal_data() on the journal inode
ext4: Fix WARN_ON_ONCE in ext4_commit_super()
ext4: fix deadlock during page writeback
ext4: correct error value of function verifying dx checksum
ext4: avoid modifying checksum fields directly during checksum verification
ext4: check for extents that wrap around
jbd2: make journal y2038 safe
jbd2: track more dependencies on transaction commit
jbd2: move lockdep tracking to journal_s
jbd2: move lockdep instrumentation for jbd2 handles
ext4: respect the nobarrier mount option in nojournal mode
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.
EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
Aborting journal on device loop0-8.
EXT4-fs (loop0): Remounting filesystem read-only
EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
[...]
See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch removes the most parts of internal crypto codes.
And then, it modifies and adds some ext4-specific crypt codes to use the generic
facility.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If s_reserved_gdt_blocks is extremely large, it's possible for
ext4_init_block_bitmap(), which is called when ext4 sets up an
uninitialized block bitmap, to corrupt random kernel memory. Add the
same checks which e2fsck has --- it must never be larger than
blocksize / sizeof(__u32) --- and then add a backup check in
ext4_init_block_bitmap() in case the superblock gets modified after
the file system is mounted.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If there are racing calls to ext4_commit_super() it's possible for
another writeback of the superblock to result in the buffer being
marked with an error after we check if the buffer is marked as having
a write error and the buffer up-to-date flag is set again. If that
happens mark_buffer_dirty() can end up throwing a WARN_ON_ONCE.
Fix this by moving this check to write before we call
write_buffer_dirty(), and keeping the buffer locked during this whole
sequence.
Signed-off-by: Pranay Kr. Srivastava <pranjas@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We temporally change checksum fields in buffers of some types of
metadata into '0' for verifying the checksum values. By doing this
without locking the buffer, some metadata's checksums, which are
being committed or written back to the storage, could be damaged.
In our test, several metadata blocks were found with damaged metadata
checksum value during recovery process. When we only verify the
checksum value, we have to avoid modifying checksum fields directly.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|/
|
|
|
|
|
|
|
|
|
| |
This has ll_rw_block users pass in the operation and flags separately,
so ll_rw_block can setup the bio op and bi_rw flags on the bio that
is submitted.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a partition is not aligned by 4KB, mount -o dax succeeds,
but any read/write access to the filesystem fails, except for
metadata update.
Call bdev_dax_supported() to perform proper precondition checks
which includes this partition alignment check.
Reported-by: Micah Parrish <micah.parrish@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In ext4, there is a race condition between changing inode journal mode
and ext4_writepages(). While ext4_writepages() is executed on a
non-journalled mode inode, the inode's journal mode could be enabled
by ioctl() and then, some pages dirtied after switching the journal
mode will be still exposed to ext4_writepages() in non-journaled mode.
To resolve this problem, we use fs-wide per-cpu rw semaphore by Jan
Kara's suggestion because we don't want to waste ext4_inode_info's
space for this extra rare case.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"These changes contains a fix for overlayfs interacting with some
(badly behaved) dentry code in various file systems. These have been
reviewed by Al and the respective file system mtinainers and are going
through the ext4 tree for convenience.
This also has a few ext4 encryption bug fixes that were discovered in
Android testing (yes, we will need to get these sync'ed up with the
fs/crypto code; I'll take care of that). It also has some bug fixes
and a change to ignore the legacy quota options to allow for xfstests
regression testing of ext4's internal quota feature and to be more
consistent with how xfs handles this case"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: ignore quota mount options if the quota feature is enabled
ext4 crypto: fix some error handling
ext4: avoid calling dquot_get_next_id() if quota is not enabled
ext4: retry block allocation for failed DIO and DAX writes
ext4: add lockdep annotations for i_data_sem
ext4: allow readdir()'s of large empty directories to be interrupted
btrfs: fix crash/invalid memory access on fsync when using overlayfs
ext4 crypto: use dget_parent() in ext4_d_revalidate()
ext4: use file_dentry()
ext4: use dget_parent() in ext4_file_open()
nfs: use file_dentry()
fs: add file_dentry()
ext4 crypto: don't let data integrity writebacks fail with ENOMEM
ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previously, ext4 would fail the mount if the file system had the quota
feature enabled and quota mount options (used for the older quota
setups) were present. This broke xfstests, since xfs silently ignores
the usrquote and grpquota mount options if they are specified. This
commit changes things so that we are consistent with xfs; having the
mount options specified is harmless, so no sense break users by
forbidding them.
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This should be fixed in the quota layer so we can test with the quota
mutex held, but for now, we need this to avoid tests from crashing the
kernel aborting the regression test suite.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With the internal Quota feature, mke2fs creates empty quota inodes and
quota usage tracking is enabled as soon as the file system is mounted.
Since quotacheck is no longer preallocating all of the blocks in the
quota inode that are likely needed to be written to, we are now seeing
a lockdep false positive caused by needing to allocate a quota block
from inside ext4_map_blocks(), while holding i_data_sem for a data
inode. This results in this complaint:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ei->i_data_sem);
lock(&s->s_dquot.dqio_mutex);
lock(&ei->i_data_sem);
lock(&s->s_dquot.dqio_mutex);
Google-Bug-Id: 27907753
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF and quota updates from Jan Kara:
"This contains a rewrite of UDF handling of filename encoding to fix
remaining overflow issues from Andrew Gabbasov and quota changes to
support new Q_[X]GETNEXTQUOTA quotactl for VFS quota formats"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Fix possible GPF due to uninitialised pointers
ext4: Make Q_GETNEXTQUOTA work for quota in hidden inodes
quota: Forbid Q_GETQUOTA and Q_GETNEXTQUOTA for frozen filesystem
quota: Fix possible races during quota loading
ocfs2: Implement get_next_id()
quota_v2: Implement get_next_id() for V2 quota format
quota: Add support for ->get_nextdqblk() for VFS quota
udf: Merge linux specific translation into CS0 conversion function
udf: Remove struct ustr as non-needed intermediate storage
udf: Use separate buffer for copying split names
udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
udf: Join functions for UTF8 and NLS conversions
udf: Parameterize output length in udf_put_filename
quota: Allow Q_GETQUOTA for frozen filesystem
quota: Fixup comments about return value of Q_[X]GETNEXTQUOTA
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We forgot to set .get_nextdqblk operation in quotactl_ops structure used
by ext4 when quota is using hidden inode thus the operation was not
really supported. Fix the omission.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Add infrastructure for supporting get_nextdqblk() callback for VFS
quotas. Translate the operation into a callback to appropriate
filesystem and consequently to quota format callback.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If data_err=abort option is specified for an ext3/ext4 mount,
/proc/mounts does show it as "(null)". This is caused by token2str()
returning NULL for Opt_data_err_abort (due to its pattern containing
'=').
Signed-off-by: Ales Novak <alnovak@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
| |
| |
| |
| |
| |
| | |
Remove counter of pending io ends as it is unused.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
However the code cleanups that happened after 2011 when the lock was
introduced made aio_mutex acquired at almost the same places where we
already have exclusion using i_mutex. So just use i_mutex for the
exclusion of unaligned AIO DIO.
The change moves waiting for pending unwritten extent conversion under
i_mutex. That makes special handling of O_APPEND writes unnecessary and
also avoids possible livelocking of unaligned AIO DIO with aligned one
(nothing was preventing contiguous stream of aligned AIO DIOs to let
unaligned AIO DIO wait forever).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This variable, introduced in commit 9c191f70, is unnecessary: it is set
once the module has been initialized correctly, and ext4_fill_super
cannot run unless the module has been initialized correctly.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|/
|
|
|
|
|
|
|
|
| |
The conversion is generally straightforward. The only tricky part is
that xattr block corresponding to found mbcache entry can get freed
before we get buffer lock for that block. So we have to check whether
the entry is still valid after getting buffer lock.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
|
|
|
|
|
|
|
| |
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Some locking and page fault bug fixes from Jan Kara, some ext4
encryption fixes from me, and Li Xi's Project Quota commits"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
fs: clean up the flags definition in uapi/linux/fs.h
ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
ext4: add project quota support
ext4: adds project ID support
ext4 crypto: simplify interfaces to directory entry insert functions
ext4 crypto: add missing locking for keyring_key access
ext4: use pre-zeroed blocks for DAX page faults
ext4: implement allocation of pre-zeroed blocks
ext4: provide ext4_issue_zeroout()
ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag
ext4: document lock ordering
ext4: fix races of writeback with punch hole and zero range
ext4: fix races between buffered IO and collapse / insert range
ext4: move unlocked dio protection from ext4_alloc_file_blocks()
ext4: fix races between page faults and hole punching
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch adds mount options for enabling/disabling project quota
accounting and enforcement. A new specific inode is also used for
project quota accounting.
[ Includes fix from Dan Carpenter to crrect error checking from dqget(). ]
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| | |
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We have enough locks that it's probably worth documenting the lock
ordering rules we have in ext4.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, page faults and hole punching are completely unsynchronized.
This can result in page fault faulting in a page into a range that we
are punching after truncate_pagecache_range() has been called and thus
we can end up with a page mapped to disk blocks that will be shortly
freed. Filesystem corruption will shortly follow. Note that the same
race is avoided for truncate by checking page fault offset against
i_size but there isn't similar mechanism available for punching holes.
Fix the problem by creating new rw semaphore i_mmap_sem in inode and
grab it for writing over truncate, hole punching, and other functions
removing blocks from extent tree and for read over page faults. We
cannot easily use i_data_sem for this since that ranks below transaction
start and we need something ranking above it so that it can be held over
the whole truncate / hole punching operation. Also remove various
workarounds we had in the code to reduce race window when page fault
could have created pages with stale mapping information.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:
- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.
The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to XFS warn when mounting DAX while it is still considered under
development. Also, aspects of the DAX implementation, for example
synchronization against multiple faults and faults causing block
allocation, depend on the correct implementation in the filesystem. The
maturity of a given DAX implementation is filesystem specific.
Cc: <stable@vger.kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: linux-ext4@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Acked-by: Jan Kara <jack@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It is appeared that we can pass journal related mount options and such options
be shown in /proc/mounts
Example:
#mkfs.ext4 -F /dev/vdb
#tune2fs -O ^has_journal /dev/vdb
#mount /dev/vdb /mnt/ -ocommit=20,journal_async_commit
#cat /proc/mounts | grep /mnt
/dev/vdb /mnt ext4 rw,relatime,journal_checksum,journal_async_commit,commit=20,data=ordered 0 0
But options:"journal_checksum,journal_async_commit,commit=20,data=ordered" has
nothing with reality because there is no journal at all.
This patch disallow following options for journalless configurations:
- journal_checksum
- journal_async_commit
- commit=%ld
- data={writeback,ordered,journal}
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Currently MOPT_EXPLICIT treated as EXPLICIT_DELALLOC which may be changed
in future. Let's fix it now.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|