| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the interfaces in <asm/word-at-a-time.h> to be a bit more
complicated, but a lot more generic.
In particular, it allows us to really do the operations efficiently on
both little-endian and big-endian machines, pretty much regardless of
machine details. For example, if you can rely on a fast population
count instruction on your architecture, this will allow you to make your
optimized <asm/word-at-a-time.h> file with that.
NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
not truly generic, it actually only works on big-endian. Why? Because
on little-endian the generic algorithms are wasteful, since you can
inevitably do better. The x86 implementation is an example of that.
(The only truly non-generic part of the asm-generic implementation is
the "find_zero()" function, and you could make a little-endian version
of it. And if the Kbuild infrastructure allowed us to pick a particular
header file, that would be lovely)
The <asm/word-at-a-time.h> functions are as follows:
- WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
uses.
- has_zero(): take a word, and determine if it has a zero byte in it.
It gets the word, the pointer to the constant pool, and a pointer to
an intermediate "data" field it can set.
This is the "quick-and-dirty" zero tester: it's what is run inside
the hot loops.
- "prep_zero_mask()": take the word, the data that has_zero() produced,
and the constant pool, and generate an *exact* mask of which byte had
the first zero. This is run directly *outside* the loop, and allows
the "has_zero()" function to answer the "is there a zero byte"
question without necessarily getting exactly *which* byte is the
first one to contain a zero.
If you do multiple byte lookups concurrently (eg "hash_name()", which
looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
phase, the result of those can be or'ed together to get the "either
or" case.
- The result from "prep_zero_mask()" can then be fed into "find_zero()"
(to find the byte offset of the first byte that was zero) or into
"zero_bytemask()" (to find the bytemask of the bytes preceding the
zero byte).
The existence of zero_bytemask() is optional, and is not necessary
for the normal string routines. But dentry name hashing needs it, so
if you enable DENTRY_WORD_AT_A_TIME you need to expose it.
This changes the generic strncpy_from_user() function and the dentry
hashing functions to use these modified word-at-a-time interfaces. This
gets us back to the optimized state of the x86 strncpy that we lost in
the previous commit when moving over to the generic version.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3 and quota fixes from Jan Kara:
"Interesting bits are:
- removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
quota code does not need i_mutex anymore in any unusual way.
- backport (from ext4) of a fix of a checkpointing bug (missing cache
flush) that could lead to fs corruption on power failure
The rest are just random small fixes & cleanups."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: trivial fix to comment for ext2_free_blocks
ext2: remove the redundant comment for ext2_export_ops
ext3: return 32/64-bit dir name hash according to usage type
quota: Get rid of nested I_MUTEX_QUOTA locking subclass
quota: Use precomputed value of sb_dqopt in dquot_quota_sync
ext2: Remove i_mutex use from ext2_quota_write()
reiserfs: Remove i_mutex use from reiserfs_quota_write()
ext4: Remove i_mutex use from ext4_quota_write()
ext3: Remove i_mutex use from ext3_quota_write()
quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
jbd: Write journal superblock with WRITE_FUA after checkpointing
jbd: protect all log tail updates with j_checkpoint_mutex
jbd: Split updating of journal superblock and marking journal empty
ext2: do not register write_super within VFS
ext2: Remove s_dirt handling
ext2: write superblock only once on unmount
ext3: update documentation with barrier=1 default
ext3: remove max_debt in find_group_orlov()
jbd: Refine commit writeout logic
|
| |
| |
| |
| |
| |
| |
| | |
The function is ext2_free_blocks(), not ext2_free_blocks_sb().
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The comment is outdated and isn't particularly informative anyway - NULL
meaning the default behavior is very common in kernel. And we really set about
half of entries. So remove the whole comment for ext2_export_ops.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is based on commit d1f5273e9adb40724a85272f248f210dc4ce919a
ext4: return 32/64-bit dir name hash according to usage type
by Fan Yong <yong.fan@whamcloud.com>
Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir(). However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.
Allow ext3 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions.
This patch does implement a new ext3_dir_llseek op, because with 64-bit
hashes, nfs will attempt to seek to a hash "offset" which is much
larger than ext3's s_maxbytes. So for dx dirs, we call
generic_file_llseek_size() with the appropriate max hash value as the
maximum seekable size. Otherwise we just pass through to
generic_file_llseek().
Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Patch-updated-by: Eric Sandeen <sandeen@redhat.com>
(blame us if something is not correct)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
So far i_mutex was ranking above dqonoff_mutex and i_mutex on quota files
was special and ranking below dqonoff_mutex (and several other locks).
However there's no real need for i_mutex on quota files to be special.
IO on quota files is serialized by dqio_mutex anyway so we don't need to
take i_mutex when writing to quota files. Other places where we take i_mutex
on quota file can accomodate standard i_mutex lock ranking, we only need
to change the lock ranking to be dqonoff_mutex > i_mutex which is a matter
of changing documentation because there's no place which would enforce
ordering in the other direction.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| | |
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We don't need i_mutex in ext2_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We don't need i_mutex in reiserfs_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We don't need i_mutex in ext4_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| | |
We don't need i_mutex in ext3_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When CONFIG_QUOTA_DEBUG is enabled we call inode_get_rsv_space() from
add_dquot_ref() while holding i_lock. But inode_get_rsv_space() is trying
to get i_lock as well resulting in double lock.
Fix the problem by moving inode_get_rsv_space() call out of i_lock.
Reported-and-analyzed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If journal superblock is written only in disk's caches and other transaction
starts reusing space of the transaction cleaned from the log, it can happen
blocks of a new transaction reach the disk before journal superblock. When
power failure happens in such case, subsequent journal replay would still try
to replay the old transaction but some of it's blocks may be already
overwritten by the new transaction. For this reason we must use WRITE_FUA when
updating log tail and we must first write new log tail to disk and update
in-memory information only after that.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are some log tail updates that are not protected by j_checkpoint_mutex.
Some of these are harmless because they happen during startup or shutdown but
updates in journal_commit_transaction() and journal_flush() can really race
with other log tail updates (e.g. someone doing journal_flush() with someone
running cleanup_journal_tail()). So protect all log tail updates with
j_checkpoint_mutex.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are three case of updating journal superblock. In the first case, we want
to mark journal as empty (setting s_sequence to 0), in the second case we want
to update log tail, in the third case we want to update s_errno. Split these
cases into separate functions. It makes the code slightly more straightforward
and later patches will make the distinction even more important.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Jan Kara removed 'sb->s_dirt' VFS flag references, so we do not need to
register the ext2 'ext2_write_super()' method in the VFS superblock operations,
because 'sb->s_dirt' won't be ever set to 1 and VFS won't ever call
'->write_super()' anyway. Thus, remove the method.
Tested using xfstests.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Places which modify superblock feature / state fields mark the superblock
buffer dirty so it is written out by flusher thread. Thus there's no need to
set s_dirt there.
The only other fields changing in the superblock are the numbers of free
blocks, free inodes and s_wtime. There's no real need to write (or even
compute) these periodically. Free blocks / inodes counters are recomputed on
every mount from group counters anyway and value of s_wtime is only
informational and imprecise anyway. So it should be enough to write these
opportunistically on mount, remount, umount, and sync_fs times.
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently on unmount if we are mounted R/W, we first write the superblock to
the media if it is dirty, and then write it again, which is not optimal. This
patch makes ext2 write the superblock on unmount less times.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| | |
max_debt, involved variables and calculations
are no longer needed, clean them up.
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently we write out all journal buffers in WRITE_SYNC mode. This improves
performance for fsync heavy workloads but hinders performance when writes
are mostly asynchronous, most noticably it slows down readers and users
complain about slow desktop response etc.
So submit writes as asynchronous in the normal case and only submit writes as
WRITE_SYNC if we detect someone is waiting for current transaction commit.
I've gathered some numbers to back this change. The first is the read latency
test. It measures time to read 1 MB after several seconds of sleeping in
presence of streaming writes.
Top 10 times (out of 90) in us:
Before After
2131586 697473
1709932 557487
1564598 535642
1480462 347573
1478579 323153
1408496 222181
1388960 181273
1329565 181070
1252486 172832
1223265 172278
Average:
619377 82180
So the improvement in both maximum and average latency is massive.
I've measured fsync throughput by:
fs_mark -n 100 -t 1 -s 16384 -d /mnt/fsync/ -S 1 -L 4
in presence of streaming reader. The numbers (fsyncs/s) are:
Before After
9.9 6.3
6.8 6.0
6.3 6.2
5.8 6.1
So fsync performance seems unharmed by this change.
Signed-off-by: Jan Kara <jack@suse.cz>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen updates from Konrad Rzeszutek Wilk:
"Features:
* Extend the APIC ops implementation and add IRQ_WORKER vector
support so that 'perf' can work properly.
* Fix self-ballooning code, and balloon logic when booting as initial
domain.
* Move array printing code to generic debugfs
* Support XenBus domains.
* Lazily free grants when a domain is dead/non-existent.
* In M2P code use batching calls
Bug-fixes:
* Fix NULL dereference in allocation failure path (hvc_xen)
* Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
* Fix HVM guest resume - we would leak an PIRQ value instead of
reusing the existing one."
Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of
apic ipi interface next to the new apic_id functions.
* tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: do not map the same GSI twice in PVHVM guests.
hvc_xen: NULL dereference on allocation failure
xen: Add selfballoning memory reservation tunable.
xenbus: Add support for xenbus backend in stub domain
xen/smp: unbind irqworkX when unplugging vCPUs.
xen: enter/exit lazy_mmu_mode around m2p_override calls
xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
xen: implement IRQ_WORK_VECTOR handler
xen: implement apic ipi interface
xen/setup: update VA mapping when releasing memory during setup
xen/setup: Combine the two hypercall functions - since they are quite similar.
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
xen/gnttab: add deferred freeing logic
debugfs: Add support to print u32 array in debugfs
xen/p2m: An early bootup variant of set_phys_to_machine
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
xen/p2m: Move code around to allow for better re-usage.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Move the code from Xen to debugfs to make the code common
for other users as well.
Accked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
[v1: Fixed rebase issues]
[v2: Fixed PPC compile issues]
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull sparc changes from David S. Miller:
"This has the generic strncpy_from_user() implementation architectures
can now use, which we've been developing on linux-arch over the past
few days.
For good measure I ran both a 32-bit and a 64-bit glibc testsuite run,
and the latter of which pointed out an adjustment I needed to make to
sparc's user_addr_max() definition. Linus, you were right, STACK_TOP
was not the right thing to use, even on sparc itself :-)
From Sam Ravnborg, we have a conversion of sparc32 over to the common
alloc_thread_info_node(), since the aspect which originally blocked
our doing so (sun4c) has been removed."
Fix up trivial arch/sparc/Kconfig and lib/Makefile conflicts.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Fix user_addr_max() definition.
lib: Sparc's strncpy_from_user is generic enough, move under lib/
kernel: Move REPEAT_BYTE definition into linux/kernel.h
sparc: Increase portability of strncpy_from_user() implementation.
sparc: Optimize strncpy_from_user() zero byte search.
sparc: Add full proper error handling to strncpy_from_user().
sparc32: use the common implementation of alloc_thread_info_node()
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
And make sure that everything using it explicitly includes
that header file.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull XFS update from Ben Myers:
- Removal of xfsbufd
- Background CIL flushes have been moved to a workqueue.
- Fix to xfs_check_page_type applicable to filesystems where
blocksize < page size
- Fix for stale data exposure when extsize hints are used.
- A series of xfs_buf cache cleanups.
- Fix for XFS_IOC_ALLOCSP
- Cleanups for includes and removal of xfs_lrw.[ch].
- Moved all busy extent handling to it's own file so that it is easier
to merge with userspace.
- Fix for log mount failure.
- Fix to enable inode reclaim during quotacheck at mount time.
- Fix for delalloc quota accounting.
- Fix for memory reclaim deadlock on agi buffer.
- Fixes for failed writes and to clean up stale delalloc blocks.
- Fix to use GFP_NOFS in blkdev_issue_flush
- SEEK_DATA/SEEK_HOLE support
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (57 commits)
xfs: add trace points for log forces
xfs: fix memory reclaim deadlock on agi buffer
xfs: fix delalloc quota accounting on failure
xfs: protect xfs_sync_worker with s_umount semaphore
xfs: introduce SEEK_DATA/SEEK_HOLE support
xfs: make xfs_extent_busy_trim not static
xfs: make XBF_MAPPED the default behaviour
xfs: flush outstanding buffers on log mount failure
xfs: Properly exclude IO type flags from buffer flags
xfs: clean up xfs_bit.h includes
xfs: move xfs_do_force_shutdown() and kill xfs_rw.c
xfs: move xfs_get_extsz_hint() and kill xfs_rw.h
xfs: move xfs_fsb_to_db to xfs_bmap.h
xfs: clean up busy extent naming
xfs: move busy extent handling to it's own file
xfs: move xfsagino_t to xfs_types.h
xfs: use iolock on XFS_IOC_ALLOCSP calls
xfs: kill XBF_DONTBLOCK
xfs: kill xfs_read_buf()
xfs: kill XBF_LOCK
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
To enable easy tracing of the location of log forces and the
frequency of them via perf, add a pair of trace points to the log
force functions. This will help debug where excessive log forces
are being issued from by simple perf commands like:
# ~/perf/perf top -e xfs:xfs_log_force -G -U
Which gives this sort of output:
Events: 141 xfs:xfs_log_force
- 100.00% [kernel] [k] xfs_log_force
- xfs_log_force
87.04% xfsaild
kthread
kernel_thread_helper
- 12.87% xfs_buf_lock
_xfs_buf_find
xfs_buf_get
xfs_trans_get_buf
xfs_da_do_buf
xfs_da_get_buf
xfs_dir2_data_init
xfs_dir2_leaf_addname
xfs_dir_createname
xfs_create
xfs_vn_mknod
xfs_vn_create
vfs_create
do_last.isra.41
path_openat
do_filp_open
do_sys_open
sys_open
system_call_fastpath
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sig.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Note xfs_iget can be called while holding a locked agi buffer. If
it goes into memory reclaim then inode teardown may try to lock the
same buffer. Prevent the deadlock by calling radix_tree_preload
with GFP_NOFS.
Signed-off-by: Peter Watkins <treestem@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfstest 270 was causing quota reservations way beyond what was sane
(ten to hundreds of TB) for a 4GB filesystem. There's a sign problem
in the error handling path of xfs_bmapi_reserve_delalloc() because
xfs_trans_unreserve_quota_nblks() simple negates the value passed -
which doesn't work for an unsigned variable. This causes
reservations of close to 2^32 block instead of removing a
reservation of a handful of blocks.
Fix the same problem in the other xfs_trans_unreserve_quota_nblks()
callers where unsigned integer variables are used, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing
work during mount and unmount. This flag can be cleared by unmount
after the xfs_sync_worker checks it but before the work is completed.
The has caused crashes in the completion handler for the dummy
transaction commited by xfs_sync_worker:
PID: 27544 TASK: ffff88013544e040 CPU: 3 COMMAND: "kworker/3:0"
#0 [ffff88016fdff930] machine_kexec at ffffffff810244e9
#1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053
#2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8
#3 [ffff88016fdffaa0] no_context at ffffffff8102bd48
#4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d
#5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e
#6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee
#7 [ffff88016fdffc60] page_fault at ffffffff813ac635
[exception RIP: xlog_get_lowest_lsn+0x30]
RIP: ffffffffa04a9910 RSP: ffff88016fdffd10 RFLAGS: 00010246
RAX: ffffc90014e48000 RBX: ffff88014d879980 RCX: ffff88014d879980
RDX: ffff8802214ee4c0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88016fdffd10 R8: ffff88014d879a80 R9: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802214ee400
R13: ffff88014d879980 R14: 0000000000000000 R15: ffff88022fd96605
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs]
#9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs]
Protect xfs_sync_worker by using the s_umount semaphore at the read
level to provide exclusion with unmount while work is progressing.
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch adds lseek(2) SEEK_DATA/SEEK_HOLE functionality to xfs.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Commit e459df5, 'xfs: move busy extent handling to it's own file'
moved some code from xfs_alloc.c into xfs_extent_busy.c for
convenience in userspace code merges. One of the functions moved is
xfs_extent_busy_trim (formerly xfs_alloc_busy_trim) which is defined
STATIC. Unfortunately this function is still used in xfs_alloc.c, and
this results in an undefined symbol in xfs.ko.
Make xfs_extent_busy_trim not static and add its prototype to
xfs_extent_busy.h.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Rather than specifying XBF_MAPPED for almost all buffers, introduce
XBF_UNMAPPED for the couple of users that use unmapped buffers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When we fail to mount the log in xfs_mountfs(), we tear down all the
infrastructure we have already allocated. However, the process of
mounting the log may have progressed to the point of reading,
caching and modifying buffers in memory. Hence before we can free
all the infrastructure, we have to flush and remove all the buffers
from memory.
Problem first reported by Eric Sandeen, later a different incarnation
was reported by Ben Myers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Recent event tracing during a debugging session showed that flags
that define the IO type for a buffer are leaking into the flags on
the buffer incorrectly. Fix the flag exclusion mask in
xfs_buf_alloc() to avoid problems that may be caused by such
leakage.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
With the removal of xfs_rw.h and other changes over time, xfs_bit.h
is being included in many files that don't actually need it. Clean
up the includes as necessary.
Also move the only-used-once xfs_ialloc_find_free() static inline
function out of a header file that is widely included to reduce
the number of needless dependencies on xfs_bit.h.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfs_do_force_shutdown now is the only thing in xfs_rw.c. There is no
need to keep it in it's own file anymore, so move it to xfs_fsops.c
next to xfs_fs_goingdown() and kill xfs_rw.c.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The only thing left in xfs_rw.h is a function prototype for an inode
function. Move that to xfs_inode.h, and kill xfs_rw.h.
Also move the function implementing the prototype from xfs_rw.c to
xfs_inode.c so we only have one function left in xfs_rw.c
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is the only remaining useful function in xfs_rw.h, so move it
to a header file responsible for block mapping functions that the
callers already include. Soon we can get rid of xfs_rw.h.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Now that the busy extent tracking has been moved out of the
allocation files, clean up the namespace it uses to
"xfs_extent_busy" rather than a mix of "xfs_busy" and
"xfs_alloc_busy".
Signed-off-by: Dave Chinner<dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
To make it easier to handle userspace code merges, move all the busy
extent handling out of the allocation code and into it's own file.
The userspace code does not need the busy extent code, so this
simplifies the merging of the kernel code into the userspace
xfsprogs library.
Because the busy extent code has been almost completely rewritten
over the past couple of years, also update the copyright on this new
file to include the authors that made all those changes.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Untangle the header file includes a bit by moving the definition of
xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
xfs_ag.h.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fsstress has a particular effective way of stopping debug XFS
kernels. We keep seeing assert failures due finding delayed
allocation extents where there should be none. This shows up when
extracting extent maps and we are holding all the locks we should be
to prevent races, so this really makes no sense to see these errors.
After checking that fsstress does not use mmap, it occurred to me
that fsstress uses something that no sane application uses - the
XFS_IOC_ALLOCSP ioctl interfaces for preallocation. These interfaces
do allocation of blocks beyond EOF without using preallocation, and
then call setattr to extend and zero the allocated blocks.
THe problem here is this is a buffered write, and hence the
allocation is a delayed allocation. Unlike the buffered IO path, the
allocation and zeroing are not serialised using the IOLOCK. Hence
the ALLOCSP operation can race with operations holding the iolock to
prevent buffered IO operations from occurring.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
avoid recursion through memory reclaim back into the filesystem.
All the blocking get calls in growfs occur inside a transaction, even though
they are no part of the transaction, so all allocation will be GFP_NOFS due to
the task flag PF_TRANS being set. The blocking read calls occur during log
recovery, so they will probably be unaffected by converting to GFP_NOFS
allocations.
Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfs_read_buf() is effectively the same as xfs_trans_read_buf() when called
outside a transaction context. The error handling is slightly different in that
xfs_read_buf stales the errored buffer it gets back, but there is probably good
reason for xfs_trans_read_buf() for doing this.
Hence update xfs_trans_read_buf() to the same error handling as xfs_read_buf(),
and convert all the callers of xfs_read_buf() to use the former function. We can
then remove xfs_read_buf().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Buffers are always returned locked from the lookup routines. Hence
we don't need to tell the lookup routines to return locked buffers,
on to try and lock them. Remove XBF_LOCK from all the callers and
from internal buffer cache usage.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
xfs_buf_btoc and friends are simple macros that do basic block
to page index conversion and vice versa. These aren't widely used,
and we use open coded masking and shifting everywhere else. Hence
remove the macros and open code the work they do.
Also, use of PAGE_CACHE_{SIZE|SHIFT|MASK} for these macros is now
incorrect - we are using pages directly and not the page cache, so
use PAGE_{SIZE|MASK|SHIFT} instead.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Now that we pass block counts everywhere, and index buffers by block
number and length in units of blocks, convert the desired IO size
into block counts rather than bytes. Convert the code to use block
counts, and those that need byte counts get converted at the time of
use.
Rename the b_desired_count variable to something closer to it's
purpose - b_io_length - as it is only used to specify the length of
an IO for a subset of the buffer. The only time this is used is for
log IO - both writing iclogs and during log recovery. In all other
cases, the b_io_length matches b_length, and hence a lot of code
confuses the two. e.g. the buf item code uses the io count
exclusively when it should be using the buffer length. Fix these
apprpriately as they are found.
Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers
around the desired IO length. They only serve to make the code
shouty loud, don't actually add any real value, and are often used
incorrectly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Now that we pass block counts everywhere, and index buffers by block
number, track the length of the buffer in units of blocks rather
than bytes. Convert the code to use block counts, and those that
need byte counts get converted at the time of use.
Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers
around the buffer length. They only serve to make the code shouty
loud and don't actually add any real value.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Seeing as we pass block numbers around everywhere in the buffer
cache now, it makes no sense to index everything by byte offset.
Replace all the byte offset indexing with block number based
indexing, and replace all uses of the byte offset with direct
conversion from the block index.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The xfs_buf_get/read API is not consistent in the units it uses, and
does not use appropriate or consistent units/types for the
variables.
Convert the API to use disk addresses and block counts for all
buffer get and read calls. Use consistent naming for all the
functions and their declarations, and convert the internal functions
to use disk addresses and block counts to avoid need to convert them
from one type to another and back again.
Fix all the callers to use disk addresses and block counts. In many
cases, this removes an additional conversion from the function call
as the callers already have a block count.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|