| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The offset passed into xfs_free_file_space() needs to be rounded
down to a certain size, but the rounding mask is built by a 32 bit
variable. Hence the mask will always mask off the upper 32 bits of
the offset and lead to incorrect writeback and invalidation ranges.
This is not actually exposed as a bug because we writeback and
invalidate from the rounded offset to the end of the file, and hence
the offset we are actually punching a hole out of will always be
covered by the code. This needs fixing, however, if we ever want to
use exact ranges for writeback/invalidation here...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 28ca489c63e9aceed8801d2f82d731b3c9aa50f5)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FSX on 512 byte block size filesystems has been failing for some
time with corrupted data. The fault dates back to the change in
the writeback data integrity algorithm that uses a mark-and-sweep
approach to avoid data writeback livelocks.
Unfortunately, a side effect of this mark-and-sweep approach is that
each page will only be written once for a data integrity sync, and
there is a condition in writeback in XFS where a page may require
two writeback attempts to be fully written. As a result of the high
level change, we now only get a partial page writeback during the
integrity sync because the first pass through writeback clears the
mark left on the page index to tell writeback that the page needs
writeback....
The cause is writing a partial page in the clustering code. This can
happen when a mapping boundary falls in the middle of a page - we
end up writing back the first part of the page that the mapping
covers, but then never revisit the page to have the remainder mapped
and written.
The fix is simple - if the mapping boundary falls inside a page,
then simple abort clustering without touching the page. This means
that the next ->writepage entry that write_cache_pages() will make
is the page we aborted on, and xfs_vm_writepage() will map all
sections of the page correctly. This behaviour is also optimal for
non-data integrity writes, as it results in contiguous sequential
writeback of the file rather than missing small holes and having to
write them a "random" writes in a future pass.
With this fix, all the fsx tests in xfstests now pass on a 512 byte
block size filesystem on a 4k page machine.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 49b137cbbcc836ef231866c137d24f42c42bb483)
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull audit changes from Eric Paris:
"Al used to send pull requests every couple of years but he told me to
just start pushing them to you directly.
Our touching outside of core audit code is pretty straight forward. A
couple of interface changes which hit net/. A simple argument bug
calling audit functions in namei.c and the removal of some assembly
branch prediction code on ppc"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: fix message spacing printing auid
Revert "audit: move kaudit thread start from auditd registration to kaudit init"
audit: vfs: fix audit_inode call in O_CREAT case of do_last
audit: Make testing for a valid loginuid explicit.
audit: fix event coverage of AUDIT_ANOM_LINK
audit: use spin_lock in audit_receive_msg to process tty logging
audit: do not needlessly take a lock in tty_audit_exit
audit: do not needlessly take a spinlock in copy_signal
audit: add an option to control logging of passwords with pam_tty_audit
audit: use spin_lock_irqsave/restore in audit tty code
helper for some session id stuff
audit: use a consistent audit helper to log lsm information
audit: push loginuid and sessionid processing down
audit: stop pushing loginid, uid, sessionid as arguments
audit: remove the old depricated kernel interface
audit: make validity checking generic
audit: allow checking the type of audit message in the user filter
audit: fix build break when AUDIT_DEBUG == 2
audit: remove duplicate export of audit_enabled
Audit: do not print error when LSMs disabled
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Jiri reported a regression in auditing of open(..., O_CREAT) syscalls.
In older kernels, creating a file with open(..., O_CREAT) created
audit_name records that looked like this:
type=PATH msg=audit(1360255720.628:64): item=1 name="/abc/foo" inode=138810 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
type=PATH msg=audit(1360255720.628:64): item=0 name="/abc/" inode=138635 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
...in recent kernels though, they look like this:
type=PATH msg=audit(1360255402.886:12574): item=2 name=(null) inode=264599 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
type=PATH msg=audit(1360255402.886:12574): item=1 name=(null) inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
type=PATH msg=audit(1360255402.886:12574): item=0 name="/abc/foo" inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
Richard bisected to determine that the problems started with commit
bfcec708, but the log messages have changed with some later
audit-related patches.
The problem is that this audit_inode call is passing in the parent of
the dentry being opened, but audit_inode is being called with the parent
flag false. This causes later audit_inode and audit_inode_child calls to
match the wrong entry in the audit_names list.
This patch simply sets the flag to properly indicate that this inode
represents the parent. With this, the audit_names entries are back to
looking like they did before.
Cc: <stable@vger.kernel.org> # v3.7+
Reported-by: Jiri Jaburek <jjaburek@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Test By: Richard Guy Briggs <rbriggs@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull nfsd fixes from Bruce Fields:
"Small fixes for two bugs and two warnings"
* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
SUNRPC: fix decoding of optional gss-proxy xdr fields
SUNRPC: Refactor gssx_dec_option_array() to kill uninitialized warning
nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Toralf reported the following oops to the linux-nfs mailing list:
-----------------[snip]------------------
NFSD: unable to generate recoverydir name (-2).
NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly!
BUG: unable to handle kernel NULL pointer dereference at 000003c8
IP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
*pdpt = 000000002ba33001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: loop nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun arc4 iwldvm mac80211 coretemp kvm_intel uvcvideo sdhci_pci sdhci mmc_core videobuf2_vmalloc videobuf2_memops usblp videobuf2_core i915 iwlwifi psmouse videodev cfg80211 kvm fbcon bitblit cfbfillrect acpi_cpufreq mperf evdev softcursor font cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt drm_kms_helper snd_hda_codec_conexant drm agpgart fb fbdev tpm_tis thinkpad_acpi tpm nvram e1000e rfkill thermal ptp wmi pps_core tpm_bios 8250_pci processor 8250 ac snd_hda_intel snd_hda_codec snd_pcm battery video i2c_i801 snd_page_alloc snd_timer button serial_core i2c_core snd soundcore thermal_sys hwmon aesni_intel ablk_helper cryp
td lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
Pid: 6374, comm: nfsd Not tainted 3.9.1 #6 LENOVO 4180F65/4180F65
EIP: 0060:[<f90a3d91>] EFLAGS: 00010202 CPU: 0
EIP is at nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
EAX: 00000000 EBX: fffffffe ECX: 00000007 EDX: 00000007
ESI: eb9dcb00 EDI: eb2991c0 EBP: eb2bde38 ESP: eb2bde34
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 80050033 CR2: 000003c8 CR3: 2ba80000 CR4: 000407f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process nfsd (pid: 6374, ti=eb2bc000 task=eb2711c0 task.ti=eb2bc000)
Stack:
fffffffe eb2bde4c f90a3e0c f90a7754 fffffffe eb0a9c00 eb2bdea0 f90a41ed
eb2991c0 1b270000 eb2991c0 eb2bde7c f9099ce9 eb2bde98 0129a020 eb29a020
eb2bdecc eb2991c0 eb2bdea8 f9099da5 00000000 eb9dcb00 00000001 67822f08
Call Trace:
[<f90a3e0c>] legacy_recdir_name_error+0x3c/0x40 [nfsd]
[<f90a41ed>] nfsd4_create_clid_dir+0x15d/0x1c0 [nfsd]
[<f9099ce9>] ? nfsd4_lookup_stateid+0x99/0xd0 [nfsd]
[<f9099da5>] ? nfs4_preprocess_seqid_op+0x85/0x100 [nfsd]
[<f90a4287>] nfsd4_client_record_create+0x37/0x50 [nfsd]
[<f909d6ce>] nfsd4_open_confirm+0xfe/0x130 [nfsd]
[<f90980b1>] ? nfsd4_encode_operation+0x61/0x90 [nfsd]
[<f909d5d0>] ? nfsd4_free_stateid+0xc0/0xc0 [nfsd]
[<f908fd0b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
[<f9081b7b>] nfsd_dispatch+0x8b/0x1a0 [nfsd]
[<f857b85d>] svc_process+0x3dd/0x640 [sunrpc]
[<f908165d>] nfsd+0xad/0x110 [nfsd]
[<f90815b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
[<c1054824>] kthread+0x94/0xa0
[<c1486937>] ret_from_kernel_thread+0x1b/0x28
[<c1054790>] ? flush_kthread_work+0xd0/0xd0
Code: 86 b0 00 00 00 90 c5 0a f9 c7 04 24 70 76 0a f9 e8 74 a9 3d c8 eb ba 8d 76 00 55 89 e5 53 66 66 66 66 90 8b 15 68 c7 0a f9 85 d2 <8b> 88 c8 03 00 00 74 2c 3b 11 77 28 8b 5c 91 08 85 db 74 22 8b
EIP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd] SS:ESP 0068:eb2bde34
CR2: 00000000000003c8
---[ end trace 09e54015d145c9c6 ]---
The problem appears to be a regression that was introduced in commit
9a9c6478 "nfsd: make NFSv4 recovery client tracking options per net".
Prior to that commit, it was safe to pass a NULL net pointer to
nfsd4_client_tracking_exit in the legacy recdir case, and
legacy_recdir_name_error did so. After that comit, the net pointer must
be valid.
This patch just fixes legacy_recdir_name_error to pass in a valid net
pointer to that function.
Cc: <stable@vger.kernel.org> # v3.8+
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The Linux client is using CLAIM_FH to implement regular opens, not just
recovery cases, so it depends on the server to check permissions
correctly.
Therefore the owner override, which may make sense in the delegation
recovery case, isn't right in the CLAIM_FH case.
Symptoms: on a client with 49f9a0fafd844c32f2abada047c0b9a5ba0d6255
"NFSv4.1: Enable open-by-filehandle", Bryan noticed this:
touch test.txt
chmod 000 test.txt
echo test > test.txt
succeeding.
Cc: stable@kernel.org
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull stray syscall bits from Al Viro:
"Several syscall-related commits that were missing from the original"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
unicore32: just use mmap_pgoff()...
unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs update from Tyler Hicks:
"Improve performance when AES-NI (and most likely other crypto
accelerators) is available by moving to the ablkcipher crypto API.
The improvement is more apparent on faster storage devices.
There's no noticeable change when hardware crypto is not available"
* tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Use the ablkcipher crypto API
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Make the switch from the blkcipher kernel crypto interface to the
ablkcipher interface.
encrypt_scatterlist() and decrypt_scatterlist() now use the ablkcipher
interface but, from the eCryptfs standpoint, still treat the crypto
operation as a synchronous operation. They submit the async request and
then wait until the operation is finished before they return. Most of
the changes are contained inside those two functions.
Despite waiting for the completion of the crypto operation, the
ablkcipher interface provides performance increases in most cases when
used on AES-NI capable hardware.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Colin King <colin.king@canonical.com>
Reviewed-by: Zeev Zilberman <zeev@annapurnaLabs.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Thieu Le <thieule@google.com>
Cc: Li Wang <dragonylffly@163.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@iki.fi>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
"The bulk of the changes are generalizing the ColdFire v3 core support
and adding in 537x CPU support. Also a couple of other bug fixes, one
to fix a reintroduction of a past bug in the romfs filesystem nommu
support."
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: enable Timer on coldfire 532x
m68knommu: fix ColdFire 5373/5329 QSPI base address
m68knommu: add support for configuring a Freescale M5373EVB board
m68knommu: add support for the ColdFire 537x family of CPUs
m68knommu: make ColdFire M532x platform support more v3 generic
m68knommu: create and use a common M53xx ColdFire class of CPUs
m68k: remove unused asm/dbg.h
m68k: Set ColdFire ACR1 cache mode depending on kernel configuration
romfs: fix nommu map length to keep inside filesystem
m68k: clean up unused "config ROMVECSIZE"
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Checks introduced in commit 4991e7251 ("romfs: do not use
mtd->get_unmapped_area directly") re-introduce problems fixed in the earlier
commit 2b4b2482e ("romfs: fix romfs_get_unmapped_area() argument check").
If a flat binary app is located at the end of a romfs, its page aligned
length may be outside of the romfs filesystem. The flat binary loader, via
nommu do_mmap_pgoff(), page aligns the length it is mmaping. So simple
offset+size checks will fail - returning EINVAL.
We can truncate the length to keep it inside the romfs filesystem, and that
also keeps the call to mtd_get_unmapped_area() happy.
Are there any side effects to truncating the size here though?
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull trivial pstore update from Tony Luck:
"Couple of pstore cleanups"
It turns out that the kmemdup() conversion ends up being undone by the
fact that the memory block also needed the ecc information (see commit
bd08ec33b5c2: "pstore/ram: Restore ecc information block"), so all that
remains after merging is the error return code change.
* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
pstore/ram: fix error return code in ramoops_probe()
fs: pstore: Replaced calls to kmalloc and memcpy with kmemdup
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Replaced calls to kmalloc and memcpy with a single call to kmemdup.
This patch was found using coccicheck.
Signed-off-by: Alexandru Gheorghiu <gheorghiuandru@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs fixes from Al Viro:
"Regression fix from Geert + yet another open-coded kernel_read()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ecryptfs: don't open-code kernel_read()
xtensa simdisk: Fix proc_create_data() conversion fallout
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|\ \ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
"These are mostly fixes. The biggest exceptions are Josef's skinny
extents and Jan Schmidt's code to rebuild our quota indexes if they
get out of sync (or you enable quotas on an existing filesystem).
The skinny extents are off by default because they are a new variation
on the extent allocation tree format. btrfstune -x enables them, and
the new format makes the extent allocation tree about 30% smaller.
I rebased this a few days ago to rework Dave Sterba's crc checks on
the super block, but almost all of these go back to rc6, since I
though 3.9 was due any minute.
The biggest missing fix is the tracepoint bug that was hit late in
3.9. I ran into problems with that in overnight testing and I'm still
tracking it down. I'll definitely have that fixed for rc2."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (101 commits)
Btrfs: allow superblock mismatch from older mkfs
btrfs: enhance superblock checks
btrfs: fix misleading variable name for flags
btrfs: use unsigned long type for extent state bits
Btrfs: improve the loop of scrub_stripe
btrfs: read entire device info under lock
btrfs: remove unused gfp mask parameter from release_extent_buffer callchain
btrfs: handle errors returned from get_tree_block_key
btrfs: make static code static & remove dead code
Btrfs: deal with errors in write_dev_supers
Btrfs: remove almost all of the BUG()'s from tree-log.c
Btrfs: deal with free space cache errors while replaying log
Btrfs: automatic rescan after "quota enable" command
Btrfs: rescan for qgroups
Btrfs: split btrfs_qgroup_account_ref into four functions
Btrfs: allocate new chunks if the space is not enough for global rsv
Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
btrfs: move leak debug code to functions
Btrfs: return free space in cow error path
Btrfs: set UUID in root_item for created trees
...
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
We've added new checks to make sure the super block crc is correct
during mount. A fresh filesystem from an older mkfs won't have the
crc set. This adds a warning when it finds a newly created filesystem
but doesn't fail the mount.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
The superblock checksum is not verified upon mount. <awkward silence>
Add that check and also reorder existing checks to a more logical
order.
Current mkfs.btrfs does not calculate the correct checksum of
super_block and thus a freshly created filesytem will fail to mount when
this patch is applied.
First transaction commit calculates correct superblock checksum and
saves it to disk.
Reproducer:
$ mfks.btrfs /dev/sda
$ mount /dev/sda /mnt
$ btrfs scrub start /mnt
$ sleep 5
$ btrfs scrub status /mnt
... super:2 ...
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
The variable was named 'data' in btrfs_reserve_extent and that's the
only function that actually uses it to let btrfs_get_alloc_profile know
what profile we want. Then it's passed down as u64 flags.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
1) Right now scrub_stripe() is looping in some unnecessary cases:
* when the found extent item's objectid has been out of the dev extent's range
but we haven't finish scanning all the range within the dev extent
* when all the items has been processed but we haven't finish scanning all the
range within the dev extent
In both cases, we can just finish the loop to save costs.
2) Besides, when the found extent item's length is larger than the stripe
len(64k), we don't have to release the path and search again as it'll get at the
same key used in the last loop, we can instead increase the logical cursor in
place till all space of the extent is scanned.
3) And we use 0 as the key's offset to search btree, then get to previous item
to find a smaller item, and again have to move to the next one to get the right
item. Setting offset=-1 and previous_item() is the correct way.
4) As we won't find any checksum at offset unless this 'offset' is in a data
extent, we can just find checksum when we're really going to scrub an extent.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
There's a theoretical possibility of reading stale (or even more
theoretically, freed) data from DEV_INFO ioctl when the device would
disappear between an early mutex unlock and data being copied from the
device structure.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
It's unused since 0b32f4bbb423f02ac.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Signed-off-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Big patch, but all it does is add statics to functions which
are in fact static, then remove the associated dead-code fallout.
removed functions:
btrfs_iref_to_path()
__btrfs_lookup_delayed_deletion_item()
__btrfs_search_delayed_insertion_item()
__btrfs_search_delayed_deletion_item()
find_eb_for_page()
btrfs_find_block_group()
range_straddles_pages()
extent_range_uptodate()
btrfs_file_extent_length()
btrfs_scrub_cancel_devid()
btrfs_start_transaction_lflush()
btrfs_print_tree() is left because it is used for debugging.
btrfs_start_transaction_lflush() and btrfs_reada_detach() are
left for symmetry.
ulist.c functions are left, another patch will take care of those.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
If you try to mount -o loop a restored file system it will panic if the file
ends up being smaller than the original disk. This is because we go to try and
get a block for a super that may be past the EOF which makes __getblk return
NULL for a buffer head when we aren't expecting it to. Fix this by dealing with
this case and just jacking up the errors count. With this patch we no longer
panic when mounting a restored file system loopback. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
There were a whole bunch and I was doing it for other things. I haven't tested
these error paths but at the very least this is better than panicing. I've only
left 2 BUG_ON()'s since they are logic errors and I want to replace them with a
ASSERT framework that we can compile out for production users. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
So everybody who got hit by my fsync bug will still continue to hit this
BUG_ON() in the free space cache, which is pretty heavy handed. So I took a
file system that had this bug and fixed up all the BUG_ON()'s and leaks that
popped up when I tried to mount a broken file system like this. With this patch
we just fail to mount instead of panicing. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
When qgroup tracking is enabled, we do an automatic cycle of the new rescan
mechanism.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
If qgroup tracking is out of sync, a rescan operation can be started. It
iterates the complete extent tree and recalculates all qgroup tracking data.
This is an expensive operation and should not be used unless required.
A filesystem under rescan can still be umounted. The rescan continues on the
next mount. Status information is provided with a separate ioctl while a
rescan operation is in progress.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
The function is separated into a preparation part and the three accounting
steps mentioned in the qgroups documentation. The goal is to make steps two
and three usable by the rescan functionality. A side effect is that the
function is restructured into readable subunits.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
When running the 208th of xfstests, the fs returned the enospc
error when there was lots of free space in the disk.
By bisect debug, we found it was introduced by commit 96f1bb5777.
This commit makes the space check for the global reservation in
can_overcommit() be inconsistent with should_alloc_chunk().
can_overcommit() requires that the free space is 2 times the size
of the global reservation, or we can't do overcommit. And instead,
we need reclaim some reserved space, and if we still don't have
enough free space, we need allocate a new chunk. But unfortunately,
should_alloc_chunk() just requires that the free space is 1 time
the size of the global reservation, that is we would not try to
allocate a new chunk if the free space size is in the middle of
these two requires, and just return the enospc error. Fix it.
Cc: Jim Schutt <jaschut@sandia.gov>
Cc: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Sequence numbers for delayed refs have been introduced in the first version
of the qgroup patch set. To solve the problem of find_all_roots on a busy
file system, the tree mod log was introduced. The sequence numbers for that
were simply shared between those two users.
However, at one point in qgroup's quota accounting, there's a statement
accessing the previous sequence number, that's still just doing (seq - 1)
just as it would have to in the very first version.
To satisfy that requirement, this patch makes the sequence number counter 64
bit and splits it into a major part (used for qgroup sequence number
counting) and a minor part (incremented for each tree modification in the
log). This enables us to go exactly one major step backwards, as required
for qgroups, while still incrementing the sequence counter for tree mod log
insertions to keep track of their order. Keeping them in a single variable
means there's no need to change all the code dealing with comparisons of two
sequence numbers.
The sequence number is reset to 0 on commit (not new in this patch), which
ensures we won't overflow the two 32 bit counters.
Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
from the tree mod log code may happen.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Clean up the leak debugging in extent_io.c by moving
the debug code into functions. This also removes the
list_heads used for debugging from the extent_buffer
and extent_state structures when debug is not enabled.
Since we need a global debug config to do that last
part, implement CONFIG_BTRFS_DEBUG to accommodate.
Thanks to Dave Sterba for the Kconfig bit.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Replace some BUG_ONs with proper handling and take allocated space back to
free space cache for later use.
We don't have to worry about extent maps since they'd be freed in releasepage
path.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
It is a rare exception that a new tree is created, like the qgroups
tree. So far these new trees have an all-zero UUID in their root
items. All trees that mkfs.btrfs has created get an UUID during the
first mount when btrfs_read_root_item() rewrites the root_item to
the v2 structure style. These UUID are never used so far, but
anyway, since it is better to have it uniform for all trees, this
commit adds some lines that generate and write an UUID for newly
created trees.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
fget() returns NULL if error. So, we should check NULL or not.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Variable 'p' is not used any more. So, remove it.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
I have a broken file system that when it aborts leaves all sorts of accounting
things wrong and gives you lots of WARN_ON()'s other than the abort. This is
because we're not cleaning up various parts of the file system when we abort.
The first chunks are specific to mount failures, we weren't cleaning up the
block group cached inodes and we weren't cleaning up any transactions that had
been aborted, which leaves a bunch of things laying around.
The second half of this are related to the cleanup parts. First we don't need
to release space for the dirty pages from the trans_block_rsv, that's all
handled by the trans handles so this is just plain wrong. The other thing is we
need to pin down extents that were set ->must_insert_reserved for delayed refs.
This isn't so much for the pinning but more for the cleaning up the
cache->reserved counter since we are no longer going to use those reserved
bytes. With this patch I no longer see a bunch of WARN_ON()'s when I try to
mount this broken file system, just the initial one from the abort. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
We can just look up the extent_buffers for the range and free stuff that way.
This makes the cleanup a bit cleaner and we can make sure to evict the
extent_buffers pretty quickly by marking them as stale. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
We need to check the return value of the commit in case something goes wrong,
otherwise we could end up going down the line and doing more stuff (like orphan
cleanup) before we notice we should have errored out. We need to do this before
we free up the log_tree_root since the caller will handle all of that. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This is just obnoxious. Just print a message, abort the transaction, and return
an error. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
We can run the tree logging recovery or the orphan cleanup on mount, so we'll
end up looking up a random fs tree in the meantime. So we need to clean this up
so we don't leave extent buffers hanging around on the cache. With this patch
we no longer leak extent buffers on failure to mount. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
This is the same as the fix from commit
Btrfs: fix bad extent logging
but for O_DIRECT. I missed this when I fixed the problem originally, we were
still using the em for the orig_start and orig_block_len, which would be the
merged extent. We need to use the actual extent from the on disk file extent
item, which we have to lookup to make sure it's ok to nocow anyway so just pass
in some pointers to hold this info. Thanks,
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
We kept leaking extent buffers when mounting a broken file system and it turns
out it's because not everybody uses read_tree_block properly. You need to check
and make sure the extent_buffer is uptodate before you use it. This patch fixes
everybody who calls read_tree_block directly to make sure they check that it is
uptodate and free it and return an error if it is not. With this we no longer
leak EB's when things go horribly wrong. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
If we fail to load block groups halfway through we can leave extent_state's on
the excluded tree. This is because we just lookup the supers and add them to
the excluded tree regardless of which block group we are looking at currently.
This is a problem because we remove the excluded extents for the range of the
block group only, so if we don't ever load a block group for one of the excluded
extents we won't ever free it. This fixes the problem by only adding excluded
extents if it falls in the block group range we care about. With this patch
we're no longer leaking space when we fail to read all of the block groups.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
|