| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
&percpu_data is compatible with allocated percpu data.
And we use it and remove the "->rda[NR_CPUS]" array, saving significant
storage on systems with large numbers of CPUs. This does add an additional
level of indirection and thus an additional cache line referenced, but
because ->rda is not used on the read side, this is OK.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add random preemption to help we to torture the preemptable rcu.
srcu_read_delay() also calls rcu_read_delay() for shorter delays.
Added comment to preempt_schedule() call indicating that no quiescent
states happen if preemption is disabled.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
|
| |
Add a section describing PROVE_RCU, DEBUG_OBJECTS_RCU_HEAD, and
the __rcu sparse checking to the RCU checklist.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
| |
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
|
|
|
|
|
| |
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
|
| |
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
| |
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This avoids warnings from missing __rcu annotations
in the rculist implementation, making it possible to
use the same lists in both RCU and non-RCU cases.
We can add rculist annotations later, together with
lockdep support for rculist, which is missing as well,
but that may involve changing all the users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit provides definitions for the __rcu annotation defined earlier.
This annotation permits sparse to check for correct use of RCU-protected
pointers. If a pointer that is annotated with __rcu is accessed
directly (as opposed to via rcu_dereference(), rcu_assign_pointer(),
or one of their variants), sparse can be made to complain. To enable
such complaints, use the new default-disabled CONFIG_SPARSE_RCU_POINTER
kernel configuration option. Please note that these sparse complaints are
intended to be a debugging aid, -not- a code-style-enforcement mechanism.
There are special rcu_dereference_protected() and rcu_access_pointer()
accessors for use when RCU read-side protection is not required, for
example, when no other CPU has access to the data structure in question
or while the current CPU hold the update-side lock.
This patch also updates a number of docbook comments that were showing
their age.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christopher Li <sparse@chrisli.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The task_cls_classid() function applies rcu_dereference() to integers,
which does not work with the shiny new sparse-based checking in
rcu_dereference(). This commit therefore moves to the new RCU API
rcu_dereference_index_check().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix an Oops in the NFSv4 atomic open code
NFS: Fix the selection of security flavours in Kconfig
NFS: fix the return value of nfs_file_fsync()
rpcrdma: Fix SQ size calculation when memreg is FRMR
xprtrdma: Do not truncate iova_start values in frmr registrations.
nfs: Remove redundant NULL check upon kfree()
nfs: Add "lookupcache" to displayed mount options
NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Adam Lackorzynski reports:
with 2.6.35.2 I'm getting this reproducible Oops:
[ 110.825396] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 110.828638] IP: [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[ 110.828638] PGD be89f067 PUD bf18f067 PMD 0
[ 110.828638] Oops: 0000 [#1] SMP
[ 110.828638] last sysfs file: /sys/class/net/lo/operstate
[ 110.828638] CPU 2
[ 110.828638] Modules linked in: rtc_cmos rtc_core rtc_lib amd64_edac_mod
i2c_amd756 edac_core i2c_core dm_mirror dm_region_hash dm_log dm_snapshot
sg sr_mod usb_storage ohci_hcd mptspi tg3 mptscsih mptbase usbcore nls_base
[last unloaded: scsi_wait_scan]
[ 110.828638]
[ 110.828638] Pid: 11264, comm: setchecksum Not tainted 2.6.35.2 #1
[ 110.828638] RIP: 0010:[<ffffffff811247b7>] [<ffffffff811247b7>]
encode_attrs+0x1a/0x2a4
[ 110.828638] RSP: 0000:ffff88003bf5b878 EFLAGS: 00010296
[ 110.828638] RAX: ffff8800bddb48a8 RBX: ffff88003bf5bb18 RCX:
0000000000000000
[ 110.828638] RDX: ffff8800be258800 RSI: 0000000000000000 RDI:
ffff88003bf5b9f8
[ 110.828638] RBP: 0000000000000000 R08: ffff8800bddb48a8 R09:
0000000000000004
[ 110.828638] R10: 0000000000000003 R11: ffff8800be779000 R12:
ffff8800be258800
[ 110.828638] R13: ffff88003bf5b9f8 R14: ffff88003bf5bb20 R15:
ffff8800be258800
[ 110.828638] FS: 0000000000000000(0000) GS:ffff880041e00000(0063)
knlGS:00000000556bd6b0
[ 110.828638] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 110.828638] CR2: 0000000000000000 CR3: 00000000be8ef000 CR4:
00000000000006e0
[ 110.828638] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 110.828638] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 110.828638] Process setchecksum (pid: 11264, threadinfo
ffff88003bf5a000, task ffff88003f232210)
[ 110.828638] Stack:
[ 110.828638] 0000000000000000 ffff8800bfbcf920 0000000000000000
0000000000000ffe
[ 110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 110.828638] Call Trace:
[ 110.828638] [<ffffffff81124c1f>] ? nfs4_xdr_enc_setattr+0x90/0xb4
[ 110.828638] [<ffffffff81371161>] ? call_transmit+0x1c3/0x24a
[ 110.828638] [<ffffffff813774d9>] ? __rpc_execute+0x78/0x22a
[ 110.828638] [<ffffffff81371a91>] ? rpc_run_task+0x21/0x2b
[ 110.828638] [<ffffffff81371b7e>] ? rpc_call_sync+0x3d/0x5d
[ 110.828638] [<ffffffff8111e284>] ? _nfs4_do_setattr+0x11b/0x147
[ 110.828638] [<ffffffff81109466>] ? nfs_init_locked+0x0/0x32
[ 110.828638] [<ffffffff810ac521>] ? ifind+0x4e/0x90
[ 110.828638] [<ffffffff8111e2fb>] ? nfs4_do_setattr+0x4b/0x6e
[ 110.828638] [<ffffffff8111e634>] ? nfs4_do_open+0x291/0x3a6
[ 110.828638] [<ffffffff8111ed81>] ? nfs4_open_revalidate+0x63/0x14a
[ 110.828638] [<ffffffff811056c4>] ? nfs_open_revalidate+0xd7/0x161
[ 110.828638] [<ffffffff810a2de4>] ? do_lookup+0x1a4/0x201
[ 110.828638] [<ffffffff810a4733>] ? link_path_walk+0x6a/0x9d5
[ 110.828638] [<ffffffff810a42b6>] ? do_last+0x17b/0x58e
[ 110.828638] [<ffffffff810a5fbe>] ? do_filp_open+0x1bd/0x56e
[ 110.828638] [<ffffffff811cd5e0>] ? _atomic_dec_and_lock+0x30/0x48
[ 110.828638] [<ffffffff810a9b1b>] ? dput+0x37/0x152
[ 110.828638] [<ffffffff810ae063>] ? alloc_fd+0x69/0x10a
[ 110.828638] [<ffffffff81099f39>] ? do_sys_open+0x56/0x100
[ 110.828638] [<ffffffff81027a22>] ? ia32_sysret+0x0/0x5
[ 110.828638] Code: 83 f1 01 e8 f5 ca ff ff 48 83 c4 50 5b 5d 41 5c c3 41
57 41 56 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 81 ec 18 01 00 00
<8b> 06 89 c2 83 e2 08 83 fa 01 19 db 83 e3 f8 83 c3 18 a8 01 8d
[ 110.828638] RIP [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[ 110.828638] RSP <ffff88003bf5b878>
[ 110.828638] CR2: 0000000000000000
[ 112.840396] ---[ end trace 95282e83fd77358f ]---
We need to ensure that the O_EXCL flag is turned off if the user doesn't
set O_CREAT.
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Randy Dunlap reports:
ERROR: "svc_gss_principal" [fs/nfs/nfs.ko] undefined!
because in fs/nfs/Kconfig, NFS_V4 selects RPCSEC_GSS_KRB5
and/or in fs/nfsd/Kconfig, NFSD_V4 selects RPCSEC_GSS_KRB5.
RPCSEC_GSS_KRB5 does 5 selects, but none of these is enforced/followed
by the fs/nfs[d]/Kconfig configs:
select SUNRPC_GSS
select CRYPTO
select CRYPTO_MD5
select CRYPTO_DES
select CRYPTO_CBC
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
By the commit af7fa16 2010-08-03 NFS: Fix up the fsync code
close(2) became returning the non-zero value even if it went well.
nfs_file_fsync() should return 0 when "status" is positive.
Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This patch updates the computation to include the worst case situation
where three FRMR are required to map a single RPC REQ.
Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A bad cast causes the iova_start, which in this case is a 64b DMA
bus address, to be truncated on 32b systems. This breaks frmrs on
32b systems. No cast is needed.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Running "cat /proc/mounts" fails to display the "lookupcache" option.
This oversight cost me a bunch of wasted time recently.
The following simple patch fixes it.
CC: stable <stable@kernel.org>
Signed-off-by: Patrick LoPresti <lopresti@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
To obey NFS cache semantics, the client must verify the cached
attributes when a file is opened. In most cases this is done by a call to
d_validate as one of the last steps in path_walk.
However for the root of a filesystem, d_validate is only ever called
on the mounted-on filesystem (except when the path ends '.' or '..').
So NFS has no chance to validate the attributes.
So, in nfs_opendir, we revalidate the attributes if the opened
directory is the mountpoint. This may cause double-validation for "."
and ".." lookups, but that is better than missing regular /path/name
lookups completely.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When reusing a TCP connection, ensure that it's aborted if a previous
shutdown attempt has been made on that connection so that the RPC over
TCP recovery mechanism succeeds.
Signed-off-by: Andy Chittenden <andyc.bluearc@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
USB HID: Add ID for eGalax Multitouch used in JooJoo tablet
HID: hiddev: fix memory corruption due to invalid intfdata
HID: hiddev: protect against disconnect/NULL-dereference race
HID: picolcd: correct ordering of framebuffer freeing
HID: picolcd: testing the wrong variable
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The JooJoo tablet (http://thejoojoo.com/) contains an "eGalax Inc. USB
TouchController", and this patch hooks it up to the egalax-touch driver.
Without the patch we don't get any cursor motion, since it comes through
Z/RX rather than X/Y.
(The egalax-touch driver does not yet generate a correct event sequence
for the "serial" protocol used by this device, though -- see the note
added to the code, which comes from research by Stéphane Chatty.)
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Stéphane Chatty <chatty@enac.fr>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit bd25f4dd6972755579d0 ("HID: hiddev: use usb_find_interface,
get rid of BKL") introduced using of private intfdata in hiddev for
purpose of storing hiddev pointer.
This is a problem, because intf pointer is already being set to struct
hid_device pointer by HID core. This obviously lead to memory corruptions
at device disconnect time, such as
WARNING: at lib/kobject.c:595 kobject_put+0x37/0x4b()
kobject: '(null)' (ffff88011e9cd898): is not initialized, yet kobject_put() is being called.
Convert hiddev into accessing hiddev through struct hid_device which is
in intfdata already.
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-and-tested-by: Heinz Diehl <htd@fritha.org>
Reported-and-tested-by: Alan Ott <alan@signal11.us>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
One of our users reports consistently hitting a NULL dereference that
resolves to the "hid_to_usb_dev(hid);" call in hiddev_ioctl(), when
disconnecting a Lego WeDo USB HID device from an OLPC XO running
Scratch software. There's a FIXME comment and a guard against the
dereference, but that happens farther down the function than the
initial dereference does.
This patch moves the call to be below the guard, and the user reports
that it fixes the problem for him. OLPC bug report:
http://dev.laptop.org/ticket/10174
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix the free() ordering (which was never reached due to wrong check).
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
"ref_cnt" is a point to the reference count and it's non-null. We really
want to test the reference count itself.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Fix build error: conflicting types for ‘sys_execve’
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
arch/ia64/kernel/process.c:636: error: conflicting types for ‘sys_execve’
commit d7627467b7a8dd6944885290a03a07ceb28c10eb
Make do_execve() take a const filename pointer
Missed the declaration of sys_execve in the ia64 asm/unistd.h (perhaps
because there is no reason for it to be there ... it might be a left over
from the COMPAT code?). Just delete the conflicting version.
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix the declaration of sys_execve() in asm-generic/syscalls.h to have
various consts applied to its pointers.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fs: brlock vfsmount_lock
Use a brlock for the vfsmount lock. It must be taken for write whenever
modifying the mount hash or associated fields, and may be taken for read when
performing mount hash lookups.
A new lock is added for the mnt-id allocator, so it doesn't need to take
the heavy vfsmount write-lock.
The number of atomics should remain the same for fastpath rlock cases, though
code would be slightly slower due to per-cpu access. Scalability is not not be
much improved in common cases yet, due to other locks (ie. dcache_lock) getting
in the way. However path lookups crossing mountpoints should be one case where
scalability is improved (currently requiring the global lock).
The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
Altix system (high latency to remote nodes), a simple umount microbenchmark
(mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it
took 6.8s, afterwards took 7.1s, about 5% slower.
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fs: scale files_lock
Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).
One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variale in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.
However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.
A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.
Testing results:
On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.
Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)
So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.
Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.
throughput
2.6.34-rc2 24.5
+patch 24.9
us sys idle IO wait (in %)
2.6.34-rc2 51.25 28.25 17.25 3.25
+patch 53.75 18.5 19 8.75
So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.
Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.
Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
lglock: introduce special lglock and brlock spin locks
This patch introduces "local-global" locks (lglocks). These can be used to:
- Provide fast exclusive access to per-CPU data, with exclusive access to
another CPU's data allowed but possibly subject to contention, and to provide
very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide
very slow exclusive serialisation of data (not necessarily per-CPU data).
Brlocks are also implemented as a short-hand notation for the latter use
case.
Thanks to Paul for local/global naming convention.
Cc: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
tty: fix fu_list abuse
tty code abuses fu_list, which causes a bug in remount,ro handling.
If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".
Taking idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes
and avoid meddling with the vfs, and avoids this bug.
The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.
[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fs: cleanup files_lock locking
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fs: remove extra lookup in __lookup_hash
Optimize lookup for create operations, where no dentry should often be
common-case. In cases where it is not, such as unlink, the added overhead
is much smaller than the removed.
Also, move comments about __d_lookup racyness to the __d_lookup call site.
d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
vein, add kerneldoc comments to __d_lookup and clean up some of the comments:
- We are interested in how the RCU lookup works here, particularly with
renames. Make that explicit, and point to the document where it is explained
in more detail.
- RCU is pretty standard now, and macros make implementations pretty mindless.
If we want to know about RCU barrier details, we look in RCU code.
- Delete some boring legacy comments because we don't care much about how the
code used to work, more about the interesting parts of how it works now. So
comments about lazy LRU may be interesting, but would better be done in the
LRU or refcount management code.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fs: fs_struct rwlock to spinlock
struct fs_struct.lock is an rwlock with the read-side used to protect root and
pwd members while taking references to them. Taking a reference to a path
typically requires just 2 atomic ops, so the critical section is very small.
Parallel read-side operations would have cacheline contention on the lock, the
dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
real parallelism increase.
Replace it with a spinlock to avoid one or two atomic operations in typical
path lookup fastpath.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
apparmor: use task path helpers
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fs: dentry allocation consolidation
There are 2 duplicate copies of code in dentry allocation in path lookup.
Consolidate them into a single function.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fs: fix do_lookup false negative
In do_lookup, if we initially find no dentry, we take the directory i_mutex and
re-check the lookup. If we find a dentry there, then we revalidate it if
needed. However if that revalidate asks for the dentry to be invalidated, we
return -ENOENT from do_lookup. What should happen instead is an attempt to
allocate and lookup a new dentry.
This is probably not noticed because it is rare. It is only reached if a
concurrent create races in first (in which case, the dentry probably won't be
invalidated anyway), or if the racy __d_lookup has failed due to a
false-negative (which is very rare).
Fix this by removing code and have it use the normal reval path.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Limit the maximum number of mb_cache entries depending on the number of
hash buckets: if the only limit to the number of cache entries is the
available memory the hash chains can grow very long, taking a long time
to search.
At least partially solves https://bugzilla.lustre.org/show_bug.cgi?id=22771.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
we want the assignment to err done inside the if () to be
visible after it, so (re)declaring err inside if () body
is wrong.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
... not harmless in this case - we have a string in the end of buffer
already.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.
Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers. Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.
In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Instead of abusing a buffer_head flag just add a variant of
sync_dirty_buffer which allows passing the exact type of write
flag required.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
generic_acl_set didn't update the ctime of the file when its permission was
changed.
Steps to reproduce:
# touch aaa
# stat -c %Z aaa
1275289822
# setfacl -m 'u::x,g::x,o::x' aaa
# stat -c %Z aaa
1275289822 <- unchanged
But, according to the spec of the ctime, vfs must update it.
Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>.
CC: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Commit 77b8a75f5bb introduced a warning at fs/inode.c:692 unlock_new_inode(),
caused by unlock_new_inode() being called on existing inodes as well.
This patch changes setup_inode() to only call unlock_new_inode() for I_NEW
inodes.
Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|