op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	ceph: use complete_all and wake_up_all	Yehuda Sadeh	2010-07-27	6	-20/+20
\| \| \| \| \| \| \| \| \| \|	This fixes an issue triggered by running concurrent syncs. One of the syncs would go through while the other would just hang indefinitely. In any case, we never actually want to wake a single waiter, so the *_all functions should be used. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
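The hang described above can be reproduced in user space. The following is a minimal Python sketch, not the kernel code: `threading.Condition.notify()` stands in for a single wake-up and `notify_all()` for the `*_all` variants, and `count_woken` with its parameters is a hypothetical helper introduced only for this illustration.

```python
import threading
import time

def count_woken(wake_all, waiters=2, timeout=1.0):
    """Return how many blocked threads are released by one notification."""
    cond = threading.Condition()
    state = {"waiting": 0, "woken": 0}

    def waiter():
        with cond:
            state["waiting"] += 1
            # wait() returns True when notified, False when the timeout expires.
            if cond.wait(timeout):
                state["woken"] += 1

    threads = [threading.Thread(target=waiter) for _ in range(waiters)]
    for t in threads:
        t.start()

    # Poll until every waiter is blocked in wait(). wait() releases the lock
    # only once the thread is registered, so the count seen under the lock
    # is reliable.
    notified = False
    while not notified:
        with cond:
            if state["waiting"] == waiters:
                if wake_all:
                    cond.notify_all()   # analogue of complete_all / wake_up_all
                else:
                    cond.notify()       # analogue of waking a single waiter
                notified = True
        if not notified:
            time.sleep(0.001)

    for t in threads:
        t.join()
    return state["woken"]
```

With a single notification one waiter proceeds and the other only returns because of the timeout used in this sketch; without a timeout it would block forever, which matches the symptom in the commit message.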
*	ceph: Correct obvious typo of Kconfig variable "CRYPTO_AES"	Robert P. J. Day	2010-07-24	1	-1/+1
\| \| \| \| \|	Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix dentry lease release	Sage Weil	2010-07-23	1	-0/+1
\| \| \| \| \| \| \| \| \|	When we embed a dentry lease release notification in a request, invalidate our lease so we don't think we still have it. Otherwise we can get all sorts of incorrect client behavior when multiple clients are interacting with the same part of the namespace. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix leak of dentry in ceph_init_dentry() error path	Sage Weil	2010-07-23	1	-1/+3
\| \| \| \| \| \|	If we fail to allocate a ceph_dentry_info, don't leak the dn reference. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix pg_mapping leak on pg_temp updates	Sage Weil	2010-07-23	1	-11/+15
\| \| \| \| \| \| \|	Free the ceph_pg_mapping structs when they are removed from the pg_temp rbtree. Also fix a leak in the __insert_pg_mapping() error path. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix d_release dop for snapdir, snapped dentries	Sage Weil	2010-07-23	1	-3/+9
\| \| \| \| \| \| \| \| \| \|	We need to set the d_release dop for snapdir and snapped dentries so that the ceph_dentry_info struct gets released. We also use the dcache to cache readdir results when possible, which only works if we know when dentries are dropped from the cache. Since we don't use the dcache for readdir in the hidden snapdir, avoid that case in ceph_dentry_release. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: avoid dcache readdir for snapdir	Sage Weil	2010-07-22	1	-0/+1
\| \| \| \| \| \| \| \|	We should always go to the MDS for readdir on the hidden snapdir. The set of snapshots can change at any time; the client can't trust its cache for that. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: do not include cap/dentry releases in replayed messages	Sage Weil	2010-07-16	2	-0/+9
\| \| \| \| \| \| \| \| \|	Strip the cap and dentry releases from replayed messages. They can cause the shared state to get out of sync because they were generated (with the request message) earlier, and no longer reflect the current client state. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: reuse request message when replaying against recovering mds	Sage Weil	2010-07-16	1	-5/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replayed rename operations (after an mds failure/recovery) were broken because the request paths were regenerated from the dentry names, which get mangled when d_move() is called. Instead, resend the previous request message when replaying completed operations. Just make sure the REPLAY flag is set and the target ino is filled in. This fixes problems with workloads doing renames when the MDS restarts, where the rename operation appears to succeed, but on mds restart then fails (leading to client confusion, app breakage, etc.). Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix creation of ipv6 sockets	Sage Weil	2010-07-09	1	-3/+5
\| \| \| \| \| \|	Use the address family from the peer address instead of assuming IPv4. Signed-off-by: Sage Weil <sage@newdream.net>
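As a user-space illustration of the same idea, the sketch below derives the socket family from the peer address instead of hard-coding IPv4. It uses Python's standard `ipaddress` and `socket` modules; `family_for` and `connect_socket` are hypothetical names, and this is not the kernel messenger code.

```python
import ipaddress
import socket

def family_for(host):
    """Pick the address family from the peer address itself."""
    version = ipaddress.ip_address(host).version
    return socket.AF_INET6 if version == 6 else socket.AF_INET

def connect_socket(host):
    """Create a TCP socket whose family matches the peer address."""
    # Passing the derived family avoids the assumption that every peer is IPv4.
    return socket.socket(family_for(host), socket.SOCK_STREAM)
```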
*	ceph: fix parsing of ipv6 addresses	Sage Weil	2010-07-09	1	-6/+19
\| \| \| \| \| \| \|	Check for brackets around the ipv6 address to avoid ambiguity with the port number. Signed-off-by: Sage Weil <sage@newdream.net>
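The ambiguity arises because an IPv6 address contains colons itself, so `addr:port` cannot be split on the last colon without help. A minimal Python sketch of bracket-aware parsing follows; `parse_addr` is a hypothetical helper, the default port of 6789 is an assumption made for the example, and the kernel parser differs in detail.

```python
import ipaddress

def parse_addr(text, default_port=6789):
    """Split 'v4', 'v4:port', 'v6', '[v6]' or '[v6]:port' into (host, port)."""
    if text.startswith("["):
        end = text.find("]")
        if end < 0:
            raise ValueError("missing closing bracket in %r" % text)
        host, rest = text[1:end], text[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("unexpected text after bracket in %r" % text)
        # Anything after the closing bracket can only be ':port'.
        port = int(rest[1:]) if rest else default_port
    elif text.count(":") == 1:
        # Exactly one colon cannot be IPv6, so it separates an IPv4 host and port.
        host, port_text = text.split(":")
        port = int(port_text)
    else:
        # Bare IPv4 or bare IPv6 address without a port.
        host, port = text, default_port
    ipaddress.ip_address(host)  # raises ValueError for a malformed address
    return host, port
```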
*	ceph: fix printing of ipv6 addrs	Sage Weil	2010-07-08	1	-18/+6
\| \| \| \| \| \| \| \|	The buffer was too small. Make it bigger, use snprintf(), put brackets around the ipv6 address to avoid mixing it up with the :port, and use the ever-so-handy %pI[46] formats. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: add kfree() to error path	Dan Carpenter	2010-07-08	1	-0/+1
\| \| \| \| \| \| \|	We leak a "pi" on this error path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix leak of mon authorizer	Sage Weil	2010-07-05	1	-0/+3
\| \| \| \| \| \|	Fix leak of a struct ceph_buffer on umount. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix message revocation	Sage Weil	2010-07-05	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	A message can be on a queue (pending or sent), or out_msg (sending), or both. We were assuming that if it's not on a queue it couldn't be out_msg, but that was false in the case of lossy connections like the OSD. Fix ceph_con_revoke() to treat these cases independently. Also, fix the out_kvec_is_message check to only trigger if we are currently sending _this_ message. This fixes a GPF in tcp_sendpage, triggered by OSD restarts. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix crush device 'out' threshold to 1.0, not 0.1	Sage Weil	2010-07-05	1	-1/+1
\| \| \| \| \| \| \|	Fix a typo that made any OSD weighted between 0.1 and 1.0 effectively weighted as 1.0 (fully in). Signed-off-by: Sage Weil <sage@newdream.net>
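The effect of the wrong threshold can be shown with a small sketch. It assumes weights are 16.16 fixed-point values (so `0x10000` represents 1.0) and that a 16-bit pseudo-random value decides partial weights; `is_out`, `FULL_WEIGHT`, and `TYPO_THRESHOLD` are names chosen for this illustration, and the exact mistyped constant is not taken from the source.

```python
FULL_WEIGHT = 0x10000                      # 1.0 in 16.16 fixed point (assumed)
TYPO_THRESHOLD = int(0.1 * FULL_WEIGHT)    # stand-in for the mistyped 0.1 threshold

def is_out(weight, hash16, full=FULL_WEIGHT):
    """Return True if a device is skipped for one placement attempt.

    hash16 is a 16-bit pseudo-random value for this input/device pair.
    """
    if weight >= full:
        return False          # treated as fully in
    if weight == 0:
        return True           # fully out
    # Partially weighted: in with probability weight / 1.0.
    return hash16 >= weight

half = 0x8000  # a device weighted 0.5
# Correct threshold: a high hash value marks the device out for this attempt.
fixed = is_out(half, 0xC000)
# Mistyped threshold: 0.5 >= 0.1, so the device is always treated as fully in.
buggy = is_out(half, 0xC000, full=TYPO_THRESHOLD)
```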
*	ceph: fix caps usage accounting for import (non-reserved) case	Sage Weil	2010-06-29	1	-2/+8
\| \| \| \| \| \| \|	We need to increase the total and used counters when allocating a new cap in the non-reserved (cap import) case. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: only release clean, unused caps with mds requests	Sage Weil	2010-06-29	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \|	We can drop caps with an mds request. Ensure we only drop unused AND clean caps, since the MDS doesn't support cap writeback in that context, nor do we track it. If caps are dirty, and the MDS needs them back, it will revoke and we will flush in the normal fashion. This fixes a possible loss of metadata. Signed-off-by: Sage Weil <sage@newdream.net>
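The "unused AND clean" rule is a simple mask computation. The sketch below uses made-up capability bits (`CAP_RD`, `CAP_WR`, `CAP_CACHE`, `CAP_BUFFER`) and a hypothetical `releasable` helper; the real cap constants and the surrounding locking are not shown.

```python
# Hypothetical capability bits, for illustration only.
CAP_RD, CAP_WR, CAP_CACHE, CAP_BUFFER = 1, 2, 4, 8

def releasable(issued, used, dirty, drop):
    """Caps that may be released alongside an MDS request.

    Only caps that are issued, requested for dropping, not in use,
    and not dirty qualify; dirty caps must go through a normal flush.
    """
    return issued & drop & ~used & ~dirty

issued = CAP_RD | CAP_WR | CAP_CACHE
result = releasable(issued, used=CAP_RD, dirty=CAP_WR,
                    drop=CAP_RD | CAP_WR | CAP_CACHE | CAP_BUFFER)
```

Here only `CAP_CACHE` survives: `CAP_RD` is in use, `CAP_WR` is dirty, and `CAP_BUFFER` was never issued.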
*	ceph: fix crush CHOOSE_LEAF when type is already a leaf	Sage Weil	2010-06-24	1	-13/+25
\| \| \| \| \| \| \|	We may not recurse for CHOOSE_LEAF if we start with a leaf node. When that happens, the out2 vector needs to be filled in with the result. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix crush recursion	Sage Weil	2010-06-24	1	-0/+1
\| \| \| \| \| \| \|	There was a longstanding problem with recursion through intervening bucket types on complex hierarchies. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix caps debugfs entry	Yehuda Sadeh	2010-06-24	1	-1/+1
\| \| \| \| \| \| \|	The ceph client structure was not set correctly. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: delay umount until all mds requests drop inode+dentry refs	Sage Weil	2010-06-21	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a race between handle_reply finishing an mds request, signalling completion, and then dropping the request struct and its dentry+inode refs, and the pre_umount function waiting for requests to finish before letting the vfs tear down the dcache. If umount was delayed waiting for mds requests, we could race and BUG in shrink_dcache_for_umount_subtree because of a slow dput. This delays umount until the msgr queue flushes, which means handle_reply will exit and will have dropped the ceph_mds_request struct. I'm assuming the VFS has already ensured that its calls have all completed and those request refs have thus been dropped as well (I haven't seen that race, at least). Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulate	Sage Weil	2010-06-21	1	-7/+12
\| \| \| \| \| \| \|	Handle a splice_dentry failure (due to a d_materialize_unique error) without crashing. (Also, report the error code.) Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix crush map update decoding	Sage Weil	2010-06-17	1	-0/+1
\| \| \| \| \| \| \|	If the incremental osdmap has a new crush map, advance the position after decoding so that we can parse the rest of the osdmap properly. Signed-off-by: Sage Weil <sage@newdream.net>
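The class of bug is easy to see with a toy length-prefixed format. The layout below (a little-endian u32 length, an embedded blob, then a u32 trailer) is invented for this example and is not the real osdmap encoding; `decode_record` is a hypothetical name.

```python
import struct

def decode_record(buf):
    """Decode a toy record: u32 length, embedded blob, then a u32 trailer."""
    pos = 0
    (blob_len,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    blob = buf[pos:pos + blob_len]
    # The step the fix adds: advance past the embedded blob after decoding it.
    # Without this, the trailer would be read from the start of the blob.
    pos += blob_len
    (trailer,) = struct.unpack_from("<I", buf, pos)
    return blob, trailer

sample = struct.pack("<I", 3) + b"abc" + struct.pack("<I", 42)
```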
*	ceph: fix message memory leak, uninitialized variable	Sage Weil	2010-06-13	1	-0/+2
\| \| \| \| \| \| \| \| \|	We need to properly initialize skip, as not all alloc_msg op instances set it. Also, BUG if someone says skip but also allocates a message. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix map handler error path	Sage Weil	2010-06-13	1	-1/+2
\| \| \| \| \| \|	Don't leak message if we receive an unexpected message type. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: some endianity fixes	Yehuda Sadeh	2010-06-13	3	-3/+4
\| \| \| \| \| \| \|	Fix some problems that came up with sparse. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: try to send partial cap release on cap message on missing inode	Sage Weil	2010-06-10	3	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \|	If we have enough memory to allocate a new cap release message, do so, so that we can send a partial release message immediately. This keeps us from making the MDS wait when the cap release it needs is in a partially full release message. If we fail because of ENOMEM, oh well, they'll just have to wait a bit longer. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: release cap on import if we don't have the inode	Sage Weil	2010-06-10	3	-38/+61
\| \| \| \| \| \| \| \|	If we get an IMPORT that gives us a cap, but we don't have the inode, queue a release (and try to send it immediately) so that the MDS doesn't get stuck waiting for us. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix misleading/incorrect debug message	Sage Weil	2010-06-10	1	-1/+1
\| \| \| \| \| \|	Nothing is released here: the caps message is simply ignored in this case. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix atomic64_t initialization on ia64	Jeff Mahoney	2010-06-10	1	-1/+1
\| \| \| \| \| \| \| \|	bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix lease revocation when seq doesn't match	Sage Weil	2010-06-04	1	-4/+8
\| \| \| \| \| \| \| \|	If the client revokes a lease with a higher seq than what we have, keep the mds's seq, so that it honors our release. Otherwise, we can hang indefinitely. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix f_namelen reported by statfs	Sage Weil	2010-06-01	1	-1/+1
\| \| \| \| \| \| \| \| \|	We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX. That disagrees with ceph_lookup behavior (which checks against NAME_MAX), and also makes the pjd posix test suite spit out ugly errors because it can't clean up its temporary files. Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix memory leak in statfs	Yehuda Sadeh	2010-06-01	1	-0/+2
\| \| \| \| \| \| \|	Freeing the statfs request structure when required. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
*	ceph: fix d_subdirs ordering problem	Henry C Chang	2010-06-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We misused list_move_tail() to order the dentry in d_subdirs. This will screw up the d_subdirs order. This bug can be reliably reproduced by: 1. mount ceph fs. 2. on ceph fs, git clone git://ceph.newdream.net/git/ceph.git 3. Run autogen.sh in ceph directory. (Note: Errors only occur at the first time you run autogen.sh.) Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-05-30	17	-32/+85
\|\	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
	  ceph: clean up on forwarded aborted mds request
	  ceph: fix leak of osd authorizer
	  ceph: close out mds, osd connections before stopping auth
	  ceph: make lease code DN specific
	  fs/ceph: Use ERR_CAST
	  ceph: renew auth tickets before they expire
	  ceph: do not resend mon requests on auth ticket renewal
	  ceph: removed duplicated #includes
	  ceph: avoid possible null dereference
	  ceph: make mds requests killable, not interruptible
	  sched: add wait_for_completion_killable_timeout
\| *	ceph: clean up on forwarded aborted mds request	Sage Weil	2010-05-29	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If an mds request is aborted (timeout, SIGKILL), it is left registered to keep our state in sync with the mds. If we get a forward notification, though, we know the request didn't succeed and we can unregister it safely. We were trying to resend it, but then bailing out (and not unregistering) in __do_request. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: fix leak of osd authorizer	Sage Weil	2010-05-29	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Release the ceph_authorizer when releasing osd state. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: close out mds, osd connections before stopping auth	Sage Weil	2010-05-29	3	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The auth module (part of the mon_client) is needed to free any ceph_authorizer(s) used by the mds and osd connections. Flush the msgr workqueue before stopping monc to ensure that the destroy_authorizer auth op is available when those connections are closed out. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: make lease code DN specific	Sage Weil	2010-05-29	2	-12/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The lease code includes a mask in the CEPH_LOCK_* namespace, but that namespace is changing, and only one mask (formerly _DN == 1) is used, so hard code for that value for now. If we ever extend this code to handle leases over different data types we can extend it accordingly. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	fs/ceph: Use ERR_CAST	Julia Lawall	2010-05-29	6	-6/+6
\| \|	Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of the returned value is the same as the type of the enclosing function. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)
	// <smpl>
	@@
	type T;
	T x;
	identifier f;
	@@
	T f (...) { <+...
	- ERR_PTR(PTR_ERR(x))
	+ x
	...+> }
	@@
	expression x;
	@@
	- ERR_PTR(PTR_ERR(x))
	+ ERR_CAST(x)
	// </smpl>
	Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: renew auth tickets before they expire	Sage Weil	2010-05-29	4	-1/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were only requesting renewal after our tickets expire; do so before that. Most of the low-level logic for this was already there; just use it. Signed-off-by: Sage Weil <sage@newdream.net>
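Renewing ahead of expiry amounts to choosing a deadline some margin before the ticket runs out. The sketch below is an illustration only: the 25% margin, the time units, and the names `renew_after` and `needs_renewal` are assumptions made for the example, not values confirmed by this commit.

```python
def renew_after(issued, validity, fraction=0.25):
    """Time at which renewal should start: a margin before the ticket expires."""
    expires = issued + validity
    # Start renewing when `fraction` of the validity period still remains.
    return expires - validity * fraction

def needs_renewal(now, issued, validity):
    """True once the renewal deadline has passed, even if the ticket is still valid."""
    return now >= renew_after(issued, validity)
```

For a ticket issued at t=1000 and valid for 400 seconds, renewal would begin at t=1300, a full 100 seconds before expiry at t=1400.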
\| *	ceph: do not resend mon requests on auth ticket renewal	Sage Weil	2010-05-29	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We only want to send pending mon requests when we successfully authenticate. If we are already authenticated, like when we renew our ticket, there is no need to resend pending requests. Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: removed duplicated #includes	Andrea Gelmini	2010-05-29	2	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fs/ceph/auth.c: linux/slab.h is included more than once. fs/ceph/super.h: linux/slab.h is included more than once. Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: avoid possible null dereference	Sage Weil	2010-05-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	ac->ops may be null; use protocol id in error message instead. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: make mds requests killable, not interruptible	Sage Weil	2010-05-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The underlying problem is that many mds requests can't be restarted. For example, a restarted create() would return -EEXIST if the original request succeeds. However, we do not want a hung MDS to hang the client too. So, use the _killable wait_for_completion variants to abort on SIGKILL but nothing else. Signed-off-by: Sage Weil <sage@newdream.net>
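The non-restartable nature of create() can be demonstrated on any local filesystem. The sketch below is a user-space analogue, not Ceph code: it performs an exclusive create, then blindly repeats it the way a restarted request would, and reports the resulting error. `create_exclusive` and `create_then_restart` are hypothetical helpers.

```python
import os
import tempfile

def create_exclusive(path):
    """Create a file that must not already exist."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)

def create_then_restart():
    """Issue an exclusive create, then repeat it as a restarted call would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file")
        create_exclusive(path)        # the original request succeeds
        try:
            create_exclusive(path)    # the restart sees the file it just created
        except FileExistsError:
            return "EEXIST"
        return "ok"
```

The second call fails even though, from the caller's point of view, only one create was ever requested, which is why the commit restricts interruption to SIGKILL.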
* \|	drop unused dentry argument to ->fsync	Christoph Hellwig	2010-05-27	3	-6/+5
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-05-24	30	-759/+876
\|\ \ \| \|/	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (59 commits)
	  ceph: reuse mon subscribe message instead of allocated anew
	  ceph: avoid resending queued message to monitor
	  ceph: Storage class should be before const qualifier
	  ceph: all allocation functions should get gfp_mask
	  ceph: specify max_bytes on readdir replies
	  ceph: cleanup pool op strings
	  ceph: Use kzalloc
	  ceph: use common helper for aborted dir request invalidation
	  ceph: cope with out of order (unsafe after safe) mds reply
	  ceph: save peer feature bits in connection structure
	  ceph: resync headers with userland
	  ceph: use ceph. prefix for virtual xattrs
	  ceph: throw out dirty caps metadata, data on session teardown
	  ceph: attempt mds reconnect if mds closes our session
	  ceph: clean up send_mds_reconnect interface
	  ceph: wait for mds OPEN reply to indicate reconnect success
	  ceph: only send cap releases when mds is OPEN\|HUNG
	  ceph: dicard cap releases on mds restart
	  ceph: make mon client statfs handling more generic
	  ceph: drop src address(es) from message header [new protocol feature]
	  ...
\| *	ceph: reuse mon subscribe message instead of allocated anew	Sage Weil	2010-05-21	2	-10/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the same message, allocated during startup. No need to reallocate a new one each time around (and potentially ENOMEM). Signed-off-by: Sage Weil <sage@newdream.net>
\| *	ceph: avoid resending queued message to monitor	Sage Weil	2010-05-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The auth_reply handler will (re)send any pending requests. For the initial mon authenticate phase, that's correct, but when an auth ticket renewal races with an in-flight request, we may resend a request message that is already in flight. Avoid this by revoking the message before sending it. We should also avoid resending requests at all during ticket renewal; that will come soon. Signed-off-by: Sage Weil <sage@newdream.net>