op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	nfsd4: don't close read-write opens too soon	J. Bruce Fields	2013-04-09	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Don't actually close any opens until we don't need them at all. This means being left with write access when it's not really necessary, but that's better than putting a file that might still have posix locks held on it, as we have been. Reported-by: Toralf Förster <toralf.foerster@gmx.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: release lockowners on last unlock in 4.1 case	J. Bruce Fields	2013-04-09	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the 4.1 case we're supposed to release lockowners as soon as they're no longer used. It would probably be more efficient to reference count them, but that's slightly fiddly due to the need to have callbacks from locks.c to take into account lock merging and splitting. For most cases just scanning the inode's lock list on unlock for matching locks will be sufficient. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: more sessions/open-owner-replay cleanup	J. Bruce Fields	2013-04-09	1	-12/+16
\| \| \| \| \| \|	More logic that's unnecessary in the 4.1 case. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: no need for replay_owner in sessions case	J. Bruce Fields	2013-04-09	2	-5/+5
\| \| \| \| \| \|	The replay_owner will never be used in the sessions case. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: remove some redundant comments	J. Bruce Fields	2013-04-09	1	-6/+0
\| \| \| \|	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: use kmem_cache_free() instead of kfree()	Wei Yongjun	2013-04-09	1	-1/+1
\| \| \| \| \| \| \| \| \|	memory allocated by kmem_cache_alloc() should be freed using kmem_cache_free(), not kfree(). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: cleanup handling of nfsv4.0 closed stateid's	J. Bruce Fields	2013-04-08	5	-56/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Closed stateid's are kept around a little while to handle close replays in the 4.0 case. So we stash them in the last-used stateid in the oo_last_closed_stateid field of the open owner. We can free that in encode_seqid_op_tail once the seqid on the open owner is next incremented. But we don't want to do that on the close itself; so we set NFS4_OO_PURGE_CLOSE flag set on the open owner, skip freeing it the first time through encode_seqid_op_tail, then when we see that flag set next time we free it. This is unnecessarily baroque. Instead, just move the logic that increments the seqid out of the xdr code and into the operation code itself. The justification given for the current placement is that we need to wait till the last minute to be sure we know whether the status is a sequence-id-mutating error or not, but examination of the code shows that can't actually happen. Reported-by: Yanchuan Nian <ycnian@gmail.com> Tested-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: remove unused nfs4_check_deleg argument	J. Bruce Fields	2013-04-04	1	-2/+2
\| \| \| \|	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: make del_recall_lru per-network-namespace	J. Bruce Fields	2013-04-04	2	-8/+8
\| \| \| \| \| \|	If nothing else this simplifies the nfs4_state_shutdown_net logic a tad. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: shut down more of delegation earlier	J. Bruce Fields	2013-04-04	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Once we've unhashed the delegation, it's only hanging around for the benefit of an oustanding recall, which only needs the encoded filehandle, stateid, and dl_retries counter. No point keeping the file around any longer, or keeping it hashed. This also fixes a race: calls to idr_remove should really be serialized by the caller, but the nfs4_put_delegation call from the callback code isn't taking the state lock. (Better might be to cancel the callback before destroying the delegation, and remove any need for reference counting--but I don't see an easy way to cancel an rpc call.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: minor cb_recall simplification	J. Bruce Fields	2013-04-04	1	-5/+3
\| \| \| \|	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: remove /proc/fs/nfs when create /proc/fs/nfs/exports error	fanchaoting	2013-04-03	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	when create /proc/fs/nfs/exports error, we should remove /proc/fs/nfs, if don't do it, it maybe cause Memory leak. Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com> Reviewed-by: chendt.fnst <chendt.fnst@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: don't run get_file if nfs4_preprocess_stateid_op return error	fanchaoting	2013-04-03	1	-4/+4
\| \| \| \| \| \| \| \| \|	we should return error status directly when nfs4_preprocess_stateid_op return error. Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: convert the file_hashtbl to a hlist	Jeff Layton	2013-04-03	2	-11/+5
\| \| \| \| \| \| \| \|	We only ever traverse the hash chains in the forward direction, so a double pointer list head isn't really necessary. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: don't destroy in-use session	J. Bruce Fields	2013-04-03	2	-33/+43
\| \| \| \| \| \| \| \| \|	This changes session destruction to be similar to client destruction in that attempts to destroy a session while in use (which should be rare corner cases) result in DELAY. This simplifies things somewhat and helps meet a coming 4.2 requirement. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: don't destroy in-use clients	J. Bruce Fields	2013-04-03	3	-97/+131
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a setclientid_confirm or create_session confirms a client after a client reboot, it also destroys any previous state held by that client. The shutdown of that previous state must be careful not to free the client out from under threads processing other requests that refer to the client. This is a particular problem in the NFSv4.1 case when we hold a reference to a session (hence a client) throughout compound processing. The server attempts to handle this by unhashing the client at the time it's destroyed, then delaying the final free to the end. But this still leaves some races in the current code. I believe it's simpler just to fail the attempt to destroy the client by returning NFS4ERR_DELAY. This is a case that should never happen anyway. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: simplify bind_conn_to_session locking	J. Bruce Fields	2013-04-03	1	-14/+14
\| \| \| \| \| \| \| \|	The locking here is very fiddly, and there's no reason for us to be setting cstate->session, since this is the only op in the compound. Let's just take the state lock and drop the reference counting. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: fix destroy_session race	J. Bruce Fields	2013-04-03	1	-16/+10
\| \| \| \| \| \| \| \| \|	destroy_session uses the session and client without continuously holding any reference or locks. Put the whole thing under the state lock for now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: clientid lookup cleanup	J. Bruce Fields	2013-04-03	1	-12/+12
\| \| \| \|	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: destroy_clientid simplification	J. Bruce Fields	2013-04-03	1	-7/+1
\| \| \| \| \| \| \| \| \| \|	I'm not sure what the check for clientid expiry was meant to do here. The check for a matching session is redundant given the previous check for state: a client without state is, in particular, a client without sessions. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: remove some dprintk's	J. Bruce Fields	2013-04-03	1	-8/+1
\| \| \| \| \| \| \|	E.g. printk's that just report the return value from an op are uninteresting as we already do that in the main proc_compound loop. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: STALE_STATEID cleanup	J. Bruce Fields	2013-04-03	1	-15/+6
\| \| \| \|	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: warn on odd create_session state	J. Bruce Fields	2013-04-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	This should never happen. (Note: the comparable case in setclientid_confirm can happen, since updating a client record can result in both confirmed and unconfirmed records with the same clientid.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: fix bug on nfs4 stateid deallocation	ycnian@gmail.com	2013-04-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	NFS4_OO_PURGE_CLOSE is not handled properly. To avoid memory leak, nfs4 stateid which is pointed by oo_last_closed_stid is freed in nfsd4_close(), but NFS4_OO_PURGE_CLOSE isn't cleared meanwhile. So the stateid released in THIS close procedure may be freed immediately in the coming encoding function. Sorry that Signed-off-by was forgotten in last version. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: remove unused macro in nfsv4	Yanchuan Nian	2013-04-03	1	-1/+0
\| \| \| \| \| \| \| \|	lk_rflags is never used anywhere, and rflags is not defined in struct nfsd4_lock. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: fix use-after-free of 4.1 client on connection loss	J. Bruce Fields	2013-04-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Once we drop the lock here there's nothing keeping the client around: the only lock still held is the xpt_lock on this socket, but this socket no longer has any connection with the client so there's no way for other code to know we're still using the client. The solution is simple: all nfsd4_probe_callback does is set a few variables and queue some work, so there's no reason we can't just keep it under the lock. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: fix race on client shutdown	J. Bruce Fields	2013-04-03	3	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dropping the session's reference count after the client's means we leave a window where the session's se_client pointer is NULL. An xpt_user callback that encounters such a session may then crash: [ 303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318 [ 303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40 [ 303.959061] PGD 37811067 PUD 3d498067 PMD 0 [ 303.959061] Oops: 0002 [#8] PREEMPT SMP [ 303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio [ 303.959061] CPU 0 [ 303.959061] Pid: 264, comm: nfsd Tainted: G D 3.8.0-ARCH+ #156 Bochs Bochs [ 303.959061] RIP: 0010:[<ffffffff81481a8e>] [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40 [ 303.959061] RSP: 0018:ffff880037877dd8 EFLAGS: 00010202 [ 303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278 [ 303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318 [ 303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002 [ 303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [ 303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00 [ 303.959061] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 [ 303.959061] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0 [ 303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0) [ 303.959061] Stack: [ 303.959061] ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278 [ 303.959061] ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f [ 303.959061] 0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00 [ 303.959061] Call Trace: [ 303.959061] [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd] [ 303.959061] [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc] [ 303.959061] [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc] [ 303.959061] [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd] [ 303.959061] [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd] [ 303.959061] [<ffffffff8107ae00>] kthread+0xc0/0xd0 [ 303.959061] [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100 [ 303.959061] [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70 [ 303.959061] [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0 [ 303.959061] [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70 [ 303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f [ 303.959061] RIP [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40 [ 303.959061] RSP <ffff880037877dd8> [ 303.959061] CR2: 0000000000000318 [ 304.001218] ---[ end trace 2d809cd4a7931f5a ]--- [ 304.001903] note: nfsd[264] exited with preempt_count 2 Reported-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: handle seqid-mutating open errors from xdr decoding	J. Bruce Fields	2013-04-03	3	-1/+28
\| \| \| \| \| \| \| \| \| \|	If a client sets an owner (or group_owner or acl) attribute on open for create, and the mapping of that owner to an id fails, then we return BAD_OWNER. But BAD_OWNER is a seqid-mutating error, so we can't shortcut the open processing that case: we have to at least look up the owner so we can find the seqid to bump. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: remove BUG_ON	J. Bruce Fields	2013-04-03	1	-6/+3
\| \| \| \| \| \| \|	This BUG_ON just crashes the thread a little earlier than it would otherwise--it doesn't seem useful. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: scale up the number of DRC hash buckets with cache size	Jeff Layton	2013-04-03	1	-15/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've now increased the size of the duplicate reply cache by quite a bit, but the number of hash buckets has not changed. So, we've gone from an average hash chain length of 16 in the old code to 4096 when the cache is its largest. Change the code to scale out the number of buckets with the max size of the cache. At the same time, we also need to fix the hash function since the existing one isn't really suitable when there are more than 256 buckets. Move instead to use the stock hash_32 function for this. Testing on a machine that had 2048 buckets showed that this gave a smaller longest:average ratio than the existing hash function: The formula here is longest hash bucket searched divided by average number of entries per bucket at the time that we saw that longest bucket: old hash: 68/(39258/2048) == 3.547404 hash_32: 45/(33773/2048) == 2.728807 Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: keep stats on worst hash balancing seen so far	Jeff Layton	2013-04-03	1	-4/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The typical case with the DRC is a cache miss, so if we keep track of the max number of entries that we've ever walked over in a search, then we should have a reasonable estimate of the longest hash chain that we've ever seen. With that, we'll also keep track of the total size of the cache when we see the longest chain. In the case of a tie, we prefer to track the smallest total cache size in order to properly gauge the worst-case ratio of max vs. avg chain length. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: add new reply_cache_stats file in nfsdfs	Jeff Layton	2013-04-03	3	-0/+35
\| \| \| \| \| \| \|	For presenting statistics relating to duplicate reply cache. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: track memory utilization by the DRC	Jeff Layton	2013-04-03	1	-5/+17
\| \| \| \| \|	Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: break out comparator into separate function	Jeff Layton	2013-04-03	1	-11/+35
\| \| \| \| \| \| \| \| \| \|	Break out the function that compares the rqstp and checksum against a reply cache entry. While we're at it, track the efficacy of the checksum over the NFS data by tracking the cases where we would have incorrectly matched a DRC entry if we had not tracked it or the length. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: eliminate one of the DRC cache searches	Jeff Layton	2013-04-03	1	-22/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The most common case is to do a search of the cache, followed by an insert. In the case where we have to allocate an entry off the slab, then we end up having to redo the search, which is wasteful. Better optimize the code for the common case by eliminating the initial search of the cache and always preallocating an entry. In the case of a cache hit, we'll end up just freeing that entry but that's preferable to an extra search. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd4: reject "negative" acl lengths	J. Bruce Fields	2013-03-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Since we only enforce an upper bound, not a lower bound, a "negative" length can get through here. The symptom seen was a warning when we attempt to a kmalloc with an excessive size. Reported-by: Toralf Förster <toralf.foerster@gmx.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: fix bad offset use	Kent Overstreet	2013-03-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vfs_writev() updates the offset argument - but the code then passes the offset to vfs_fsync_range(). Since offset now points to the offset after what was just written, this is probably not what was intended Introduced by face15025ffdf664de95e86ae831544154d26c9c "nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes". Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: stable@vger.kernel.org Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: fix startup order in nfsd_reply_cache_init	Jeff Layton	2013-03-18	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we end up doing "goto out_nomem" in this function, we'll call nfsd_reply_cache_shutdown. That will attempt to walk the LRU list and free entries, but that list may not be initialized yet if the server is starting up for the first time. It's also possible for the shrinker to kick in before we've initialized the LRU list. Rearrange the initialization so that the LRU list_head and cache size are initialized before doing any of the allocations that might fail. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	nfsd: only unhash DRC entries that are in the hashtable	Jeff Layton	2013-03-18	1	-1/+2
\| \| \| \| \| \| \| \| \|	It's not safe to call hlist_del() on a newly initialized hlist_node. That leads to a NULL pointer dereference. Only do that if the entry is hashed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2013-03-17	7	-12/+25
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Eric's rcu barrier patch fixes a long standing problem with our unmount code hanging on to devices in workqueue helpers. Liu Bo nailed down a difficult assertion for in-memory extent mappings." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix warning of free_extent_map Btrfs: fix warning when creating snapshots Btrfs: return as soon as possible when edquot happens Btrfs: return EIO if we have extent tree corruption btrfs: use rcu_barrier() to wait for bdev puts at unmount Btrfs: remove btrfs_try_spin_lock Btrfs: get better concurrency for snapshot-aware defrag work
\| *	Btrfs: fix warning of free_extent_map	Liu Bo	2013-03-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Users report that an extent map's list is still linked when it's actually going to be freed from cache. The story is that a) when we're going to drop an extent map and may split this large one into smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means that it's on the list to be logged, then the smaller ems split from it will also be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected. b) we'll keep ems from unlinking the list and freeing when they are flagged with EXTENT_FLAG_LOGGING, because the log code holds one reference. The end result is the warning, but the truth is that we set the flag EXTENT_FLAG_LOGGING only during fsync. So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one. Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Reported-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
\| *	Btrfs: fix warning when creating snapshots	Liu Bo	2013-03-14	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Creating snapshot passes extent_root to commit its transaction, but it can lead to the warning of checking root for quota in the __btrfs_end_transaction() when someone else is committing the current transaction. Since we've recorded the needed root in trans_handle, just use it to get rid of the warning. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
\| *	Btrfs: return as soon as possible when edquot happens	Wang Shilong	2013-03-14	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If one of qgroup fails to reserve firstly, we should return immediately, it is unnecessary to continue check. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
\| *	Btrfs: return EIO if we have extent tree corruption	Josef Bacik	2013-03-14	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The callers of lookup_inline_extent_info all handle getting an error back properly, so return an error if we have corruption instead of being a jerk and panicing. Still WARN_ON() since this is kind of crucial and I've been seeing it a bit too much recently for my taste, I think we're doing something wrong somewhere. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
\| *	btrfs: use rcu_barrier() to wait for bdev puts at unmount	Eric Sandeen	2013-03-14	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Doing this would reliably fail with -EBUSY for me: # mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2 ... unable to open /dev/sdb2: Device or resource busy because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it. Using systemtap to track bdev gets & puts shows a kworker thread doing a blkdev put after mkfs attempts a get; this is left over from the unmount path: btrfs_close_devices __btrfs_close_devices call_rcu(&device->rcu, free_device); free_device INIT_WORK(&device->rcu_work, __free_device); schedule_work(&device->rcu_work); so unmount might complete before __free_device fires & does its blkdev_put. Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait until all blkdev_put()s are done, and the device is truly free once unmount completes. Cc: stable@vger.kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
\| *	Btrfs: remove btrfs_try_spin_lock	Liu Bo	2013-03-14	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove a useless function declaration Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
\| *	Btrfs: get better concurrency for snapshot-aware defrag work	Liu Bo	2013-03-14	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using spinning case instead of blocking will result in better concurrency overall. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* \|	Merge branch 'for_linus' of ↵	Linus Torvalds	2013-03-14	5	-7/+9
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, ext3, reiserfs, quota fixes from Jan Kara: "A fix for regression in ext2, and a format string issue in ext3. The rest isn't too serious." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Fix BUG_ON in evict() on inode deletion reiserfs: Use kstrdup instead of kmalloc/strcpy ext3: Fix format string issues quota: add missing use of dq_data_lock in __dquot_initialize
\| * \|	ext2: Fix BUG_ON in evict() on inode deletion	Jan Kara	2013-03-13	2	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 8e3dffc6 introduced a regression where deleting inode with large extended attributes leads to triggering BUG_ON(inode->i_state != (I_FREEING \| I_CLEAR)) in fs/inode.c:evict(). That happens because freeing of xattr block dirtied the inode and it happened after clear_inode() has been called. Fix the issue by moving removal of xattr block into ext2_evict_inode() before clear_inode() call close to a place where data blocks are truncated. That is also more logical place and removes surprising requirement that ext2_free_blocks() mustn't dirty the inode. Reported-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz>
\| * \|	reiserfs: Use kstrdup instead of kmalloc/strcpy	Ionut-Gabriel Radu	2013-03-11	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Ionut-Gabriel Radu <ihonius@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>