op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	xfs: fix tmpfile/selinux deadlock and initialize security	Brian Foster	2014-04-17	3	-6/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	xfstests generic/004 reproduces an ilock deadlock using the tmpfile interface when selinux is enabled. This occurs because xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The latter eventually calls into xfs_xattr_get() which attempts to get the lock again. E.g.: xfs_io D ffffffff81c134c0 4096 3561 3560 0x00000080 ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8 00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540 ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480 Call Trace: [<ffffffff8177f969>] schedule+0x29/0x70 [<ffffffff81783a65>] rwsem_down_read_failed+0xc5/0x120 [<ffffffffa05aa97f>] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs] [<ffffffff813b3434>] call_rwsem_down_read_failed+0x14/0x30 [<ffffffff810ed179>] ? down_read_nested+0x89/0xa0 [<ffffffffa05aa7f2>] ? xfs_ilock+0x122/0x250 [xfs] [<ffffffffa05aa7f2>] xfs_ilock+0x122/0x250 [xfs] [<ffffffffa05aa97f>] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs] [<ffffffffa05701d0>] xfs_attr_get+0x90/0xe0 [xfs] [<ffffffffa0565e07>] xfs_xattr_get+0x37/0x50 [xfs] [<ffffffff8124842f>] generic_getxattr+0x4f/0x70 [<ffffffff8133fd9e>] inode_doinit_with_dentry+0x1ae/0x650 [<ffffffff81340e0c>] selinux_d_instantiate+0x1c/0x20 [<ffffffff813351bb>] security_d_instantiate+0x1b/0x30 [<ffffffff81237db0>] d_instantiate+0x50/0x70 [<ffffffff81237e85>] d_tmpfile+0xb5/0xc0 [<ffffffffa05add02>] xfs_create_tmpfile+0x362/0x410 [xfs] [<ffffffffa0559ac8>] xfs_vn_tmpfile+0x18/0x20 [xfs] [<ffffffff81230388>] path_openat+0x228/0x6a0 [<ffffffff810230f9>] ? sched_clock+0x9/0x10 [<ffffffff8105a427>] ? kvm_clock_read+0x27/0x40 [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0 [<ffffffff8123101a>] do_filp_open+0x3a/0x90 [<ffffffff817845e7>] ? _raw_spin_unlock+0x27/0x40 [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0 [<ffffffff8121e3ce>] do_sys_open+0x12e/0x210 [<ffffffff8121e4ce>] SyS_open+0x1e/0x20 [<ffffffff8178eda9>] system_call_fastpath+0x16/0x1b xfs_vn_tmpfile() also fails to initialize security on the newly created inode. Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction has been committed and the inode unlocked. Also, initialize security on the inode based on the parent directory provided via the tmpfile call. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	xfs: fix buffer use after free on IO error	Eric Sandeen	2014-04-17	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When testing exhaustion of dm snapshots, the following appeared with CONFIG_DEBUG_OBJECTS_FREE enabled: ODEBUG: free active (active state 0) object type: work_struct hint: xfs_buf_iodone_work+0x0/0x1d0 [xfs] indicating that we'd freed a buffer which still had a pending reference, down this path: [ 190.867975] [<ffffffff8133e6fb>] debug_check_no_obj_freed+0x22b/0x270 [ 190.880820] [<ffffffff811da1d0>] kmem_cache_free+0xd0/0x370 [ 190.892615] [<ffffffffa02c5924>] xfs_buf_free+0xe4/0x210 [xfs] [ 190.905629] [<ffffffffa02c6167>] xfs_buf_rele+0xe7/0x270 [xfs] [ 190.911770] [<ffffffffa034c826>] xfs_trans_read_buf_map+0x7b6/0xac0 [xfs] At issue is the fact that if IO fails in xfs_buf_iorequest, we'll queue completion unconditionally, and then call xfs_buf_rele; but if IO failed, there are no IOs remaining, and xfs_buf_rele will free the bp while work is still queued. Fix this by not scheduling completion if the buffer has an error on it; run it immediately. The rest is only comment changes. Thanks to dchinner for spotting the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	xfs: wrong error sign conversion during failed DIO writes	Dave Chinner	2014-04-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We negate the error value being returned from a generic function incorrectly. The code path that it is running in returned negative errors, so there is no need to negate it to get the correct error signs here. This was uncovered by generic/019. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	xfs: unmount does not wait for shutdown during unmount	Dave Chinner	2014-04-17	1	-9/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	And interesting situation can occur if a log IO error occurs during the unmount of a filesystem. The cases reported have the same signature - the update of the superblock counters fails due to a log write IO error: XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1 XFS (dm-16): Log I/O Error Detected. Shutting down filesystem XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount. XFS (dm-16): xfs_log_force: error 5 returned. XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s) It can be seen that the last line of output contains a corrupt device name - this is because the log and xfs_mount structures have already been freed by the time this message is printed. A kernel oops closely follows. The issue is that the shutdown is occurring in a separate IO completion thread to the unmount. Once the shutdown processing has started and all the iclogs are marked with XLOG_STATE_IOERROR, the log shutdown code wakes anyone waiting on a log force so they can process the shutdown error. This wakes up the unmount code that is doing a synchronous transaction to update the superblock counters. The unmount path now sees all the iclogs are marked with XLOG_STATE_IOERROR and so never waits on them again, knowing that if it does, there will not be a wakeup trigger for it and we will hang the unmount if we do. Hence the unmount runs through all the remaining code and frees all the filesystem structures while the xlog_iodone() is still processing the shutdown. When the log shutdown processing completes, xfs_do_force_shutdown() emits the "Please umount the filesystem and rectify the problem(s)" message, and xlog_iodone() then aborts all the objects attached to the iclog. An iclog that has already been freed.... The real issue here is that there is no serialisation point between the log IO and the unmount. We have serialisations points for log writes, log forces, reservations, etc, but we don't actually have any code that wakes for log IO to fully complete. We do that for all other types of object, so why not iclogbufs? Well, it turns out that we can easily do this. We've got xfs_buf handles, and that's what everyone else uses for IO serialisation. i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only release the lock in xlog_iodone() when we are finished with the buffer. That way before we tear down the iclog, we can lock and unlock the buffer to ensure IO completion has finished completely before we tear it down. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Bob Mastors <bob.mastors@solidfire.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	xfs: collapse range is delalloc challenged	Dave Chinner	2014-04-17	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	FSX has been detecting data corruption after to collapse range calls. The key observation is that the offset of the last extent in the file was not being shifted, and hence when the file size was adjusted it was truncating away data because the extents handled been correctly shifted. Tracing indicated that before the collapse, the extent list looked like: .... ino 0x5788 state idx 6 offset 26 block 195904 count 10 flag 0 ino 0x5788 state idx 7 offset 39 block 195917 count 35 flag 0 ino 0x5788 state idx 8 offset 86 block 195964 count 32 flag 0 and after the shift of 2 blocks: ino 0x5788 state idx 6 offset 24 block 195904 count 10 flag 0 ino 0x5788 state idx 7 offset 37 block 195917 count 35 flag 0 ino 0x5788 state idx 8 offset 86 block 195964 count 32 flag 0 Note that the last extent did not change offset. After the changing of the file size: ino 0x5788 state idx 6 offset 24 block 195904 count 10 flag 0 ino 0x5788 state idx 7 offset 37 block 195917 count 35 flag 0 ino 0x5788 state idx 8 offset 86 block 195964 count 30 flag 0 You can see that the last extent had it's length truncated, indicating that we've lost data. The reason for this is that the xfs_bmap_shift_extents() loop uses XFS_IFORK_NEXTENTS() to determine how many extents are in the inode. This, unfortunately, doesn't take into account delayed allocation extents - it's a count of physically allocated extents - and hence when the file being collapsed has a delalloc extent like this one does prior to the range being collapsed: .... ino 0x5788 state idx 4 offset 11 block 4503599627239429 count 1 flag 0 .... it gets the count wrong and terminates the shift loop early. Fix it by using the in-memory extent array size that includes delayed allocation extents to determine the number of extents on the inode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	xfs: don't map ranges that span EOF for direct IO	Dave Chinner	2014-04-17	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Al Viro tracked down the problem that has caused generic/263 to fail on XFS since the test was introduced. If is caused by xfs_get_blocks() mapping a single extent that spans EOF without marking it as buffer-new() so that the direct IO code does not zero the tail of the block at the new EOF. This is a long standing bug that has been around for many, many years. Because xfs_get_blocks() starts the map before EOF, it can't set buffer_new(), because that causes he direct IO code to also zero unaligned sectors at the head of the IO. This would overwrite valid data with zeros, and hence we cannot validly return a single extent that spans EOF to direct IO. Fix this by detecting a mapping that spans EOF and truncate it down to EOF. This results in the the direct IO code doing the right thing for unaligned data blocks before EOF, and then returning to get another mapping for the region beyond EOF which XFS treats correctly by setting buffer_new() on it. This makes direct Io behave correctly w.r.t. tail block zeroing beyond EOF, and fsx is happy about that. Again, thanks to Al Viro for finding what I couldn't. [ dchinner: Fix for __divdi3 build error: Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	xfs: zeroing space needs to punch delalloc blocks	Dave Chinner	2014-04-14	2	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we are zeroing space andit is covered by a delalloc range, we need to punch the delalloc range out before we truncate the page cache. Failing to do so leaves and inconsistency between the page cache and the extent tree, which we later trip over when doing direct IO over the same range. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	xfs: xfs_vm_write_end truncates too much on failure	Dave Chinner	2014-04-14	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Similar to the write_begin problem, xfs-vm_write_end will truncate back to the old EOF, potentially removing page cache from over the top of delalloc blocks with valid data in them. Fix this by truncating back to just the start of the failed write. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	xfs: write failure beyond EOF truncates too much data	Dave Chinner	2014-04-14	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we fail a write beyond EOF and have to handle it in xfs_vm_write_begin(), we truncate the inode back to the current inode size. This doesn't take into account the fact that we may have already made successful writes to the same page (in the case of block size < page size) and hence we can truncate the page cache away from blocks with valid data in them. If these blocks are delayed allocation blocks, we now have a mismatch between the page cache and the extent tree, and this will trigger - at minimum - a delayed block count mismatch assert when the inode is evicted from the cache. We can also trip over it when block mapping for direct IO - this is the most common symptom seen from fsx and fsstress when run from xfstests. Fix it by only truncating away the exact range we are updating state for in this write_begin call. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	xfs: kill buffers over failed write ranges properly	Dave Chinner	2014-04-14	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a write fails, if we don't clear the delalloc flags from the buffers over the failed range, they can persist beyond EOF and cause problems. writeback will see the pages in the page cache, see they are dirty and continually retry the write, assuming that the page beyond EOF is just racing with a truncate. The page will eventually be released due to some other operation (e.g. direct IO), and it will not pass through invalidation because it is dirty. Hence it will be released with buffer_delay set on it, and trigger warnings in xfs_vm_releasepage() and assert fail in xfs_file_aio_write_direct because invalidation failed and we didn't write the corect amount. This causes failures on block size < page size filesystems in fsx and fsstress workloads run by xfstests. Fix it by completely trashing any state on the buffer that could be used to imply that it contains valid data when the delalloc range over the buffer is punched out during the failed write handling. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
*	cifs: Use min_t() when comparing "size_t" and "unsigned long"	Geert Uytterhoeven	2014-04-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On 32 bit, size_t is "unsigned int", not "unsigned long", causing the following warning when comparing with PAGE_SIZE, which is always "unsigned long": fs/cifs/file.c: In function ‘cifs_readdata_to_iov’: fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast Introduced by commit 7f25bba819a3 ("cifs_iovec_read: keep iov_iter between the calls of cifs_readdata_to_iov()"), which changed the signedness of "remaining" and the code from min_t() to min(). Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds	2014-04-12	5	-14/+13
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull yet more networking updates from David Miller: 1) Various fixes to the new Redpine Signals wireless driver, from Fariya Fatima. 2) L2TP PPP connect code takes PMTU from the wrong socket, fix from Dmitry Petukhov. 3) UFO and TSO packets differ in whether they include the protocol header in gso_size, account for that in skb_gso_transport_seglen(). From Florian Westphal. 4) If VLAN untagging fails, we double free the SKB in the bridging output path. From Toshiaki Makita. 5) Several call sites of sk->sk_data_ready() were referencing an SKB just added to the socket receive queue in order to calculate the second argument via skb->len. This is dangerous because the moment the skb is added to the receive queue it can be consumed in another context and freed up. It turns out also that none of the sk->sk_data_ready() implementations even care about this second argument. So just kill it off and thus fix all these use-after-free bugs as a side effect. 6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti. 7) pktgen needs to do locking properly for LLTX devices, from Daniel Borkmann. 8) xen-netfront driver initializes TX array entries in RX loop :-) From Vincenzo Maffione. 9) After refactoring, some tunnel drivers allow a tunnel to be configured on top itself. Fix from Nicolas Dichtel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) vti: don't allow to add the same tunnel twice gre: don't allow to add the same tunnel twice drivers: net: xen-netfront: fix array initialization bug pktgen: be friendly to LLTX devices r8152: check RTL8152_UNPLUG net: sun4i-emac: add promiscuous support net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO net: ipv6: Fix oif in TCP SYN+ACK route lookup. drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts drivers: net: cpsw: discard all packets received when interface is down net: Fix use after free by removing length arg from sk_data_ready callbacks. Drivers: net: hyperv: Address UDP checksum issues Drivers: net: hyperv: Negotiate suitable ndis version for offload support Drivers: net: hyperv: Allocate memory for all possible per-pecket information bridge: Fix double free and memory leak around br_allowed_ingress bonding: Remove debug_fs files when module init fails i40evf: program RSS LUT correctly i40evf: remove open-coded skb_cow_head ixgb: remove open-coded skb_cow_head igbvf: remove open-coded skb_cow_head ...
\| *	net: Fix use after free by removing length arg from sk_data_ready callbacks.	David S. Miller	2014-04-11	5	-14/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ceph: fix pr_fmt() redefinition	Linus Torvalds	2014-04-12	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The vfs merge caused a latent bug to show up: In file included from fs/ceph/super.h:4:0, from fs/ceph/ioctl.c:3: include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default] #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt ^ In file included from include/linux/kernel.h:13:0, from include/linux/uio.h:12, from include/linux/socket.h:7, from include/uapi/linux/in.h:22, from include/linux/in.h:23, from fs/ceph/ioctl.c:1: include/linux/printk.h:214:0: note: this is the location of the previous definition #define pr_fmt(fmt) fmt ^ where the reason is that <linux/ceph_debug.h> is included much too late for the "pr_fmt()" define. The include of <linux/ceph_debug.h> needs to be the first include in the file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't noticeable until some unrelated header file changes brought in an indirect earlier include of <linux/kernel.h>. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2014-04-12	34	-699/+385
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The first vfs pile, with deep apologies for being very late in this window. Assorted cleanups and fixes, plus a large preparatory part of iov_iter work. There's a lot more of that, but it'll probably go into the next merge window - it does shape up nicely, removes a lot of boilerplate, gets rid of locking inconsistencie between aio_write and splice_write and I hope to get Kent's direct-io rewrite merged into the same queue, but some of the stuff after this point is having (mostly trivial) conflicts with the things already merged into mainline and with some I want more testing. This one passes LTP and xfstests without regressions, in addition to usual beating. BTW, readahead02 in ltp syscalls testsuite has started giving failures since "mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages" - might be a false positive, might be a real regression..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) missing bits of "splice: fix racy pipe->buffers uses" cifs: fix the race in cifs_writev() ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure kill generic_file_buffered_write() ocfs2_file_aio_write(): switch to generic_perform_write() ceph_aio_write(): switch to generic_perform_write() xfs_file_buffered_aio_write(): switch to generic_perform_write() export generic_perform_write(), start getting rid of generic_file_buffer_write() generic_file_direct_write(): get rid of ppos argument btrfs_file_aio_write(): get rid of ppos kill the 5th argument of generic_file_buffered_write() kill the 4th argument of __generic_file_aio_write() lustre: don't open-code kernel_recvmsg() ocfs2: don't open-code kernel_recvmsg() drbd: don't open-code kernel_recvmsg() constify blk_rq_map_user_iov() and friends lustre: switch to kernel_sendmsg() ocfs2: don't open-code kernel_sendmsg() take iov_iter stuff to mm/iov_iter.c process_vm_access: tidy up a bit ...
\| * \|	cifs: fix the race in cifs_writev()	Al Viro	2014-04-12	1	-5/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	O_APPEND handling there hadn't been completely fixed by Pavel's patch; it checks the right value, but it's racy - we can't really do that until i_mutex has been taken. Fix by switching to __generic_file_aio_write() (open-coding generic_file_aio_write(), actually) and pulling mutex_lock() above inode_size_read(). Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure	Al Viro	2014-04-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ceph_osdc_put_request(ERR_PTR(-error)) oopses. What we want there is break, not goto out. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	ocfs2_file_aio_write(): switch to generic_perform_write()	Al Viro	2014-04-01	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	ceph_aio_write(): switch to generic_perform_write()	Al Viro	2014-04-01	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	xfs_file_buffered_aio_write(): switch to generic_perform_write()	Al Viro	2014-04-01	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	generic_file_direct_write(): get rid of ppos argument	Al Viro	2014-04-01	4	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	always equal to &iocb->ki_pos. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	btrfs_file_aio_write(): get rid of ppos	Al Viro	2014-04-01	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	kill the 5th argument of generic_file_buffered_write()	Al Viro	2014-04-01	3	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	same story - it's &iocb->ki_pos in all cases Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	kill the 4th argument of __generic_file_aio_write()	Al Viro	2014-04-01	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's always equal to &iocb->ki_pos, where iocb is the value of the 1st argument. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	ocfs2: don't open-code kernel_recvmsg()	Al Viro	2014-04-01	1	-18/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	constify blk_rq_map_user_iov() and friends	Al Viro	2014-04-01	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sg_iovec array passed to it can be const Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	ocfs2: don't open-code kernel_sendmsg()	Al Viro	2014-04-01	1	-20/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	read_code(): go through vfs_read() instead of calling the method directly	Al Viro	2014-04-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... and don't skip on sanity checks. It's not a hot path, TYVM (a couple of calls per a.out execve(), for pity sake) and headers of random a.out binary are not to be trusted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	fold cifs_iovec_read() into its (only) caller	Al Viro	2014-04-01	1	-18/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	cifs_iovec_read: keep iov_iter between the calls of cifs_readdata_to_iov()	Al Viro	2014-04-01	1	-45/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... we are doing them on adjacent parts of file, so what happens is that each subsequent call works to rebuild the iov_iter to exact state it had been abandoned in by previous one. Just keep it through the entire cifs_iovec_read(). And use copy_page_to_iter() instead of doing kmap/copy_to_user/kunmap manually... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	switch vmsplice_to_user() to copy_page_to_iter()	Al Viro	2014-04-01	1	-89/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I've switched the sanity checks on iovec to rw_copy_check_uvector(); we might need to do a local analog, if any behaviour differences are not actually bugfixes here... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	switch pipe_read() to copy_page_to_iter()	Al Viro	2014-04-01	1	-71/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	cifs_iovec_read(): resubmit shouldn't restart the loop	Al Viro	2014-04-01	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... by that point the request we'd just resent is in the head of the list anyway. Just return to the beginning of the loop body... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	callers of iov_copy_from_user_atomic() don't need pagecache_disable()	Al Viro	2014-04-01	2	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	... it does that itself (via kmap_atomic()) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	switch ->is_partially_uptodate() to saner arguments	Al Viro	2014-04-01	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	pipe: kill ->map() and ->unmap()	Al Viro	2014-04-01	3	-73/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	all pipe_buffer_operations have the same instances of those... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	fuse/dev: use atomic maps	Al Viro	2014-04-01	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	VFS: Make delayed_free() call free_vfsmnt()	David Howells	2014-04-01	1	-12/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make delayed_free() call free_vfsmnt() so that we don't have two functions doing the same job. This requires the calls to mnt_free_id() in free_vfsmnt() to be moved into the callers of that function. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	cifs: ->rename() without ->lookup() makes no sense	Al Viro	2014-04-01	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	get rid of pointless checks for NULL ->i_op	Al Viro	2014-04-01	2	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	ntfs: don't put NULL into ->i_op/->i_fop	Al Viro	2014-04-01	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	new helper: readlink_copy()	Al Viro	2014-04-01	4	-46/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	get rid of files_defer_init()	Al Viro	2014-04-01	2	-8/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the only thing it's doing these days is calculation of upper limit for fs.nr_open sysctl and that can be done statically Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	namei.c: move EXPORT_SYMBOL to corresponding definitions	Al Viro	2014-04-01	1	-28/+27
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	get_write_access() is inlined, exporting it is pointless	Al Viro	2014-04-01	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	tidy do_dentry_open() up a bit	Al Viro	2014-04-01	1	-12/+10
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	mark struct file that had write access grabbed by open()	Al Viro	2014-04-01	3	-41/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	new flag in ->f_mode - FMODE_WRITER. Set by do_dentry_open() in case when it has grabbed write access, checked by __fput() to decide whether it wants to drop the sucker. Allows to stop bothering with mnt_clone_write() in alloc_file(), along with fewer special_file() checks. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	fold __get_file_write_access() into its only caller	Al Viro	2014-04-01	1	-19/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	get rid of DEBUG_WRITECOUNT	Al Viro	2014-04-01	2	-13/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	it only makes control flow in __fput() and friends more convoluted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| * \|	don't bother with {get,put}_write_access() on non-regular files	Al Viro	2014-04-01	2	-21/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	it's pointless and actually leads to wrong behaviour in at least one moderately convoluted case (pipe(), close one end, try to get to another via /proc/*/fd and run into ETXTBUSY). Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>