summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'squashfs-updates' of ↵Linus Torvalds2013-11-2020-243/+1145
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next Pull squashfs updates from Phillip Lougher: "These patches optionally improve the multi-threading peformance of Squashfs by adding parallel decompression, and direct decompression into the page cache, eliminating an intermediate buffer (removing memcpy overhead and lock contention)" * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next: Squashfs: Check stream is not NULL in decompressor_multi.c Squashfs: Directly decompress into the page cache for file data Squashfs: Restructure squashfs_readpage() Squashfs: Generalise paging handling in the decompressors Squashfs: add multi-threaded decompression using percpu variable squashfs: Enhance parallel I/O Squashfs: Refactor decompressor interface and code
| * Squashfs: Check stream is not NULL in decompressor_multi.cPhillip Lougher2013-11-201-4/+3
| | | | | | | | | | | | | | | | | | Fix static checker complaint that stream is not checked in squashfs_decompressor_destroy(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * Squashfs: Directly decompress into the page cache for file dataPhillip Lougher2013-11-205-1/+336
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces an implementation of squashfs_readpage_block() that directly decompresses into the page cache. This uses the previously added page handler abstraction to push down the necessary kmap_atomic/kunmap_atomic operations on the page cache buffers into the decompressors. This enables direct copying into the page cache without using the slow kmap/kunmap calls. The code detects when multiple threads are racing in squashfs_readpage() to decompress the same block, and avoids this regression by falling back to using an intermediate buffer. This patch enhances the performance of Squashfs significantly when multiple processes are accessing the filesystem simultaneously because it not only reduces memcopying, but it more importantly eliminates the lock contention on the intermediate buffer. Using single-thread decompression. dd if=file1 of=/dev/null bs=4096 & dd if=file2 of=/dev/null bs=4096 & dd if=file3 of=/dev/null bs=4096 & dd if=file4 of=/dev/null bs=4096 Before: 629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s After: 629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * Squashfs: Restructure squashfs_readpage()Phillip Lougher2013-11-204-71/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | Restructure squashfs_readpage() splitting it into separate functions for datablocks, fragments and sparse blocks. Move the memcpying (from squashfs cache entry) implementation of squashfs_readpage_block into file_cache.c This allows different implementations to be supported. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * Squashfs: Generalise paging handling in the decompressorsPhillip Lougher2013-11-2013-67/+163
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Further generalise the decompressors by adding a page handler abstraction. This adds helpers to allow the decompressors to access and process the output buffers in an implementation independant manner. This allows different types of output buffer to be passed to the decompressors, with the implementation specific aspects handled at decompression time, but without the knowledge being held in the decompressor wrapper code. This will allow the decompressors to handle Squashfs cache buffers, and page cache pages. This patch adds the abstraction and an implementation for the caches. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * Squashfs: add multi-threaded decompression using percpu variablePhillip Lougher2013-11-203-20/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a multi-threaded decompression implementation which uses percpu variables. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: * the nature of percpu variables ensures decompression is load-balanced across the multiple cores. * simplicity. Disadvantages: it limits decompression to one thread per core. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * squashfs: Enhance parallel I/OMinchan Kim2013-11-203-1/+221
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now squashfs have used for only one stream buffer for decompression so it hurts parallel read performance so this patch supports multiple decompressor to enhance performance parallel I/O. Four 1G file dd read on KVM machine which has 2 CPU and 4G memory. dd if=test/test1.dat of=/dev/null & dd if=test/test2.dat of=/dev/null & dd if=test/test3.dat of=/dev/null & dd if=test/test4.dat of=/dev/null & old : 1m39s -> new : 9s * From v1 * Change comp_strm with decomp_strm - Phillip * Change/add comments - Phillip Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * Squashfs: Refactor decompressor interface and codePhillip Lougher2013-11-2011-136/+216
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The decompressor interface and code was written from the point of view of single-threaded operation. In doing so it mixed a lot of single-threaded implementation specific aspects into the decompressor code and elsewhere which makes it difficult to seamlessly support multiple different decompressor implementations. This patch does the following: 1. It removes compressor_options parsing from the decompressor init() function. This allows the decompressor init() function to be dynamically called to instantiate multiple decompressors, without the compressor options needing to be read and parsed each time. 2. It moves threading and all sleeping operations out of the decompressors. In doing so, it makes the decompressors non-blocking wrappers which only deal with interfacing with the decompressor implementation. 3. It splits decompressor.[ch] into decompressor generic functions in decompressor.[ch], and moves the single threaded decompressor implementation into decompressor_single.c. The result of this patch is Squashfs should now be able to support multiple decompressors by adding new decompressor_xxx.c files with specialised implementations of the functions in decompressor_single.c Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2013-11-2012-164/+52
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs bits and pieces from Al Viro: "Assorted bits that got missed in the first pull request + fixes for a couple of coredump regressions" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fold try_to_ascend() into the sole remaining caller dcache.c: get rid of pointless macros take read_seqbegin_or_lock() and friends to seqlock.h consolidate simple ->d_delete() instances gfs2: endianness misannotations dump_emit(): use __kernel_write(), not vfs_write() dump_align(): fix the dumb braino
| * | fold try_to_ascend() into the sole remaining callerAl Viro2013-11-151-31/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There used to be a bunch of tree-walkers in dcache.c, all alike. try_to_ascend() had been introduced to abstract a piece of logics duplicated in all of them. These days all these tree-walkers are implemented via the same iterator (d_walk()), which is the only remaining caller of try_to_ascend(), so let's fold it back... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | dcache.c: get rid of pointless macrosAl Viro2013-11-151-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()). At this point they are actively misguiding - they imply that values are compiler constants, which is no longer true. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | take read_seqbegin_or_lock() and friends to seqlock.hAl Viro2013-11-151-29/+0
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | consolidate simple ->d_delete() instancesAl Viro2013-11-157-78/+13
| | | | | | | | | | | | | | | | | | | | | | | | Rename simple_delete_dentry() to always_delete_dentry() and export it. Export simple_dentry_operations, while we are at it, and get rid of their duplicates Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | gfs2: endianness misannotationsAl Viro2013-11-153-19/+16
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | dump_emit(): use __kernel_write(), not vfs_write()Al Viro2013-11-151-1/+1
| | | | | | | | | | | | | | | | | | the caller has already done file_start_write()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | dump_align(): fix the dumb brainoAl Viro2013-11-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Mea culpa - original variant used 64-by-32-bit division, which got caught very late. Getting rid of that wasn't hard, but I'd managed to botch the calling conventions in process ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2013-11-201-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block IO fixes from Jens Axboe: "Normally I'd defer my initial for-linus pull request until after the merge window, but a race was uncovered in the virtio-blk conversion to blk-mq that could cause hangs. So here's a small collection of fixes for you to pull: - The fix for the virtio-blk IO hang reported by Dave Chinner, from Shaohua and myself. - Add the Insert blktrace event for blk-mq. This makes 'btt' happy when it is doing it's state transition analysis. - Ensure that blk-mq has disk/partition stats enabled by default, instead of making it opt-in. - A fix for __bio_add_page() and large sector counts" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: add blktrace insert event trace virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations blk-mq: ensure that we set REQ_IO_STAT so diskstats work bio: fix argument of __bio_add_page() for max_sectors > 0xffff
| * | | bio: fix argument of __bio_add_page() for max_sectors > 0xffffAkinobu Mita2013-11-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The data type of max_sectors and max_hw_sectors in queue settings are unsigned int. But these values are passed to __bio_add_page() as an argument whose data type is unsigned short. In the worst case such as max_sectors is 0x10000, bio_add_page() can't add a page and IOs can't proceed. Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-11-192-6/+20
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "Mostly these are fixes for fallout due to merge window changes, as well as cures for problems that have been with us for a much longer period of time" 1) Johannes Berg noticed two major deficiencies in our genetlink registration. Some genetlink protocols we passing in constant counts for their ops array rather than something like ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were using fixed IDs for their multicast groups. We have to retain these fixed IDs to keep existing userland tools working, but reserve them so that other multicast groups used by other protocols can not possibly conflict. In dealing with these two problems, we actually now use less state management for genetlink operations and multicast groups. 2) When configuring interface hardware timestamping, fix several drivers that simply do not validate that the hwtstamp_config value is one the driver actually supports. From Ben Hutchings. 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar. 4) In dev_forward_skb(), set the skb->protocol in the right order relative to skb_scrub_packet(). From Alexei Starovoitov. 5) Bridge erroneously fails to use the proper wrapper functions to make calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki Makita. 6) When detaching a bridge port, make sure to flush all VLAN IDs to prevent them from leaking, also from Toshiaki Makita. 7) Put in a compromise for TCP Small Queues so that deep queued devices that delay TX reclaim non-trivially don't have such a performance decrease. One particularly problematic area is 802.11 AMPDU in wireless. From Eric Dumazet. 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts here. Fix from Eric Dumzaet, reported by Dave Jones. 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn. 10) When computing mergeable buffer sizes, virtio-net fails to take the virtio-net header into account. From Michael Dalton. 11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic bumping, this one has been with us for a while. From Eric Dumazet. 12) Fix NULL deref in the new TIPC fragmentation handling, from Erik Hugne. 13) 6lowpan bit used for traffic classification was wrong, from Jukka Rissanen. 14) macvlan has the same issue as normal vlans did wrt. propagating LRO disabling down to the real device, fix it the same way. From Michal Kubecek. 15) CPSW driver needs to soft reset all slaves during suspend, from Daniel Mack. 16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet. 17) The xen-netfront RX buffer refill timer isn't properly scheduled on partial RX allocation success, from Ma JieYue. 18) When ipv6 ping protocol support was added, the AF_INET6 protocol initialization cleanup path on failure was borked a little. Fix from Vlad Yasevich. 19) If a socket disconnects during a read/recvmsg/recvfrom/etc that blocks we can do the wrong thing with the msg_name we write back to userspace. From Hannes Frederic Sowa. There is another fix in the works from Hannes which will prevent future problems of this nature. 20) Fix route leak in VTI tunnel transmit, from Fan Du. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) genetlink: make multicast groups const, prevent abuse genetlink: pass family to functions using groups genetlink: add and use genl_set_err() genetlink: remove family pointer from genl_multicast_group genetlink: remove genl_unregister_mc_group() hsr: don't call genl_unregister_mc_group() quota/genetlink: use proper genetlink multicast APIs drop_monitor/genetlink: use proper genetlink multicast APIs genetlink: only pass array to genl_register_family_with_ops() tcp: don't update snd_nxt, when a socket is switched from repair mode atm: idt77252: fix dev refcnt leak xfrm: Release dst if this dst is improper for vti tunnel netlink: fix documentation typo in netlink_set_err() be2net: Delete secondary unicast MAC addresses during be_close be2net: Fix unconditional enabling of Rx interface options net, virtio_net: replace the magic value ping: prevent NULL pointer dereference on write to msg_name bnx2x: Prevent "timeout waiting for state X" bnx2x: prevent CFC attention bnx2x: Prevent panic during DMAE timeout ...
| * | | | genetlink: make multicast groups const, prevent abuseJohannes Berg2013-11-191-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Register generic netlink multicast groups as an array with the family and give them contiguous group IDs. Then instead of passing the global group ID to the various functions that send messages, pass the ID relative to the family - for most families that's just 0 because the only have one group. This avoids the list_head and ID in each group, adding a new field for the mcast group ID offset to the family. At the same time, this allows us to prevent abusing groups again like the quota and dropmon code did, since we can now check that a family only uses a group it owns. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | genetlink: pass family to functions using groupsJohannes Berg2013-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This doesn't really change anything, but prepares for the next patch that will change the APIs to pass the group ID within the family, rather than the global group ID. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | quota/genetlink: use proper genetlink multicast APIsJohannes Berg2013-11-191-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The quota code is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.) Make the quota code use the correct API, but since this is already used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | genetlink: only pass array to genl_register_family_with_ops()Johannes Berg2013-11-191-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | seq_file: always clear m->count when we free m->bufAl Viro2013-11-181-1/+2
| |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once we'd freed m->buf, m->count should become zero - we have no valid contents reachable via m->buf. Reported-by: Charley (Hao Chuan) Chu <charley.chu@broadcom.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2013-11-1617-74/+358
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull CIFS fixes from Steve French: "A set of cifs fixes most important of which is Pavel's fix for some problems with handling Windows reparse points and also the security fix for setfacl over a cifs mount to Samba removing part of the ACL. Both of these fixes are for stable as well. Also added most of copychunk (copy offload) support to cifs although I expect a final patch in that series (to fix handling of larger files) in a few days (had to hold off on that in order to incorporate some additional code review feedback). Also added support for O_DIRECT on forcedirectio mounts (needed in order to run some of the server benchmarks over cifs and smb2/smb3 mounts)" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] Warn if SMB3 encryption required by server setfacl removes part of ACL when setting POSIX ACLs to Samba [CIFS] Set copychunk defaults CIFS: SMB2/SMB3 Copy offload support (refcopy) phase 1 cifs: Use data structures to compute NTLMv2 response offsets [CIFS] O_DIRECT opens should work on directio mounts cifs: don't spam the logs on unexpected lookup errors cifs: change ERRnomem error mapping from ENOMEM to EREMOTEIO CIFS: Fix symbolic links usage
| * | | | [CIFS] Warn if SMB3 encryption required by serverSteve French2013-11-152-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not support SMB3 encryption yet, warn if server responds that SMB3 encryption is mandatory. Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | setfacl removes part of ACL when setting POSIX ACLs to SambaSteve French2013-11-151-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setfacl over cifs mounts can remove the default ACL when setting the (non-default part of) the ACL and vice versa (we were leaving at 0 rather than setting to -1 the count field for the unaffected half of the ACL. For example notice the setfacl removed the default ACL in this sequence: steven@steven-GA-970A-DS3:~/cifs-2.6$ getfacl /mnt/test-dir ; setfacl -m default:user:test:rwx,user:test:rwx /mnt/test-dir getfacl: Removing leading '/' from absolute path names user::rwx group::r-x other::r-x default:user::rwx default:user:test:rwx default:group::r-x default:mask::rwx default:other::r-x steven@steven-GA-970A-DS3:~/cifs-2.6$ getfacl /mnt/test-dir getfacl: Removing leading '/' from absolute path names user::rwx user:test:rwx group::r-x mask::rwx other::r-x CC: Stable <stable@kernel.org> Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Jeremy Allison <jra@samba.org>
| * | | | [CIFS] Set copychunk defaultsSteve French2013-11-152-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch 2 of the copy chunk series (the final patch will use these to handle copies of files larger than the chunk size. We set the same defaults that Windows and Samba expect for CopyChunk. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: David Disseldorp <ddiss@samba.org>
| * | | | CIFS: SMB2/SMB3 Copy offload support (refcopy) phase 1Steve French2013-11-144-1/+210
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This first patch adds the ability for us to do a server side copy (ie fast copy offloaded to the server to perform, aka refcopy) "cp --reflink" of one file to another located on the same server. This is much faster than traditional copy (which requires reading and writing over the network and extra memcpys). This first version is not going to be copy files larger than about 1MB (to Samba) until I add support for multiple chunks and for autoconfiguring the chunksize. It includes: 1) processing of the ioctl 2) marshalling and sending the SMB2/SMB3 fsctl over the network 3) simple parsing of the response It does not include yet (these will be in followon patches to come soon): 1) support for multiple chunks 2) support for autoconfiguring and remembering the chunksize 3) Support for the older style copychunk which Samba 4.1 server supports (because this requires write permission on the target file, which cp does not give you, apparently per-posix). This may require a distinct tool (other than cp) and other ioctl to implement. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | cifs: Use data structures to compute NTLMv2 response offsetsTim Gardner2013-11-112-17/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bit of cleanup plus some gratuitous variable renaming. I think using structures instead of numeric offsets makes this code much more understandable. Also added a comment about current time range expected by the server. Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | [CIFS] O_DIRECT opens should work on directio mountsSteve French2013-11-111-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Opens on current cifs/smb2/smb3 mounts with O_DIRECT flag fail even when caching is disabled on the mount. This was reported by those running SMB2 benchmarks who need to be able to pass O_DIRECT on many of their open calls to reduce caching effects, but would also be needed by other applications. When mounting with forcedirectio ("cache=none") cifs and smb2/smb3 do not go through the page cache and thus opens with O_DIRECT flag should work (when posix extensions are negotiated we even are able to send the flag to the server). This patch fixes that in a simple way. The 9P client has a similar situation (caching is often disabled) and takes the same approach to O_DIRECT support ie works if caching disabled, but if client caching enabled it fails with EINVAL. A followon idea for a future patch as Pavel noted, could be that files opened with O_DIRECT could cause us to change inode->i_fop on the fly from cifs_file_strict_ops to cifs_file_direct_ops which would allow us to support this on non-forcedirectio mounts (cache=strict and cache=loose) as well. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | cifs: don't spam the logs on unexpected lookup errorsJeff Layton2013-11-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrey reported that he was seeing cifs.ko spam the logs with messages like this: CIFS VFS: Unexpected lookup error -26 He was listing the root directory of a server and hitting an error when trying to QUERY_PATH_INFO against hiberfil.sys and pagefile.sys. The right fix would be to switch the lookup code over to using FIND_FIRST, but until then we really don't need to report this at a level of KERN_ERR. Convert this message over to FYI level. Reported-by: "Andrey Shernyukov" <andreysh@nioch.nsc.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | cifs: change ERRnomem error mapping from ENOMEM to EREMOTEIOJeff Layton2013-11-112-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes, the server will report an error that basically indicates that it's running out of resources. These include these under SMB1: NT_STATUS_NO_MEMORY NT_STATUS_SECTION_TOO_BIG NT_STATUS_TOO_MANY_PAGING_FILES ...and this one under SMB2: STATUS_NO_MEMORY Currently, this gets mapped to ENOMEM by the client, but that's confusing as an ENOMEM error is typically an indicator that the client is out of memory. Change these errors to instead map to EREMOTEIO to indicate that the problem is actually server-side and not on the client. Reported-by: "ISHIKAWA,chiaki" <ishikawa@yk.rim.or.jp> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | CIFS: Fix symbolic links usagePavel Shilovsky2013-11-116-49/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we treat any reparse point as a symbolic link and map it to a Unix one that is not true in a common case due to many reparse point types supported by SMB servers. Distinguish reparse point types into two groups: 1) that can be accessed directly through a reparse point (junctions, deduplicated files, NFS symlinks); 2) that need to be processed manually (Windows symbolic links, DFS); and map only Windows symbolic links to Unix ones. Cc: <stable@vger.kernel.org> Acked-by: Jeff Layton <jlayton@redhat.com> Reported-and-tested-by: Joao Correia <joaomiguelcorreia@gmail.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
* | | | | Merge tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2013-11-163-5/+10
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes: - Stable fix for data corruption when retransmitting O_DIRECT writes - Stable fix for a deep recursion/stack overflow bug in rpc_release_client - Stable fix for infinite looping when mounting a NFSv4.x volume - Fix a typo in the nfs mount option parser - Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is * tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix pnfs Kconfig defaults NFS: correctly report misuse of "migration" mount option. nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once SUNRPC: Avoid deep recursion in rpc_release_client SUNRPC: Fix a data corruption issue when retransmitting RPC calls
| * | | | | nfs: fix pnfs Kconfig defaultsChristoph Hellwig2013-11-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Defaulting to m seem to prevent building the pnfs layout modules into the kernel. Default to the value of CONFIG_NFS_V4 make sure they are built in for built-in NFSv4 support and modular for a modular NFSv4. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | | | NFS: correctly report misuse of "migration" mount option.NeilBrown2013-11-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current test on valid use of the "migration" mount option can never report an error as it will only do so if mnt->version !=4 && mnt->minor_version != 0 (and some other condition), but if that test would succeed, then the previous test has already gone-to out_minorversion_mismatch. So change the && to an || to get correct semantics. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | | | nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than onceJeff Layton2013-11-131-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again. There is no guarantee that doing so will work however, so we can end up retrying the call in an infinite loop. Worse yet, we create the new client using rpc_clone_client_set_auth, which creates the new client as a child of the old one. Thus, we can end up with a *very* long lineage of rpc_clnts. When we go to put all of the references to them, we can end up with a long call chain that can smash the stack as each rpc_free_client() call can recurse back into itself. This patch fixes this by simply ensuring that the SETCLIENTID call will only be retried in this situation if the last attempt did not use RPC_AUTH_UNIX. Note too that with this change, we don't need the (i > 2) check in the -EACCES case since we now have a more reliable test as to whether we should reattempt. Cc: stable@vger.kernel.org # v3.10+ Cc: Chuck Lever <chuck.lever@oracle.com> Tested-by/Acked-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | | | Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2013-11-165-77/+115
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd changes from Bruce Fields: "This includes miscellaneous bugfixes and cleanup and a performance fix for write-heavy NFSv4 workloads. (The most significant nfsd-relevant change this time is actually in the delegation patches that went through Viro, fixing a long-standing bug that can cause NFSv4 clients to miss updates made by non-nfs users of the filesystem. Those enable some followup nfsd patches which I have queued locally, but those can wait till 3.14)" * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (24 commits) nfsd: export proper maximum file size to the client nfsd4: improve write performance with better sendspace reservations svcrpc: remove an unnecessary assignment sunrpc: comment typo fix Revert "nfsd: remove_stid can be incorporated into nfs4_put_delegation" nfsd4: fix discarded security labels on setattr NFSD: Add support for NFS v4.2 operation checking nfsd4: nfsd_shutdown_net needs state lock NFSD: Combine decode operations for v4 and v4.1 nfsd: -EINVAL on invalid anonuid/gid instead of silent failure nfsd: return better errors to exportfs nfsd: fh_update should error out in unexpected cases nfsd4: need to destroy revoked delegations in destroy_client nfsd: no need to unhash_stid before free nfsd: remove_stid can be incorporated into nfs4_put_delegation nfsd: nfs4_open_delegation needs to remove_stid rather than unhash_stid nfsd: nfs4_free_stid nfsd: fix Kconfig syntax sunrpc: trim off EC bytes in GSSAPI v2 unwrap gss_krb5: document that we ignore sequence number ...
| * | | | | | nfsd: export proper maximum file size to the clientChristoph Hellwig2013-11-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed that we export a way to high value for the maxfilesize attribute when debugging a client issue. The issue didn't turn out to be related to it, but I think we should export it, so that clients can limit what write sizes they accept before hitting the server. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd4: improve write performance with better sendspace reservationsJ. Bruce Fields2013-11-131-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the rpc code conservatively refuses to accept rpc's from a client if the sum of its worst-case estimates of the replies it owes that client exceed the send buffer space. Unfortunately our estimate of the worst-case reply for an NFSv4 compound is always the maximum read size. This can unnecessarily limit the number of operations we handle concurrently, for example in the case most operations are writes (which have small replies). We can do a little better if we check which ops the compound contains. This is still a rough estimate, we'll need to improve on it some day. Reported-by: Shyam Kaushik <shyamnfs1@gmail.com> Tested-by: Shyam Kaushik <shyamnfs1@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | Revert "nfsd: remove_stid can be incorporated into nfs4_put_delegation"J. Bruce Fields2013-11-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 7ebe40f20372688a627ad6c754bc0d1c05df58a9. We forgot the nfs4_put_delegation call in fs/nfsd/nfs4callback.c which should not be unhashing the stateid. This lead to warnings from the idr code when we tried to removed id's twice. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd4: fix discarded security labels on setattrJ. Bruce Fields2013-11-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Security labels in setattr calls are currently ignored because we forget to set label->len. Cc: stable@vger.kernel.org Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | NFSD: Add support for NFS v4.2 operation checkingAnna Schumaker2013-10-301-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The server does allow NFS over v4.2, even if it doesn't add any new operations yet. I also switch to using constants to represent the last operation for each minor version since this makes the code cleaner and easier to understand at a quick glance. Signed-off-by: Anna Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd4: nfsd_shutdown_net needs state lockJ. Bruce Fields2013-10-301-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A comment claims the caller should take it, but that's not being done. Note we don't want it around the cancel_delayed_work_sync since that may wait on work which holds the client lock. Reported-by: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | NFSD: Combine decode operations for v4 and v4.1Anna Schumaker2013-10-301-58/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were using a different array of function pointers to represent each minor version. This makes adding a new minor version tedious, since it needs a step to copy, paste and modify a new version of the same functions. This patch combines the v4 and v4.1 arrays into a single instance and will check minor version support inside each decoder function. Signed-off-by: Anna Schumaker <bjschuma@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: -EINVAL on invalid anonuid/gid instead of silent failureJ. Bruce Fields2013-10-291-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we're going to refuse to accept these it would be polite of us to at least say so.... This introduces a slight complication since we need to grandfather in exportfs's ill-advised use of -1 uid and gid on its test_export. If it turns out there are other users passing down -1 we may need to do something else. Best might be to drop the checks entirely, but I'm not sure if other parts of the kernel might assume that a task can't run as uid or gid -1. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: return better errors to exportfsJ. Bruce Fields2013-10-291-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Someone noticed exportfs happily accepted exports that would later be rejected when mountd tried to give them to the kernel. Fix this. This is a regression from 4c1e1b34d5c800ad3ac9a7e2805b0bea70ad2278 "nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids". Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: stable@vger.kernel.org Reported-by: Yin.JianHong <jiyin@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: fh_update should error out in unexpected casesJ. Bruce Fields2013-10-291-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reporter saw a NULL dereference when a filesystem's ->mknod returned success but left the dentry negative, and then nfsd tried to dereference d_inode (in this case because the CREATE was followed by a GETATTR in the same nfsv4 compound). fh_update already checks for this and another broken case, but for some reason it returns success and leaves nfsd trying to soldier on. If it failed we'd avoid the crash. There's only so much we can do with a buggy filesystem, but it's easy enough to bail out here, so let's do that. Reported-by: Antti Tönkyrä <daedalus@pingtimeout.net> Tested-by: Antti Tönkyrä <daedalus@pingtimeout.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd4: need to destroy revoked delegations in destroy_clientBenny Halevy2013-10-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [use list_splice_init] Signed-off-by: Benny Halevy <bhalevy@primarydata.com> [bfields: no need for recall_lock here] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
OpenPOWER on IntegriCloud