summaryrefslogtreecommitdiffstats
path: root/include/linux/sunrpc
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2015-11-111-5/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Apologies for coming a little late in the merge window. Fortunately this is another fairly quiet one: Mainly smaller bugfixes and cleanup. We're still finding some bugs from the breakup of the big NFSv4 state lock in 3.17 -- thanks especially to Andrew Elble and Jeff Layton for tracking down some of the remaining races" * tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux: svcrpc: document lack of some memory barriers nfsd: fix race with open / open upgrade stateids nfsd: eliminate sending duplicate and repeated delegations nfsd: remove recurring workqueue job to clean DRC SUNRPC: drop stale comment in svc_setup_socket() nfsd: ensure that seqid morphing operations are atomic wrt to copies nfsd: serialize layout stateid morphing operations nfsd: improve client_has_state to check for unused openowners nfsd: fix clid_inuse on mount with security change sunrpc/cache: make cache flushing more reliable. nfsd: move include of state.h from trace.c to trace.h sunrpc: avoid warning in gss_key_timeout lockd: get rid of reference-counted NSM RPC clients SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage() lockd: create NSM handles per net namespace nfsd: switch unsigned char flags in svc_fh to bools nfsd: move svc_fh->fh_maxsize to just after fh_handle nfsd: drop null test before destroy functions nfsd: serialize state seqid morphing operations
| * sunrpc/cache: make cache flushing more reliable.Neil Brown2015-10-231-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The caches used to store sunrpc authentication information can be flushed by writing a timestamp to a file in /proc. This timestamp has a one-second resolution and any entry in cache that was last_refreshed *before* that time is treated as expired. This is problematic as it is not possible to reliably flush the cache without interrupting NFS service. If the current time is written to the "flush" file, any entry that was added since the current second started will still be treated as valid. If one second beyond than the current time is written to the file then no entries can be valid until the second ticks over. This will mean that no NFS request will be handled for up to 1 second. To resolve this issue we make two changes: 1/ treat an entry as expired if the timestamp when it was last_refreshed is before *or the same as* the expiry time. This means that current code which writes out the current time will now flush the cache reliably. 2/ when a new entry in added to the cache - set the last_refresh timestamp to 1 second *beyond* the current flush time, when that not in the past. This ensures that newly added entries will always be valid. Now that we have a very reliable way to flush the cache, and also since we are using "since-boot" timestamps which are monotonic, change cache_purge() to set the smallest future flush_time which will work, and leave it there: don't revert to '1'. Also disable the setting of the 'flush_time' far into the future. That has never been useful and is now awkward as it would cause last_refresh times to be strange. Finally: if a request is made to set the 'flush_time' to the current second, assume the intent is to flush the cache and advance it, if necessary, to 1 second beyond the current 'flush_time' so that all active entries will be deemed to be expired. As part of this we need to add a 'cache_detail' arg to cache_init() and cache_fresh_locked() so they can find the current ->flush_time. Signed-off-by: NeilBrown <neilb@suse.com> Reported-by: Olaf Kirch <okir@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | Merge tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2015-11-094-1/+21
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: New features: - RDMA client backchannel from Chuck - Support for NFSv4.2 file CLONE using the btrfs ioctl Bugfixes + cleanups: - Move socket data receive out of the bottom halves and into a workqueue - Refactor NFSv4 error handling so synchronous and asynchronous RPC handles errors identically. - Fix a panic when blocks or object layouts reads return a bad data length - Fix nfsroot so it can handle a 1024 byte long path. - Fix bad usage of page offset in bl_read_pagelist - Various NFSv4 callback cleanups+fixes - Fix GETATTR bitmap verification - Support hexadecimal number for sunrpc debug sysctl files" * tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (53 commits) Sunrpc: Supports hexadecimal number for sysctl files of sunrpc debug nfs: Fix GETATTR bitmap verification nfs: Remove unused xdr page offsets in getacl/setacl arguments fs/nfs: remove unnecessary new_valid_dev check SUNRPC: fix variable type NFS: Enable client side NFSv4.1 backchannel to use other transports pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS pNFS/flexfiles: When mirrored, retry failed reads by switching mirrors SUNRPC: Remove the TCP-only restriction in bc_svc_process() svcrdma: Add backward direction service for RPC/RDMA transport xprtrdma: Handle incoming backward direction RPC calls xprtrdma: Add support for sending backward direction RPC replies xprtrdma: Pre-allocate Work Requests for backchannel xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers SUNRPC: Abstract backchannel operations xprtrdma: Saving IRQs no longer needed for rb_lock xprtrdma: Remove reply tasklet xprtrdma: Use workqueue to process RPC/RDMA replies xprtrdma: Replace send and receive arrays xprtrdma: Refactor reply handler error handling ...
| * \ Merge tag 'nfs-rdma-4.4-2' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust2015-11-023-1/+19
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS: NFSoRDMA Client Side Changes In addition to a variety of bugfixes, these patches are mostly geared at enabling both swap and backchannel support to the NFS over RDMA client. Signed-off-by: Anna Schumake <Anna.Schumaker@Netapp.com>
| | * | NFS: Enable client side NFSv4.1 backchannel to use other transportsChuck Lever2015-11-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Forechannel transports get their own "bc_up" method to create an endpoint for the backchannel service. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [Anna Schumaker: Add forward declaration of struct net to xprt.h] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * | svcrdma: Add backward direction service for RPC/RDMA transportChuck Lever2015-11-022-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On NFSv4.1 mount points, the Linux NFS client uses this transport endpoint to receive backward direction calls and route replies back to the NFSv4.1 server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * | SUNRPC: Abstract backchannel operationsChuck Lever2015-11-022-0/+10
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xprt_{setup,destroy}_backchannel() won't be adequate for RPC/RMDA bi-direction. In particular, receive buffers have to be pre- registered and posted in order to receive incoming backchannel requests. Add a virtual function call to allow the insertion of appropriate backchannel setup and destruction methods for each transport. In addition, freeing a backchannel request is a little different for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | SUNRPC: Move TCP receive data path into a workqueue contextTrond Myklebust2015-10-081-0/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stream protocols such as TCP can often build up a backlog of data to be read due to ordering. Combine this with the fact that some workloads such as NFS read()-intensive workloads need to receive a lot of data per RPC call, and it turns out that receiving the data from inside a softirq context can cause starvation. The following patch moves the TCP data receive into a workqueue context. We still end up calling tcp_read_sock(), but we do so from a process context, meaning that softirqs are enabled for most of the time. With this patch, I see a doubling of read bandwidth when running a multi-threaded iozone workload between a virtual client and server setup. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | svcrdma: Port to new memory registration APISagi Grimberg2015-10-281-4/+2
|/ | | | | | | | | | | | Instead of maintaining a fastreg page list, keep an sg table and convert an array of pages to a sg list. Then call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* SUNRPC: Ensure that we wait for connections to complete before retryingTrond Myklebust2015-09-171-0/+3
| | | | | | | | | | | | | | | | Commit 718ba5b87343, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: Russell King <linux@arm.linux.org.uk> Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge tag 'for-linus' of ↵Linus Torvalds2015-09-091-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...
| * svcrdma: Use max_sge_rd for destination read depthsSteve Wise2015-08-281-0/+1
| | | | | | | | | | Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2015-09-073-10/+27
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Stable patches: - Fix atomicity of pNFS commit list updates - Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY) - nfs_set_pgio_error sometimes misses errors - Fix a thinko in xs_connect() - Fix borkage in _same_data_server_addrs_locked() - Fix a NULL pointer dereference of migration recovery ops for v4.2 client - Don't let the ctime override attribute barriers. - Revert "NFSv4: Remove incorrect check in can_open_delegated()" - Ensure flexfiles pNFS driver updates the inode after write finishes - flexfiles must not pollute the attribute cache with attrbutes from the DS - Fix a protocol error in layoutreturn - Fix a protocol issue with NFSv4.1 CLOSE stateids Bugfixes + cleanups - pNFS blocks bugfixes from Christoph - Various cleanups from Anna - More fixes for delegation corner cases - Don't fsync twice for O_SYNC/IS_SYNC files - Fix pNFS and flexfiles layoutstats bugs - pnfs/flexfiles: avoid duplicate tracking of mirror data - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues - pnfs/flexfiles: error handling retries a layoutget before fallback to MDS Features: - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong - More RDMA client transport improvements from Chuck - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs from the SUNRPC, Lustre and core infiniband tree. - Optimise away the close-to-open getattr if there is no cached data" * tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits) NFSv4: Respect the server imposed limit on how many changes we may cache NFSv4: Express delegation limit in units of pages Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files" NFS: Optimise away the close-to-open getattr if there is no cached data NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error() nfs: Remove unneeded checking of the return value from scnprintf nfs: Fix truncated client owner id without proto type NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid() NFSv4.1/flexfiles: Fix freeing of mirrors NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file NFSv4.1/pnfs: Handle LAYOUTGET return values correctly NFSv4.1/pnfs: Don't ask for a read layout for an empty file. NFSv4.1: Fix a protocol issue with CLOSE stateids NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors SUNRPC: Prevent SYN+SYNACK+RST storms SUNRPC: xs_reset_transport must mark the connection as disconnected NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload ...
| * | SUNRPC: Drop double-underscores from __rpc_cmp_addr6()Trond Myklebust2015-08-171-1/+1
| | | | | | | | | | | | | | | | | | Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 7b0ce60c0b20 ("SUNRPC: Drop double-underscores from rpc_cmp_addr{4|6}()") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | Merge tag 'nfs-rdma-for-4.3' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust2015-08-171-1/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS: NFS over RDMA Client Side Changes These patches improve both client performance and scalability, most notably by increasing the maixmum allowed rsize and wsize and by increasing the number of RDMA "credits". There are also several bugfixes, such as correcting how WRITE compounds are encoded and fixing large NFS symlink operations. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| | * | xprtrdma: Increase default credit limitChuck Lever2015-08-051-1/+1
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for similar increases on NFS/RDMA servers, bump the advertised credit limit for RPC/RDMA to 128. This allocates some extra resources, but the client will continue to allow only the number of RPCs in flight that the server requests via its advertised credit limit. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | SUNRPC: Add an rpc_cmp_addr_port() functionAnna Schumaker2015-08-171-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | This function is to help determine if two sockaddrs are really the same socket. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | SUNRPC: Drop double-underscores from rpc_cmp_addr{4|6}()Anna Schumaker2015-08-171-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | I'm planning on using these functions inside the client, so remove the underscores to make it feel like I'm using a public interface. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | sunrpc: increase UNX_MAXNODENAME from 32 to __NEW_UTS_LEN bytesJeff Layton2015-08-121-2/+6
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current limit of 32 bytes artificially limits the name string that we end up stuffing into NFSv4.x client ID blobs. If you have multiple hosts with long hostnames that only differ near the end, then this can cause NFSv4 client ID collisions. Linux nodenames are actually limited to __NEW_UTS_LEN bytes (64), so use that as the limit instead. Also, use XDR_QUADLEN to specify the slack length, just for clarity and in case someone in the future changes this to something not evenly divisible by 4. Reported-by: Michael Skralivetsky <michael.skralivetsky@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | sunrpc: Switch to using hash list instead single listKinglong Mee2015-08-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | Switch using list_head for cache_head in cache_detail, it is useful of remove an cache_head entry directly from cache_detail. v8, using hash list, not head list Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | sunrpc/nfsd: Remove redundant code by exports seq_operations functionsKinglong Mee2015-08-131-0/+5
| | | | | | | | | | | | | | | | | | | | | | Nfsd has implement a site of seq_operations functions as sunrpc's cache. Just exports sunrpc's codes, and remove nfsd's redundant codes. v8, same as v6 Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | nfsd/sunrpc: factor svc_rqst allocation and freeing from sv_nrthreads ↵Jeff Layton2015-08-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | refcounting In later patches, we'll want to be able to allocate and free svc_rqst structures without monkeying with the serv->sv_nrthreads refcount. Factor those pieces out of their respective functions. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | nfsd/sunrpc: move pool_mode definitions into svc.hJeff Layton2015-08-101-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In later patches, we're going to need to allow code external to svc.c to figure out what pool_mode is in use. Move these definitions into svc.h to prepare for that. Also, make the svc_pool_map object available and exported so that other modules can peek in there to get insight into what pool mode is in use. Likewise, export svc_pool_map_get/put function to make it safe to do so. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | nfsd/sunrpc: abstract out svc_set_num_threads to sv_opsJeff Layton2015-08-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | Add an operation that will do setup of the service. In the case of a classic thread-based service that means starting up threads. In the case of a workqueue-based service, the setup will do something different. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirliey.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operationJeff Layton2015-08-102-0/+4
| | | | | | | | | | | | | | | | | | | | For now, all services use svc_xprt_do_enqueue, but once we add workqueue-based service support, we'll need to do something different. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | nfsd/sunrpc: move sv_module parm into sv_opsJeff Layton2015-08-101-3/+6
| | | | | | | | | | | | | | | | | | | | ...not technically an operation, but it's more convenient and cleaner to pass the module pointer in this struct. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | nfsd/sunrpc: move sv_function into sv_opsJeff Layton2015-08-101-8/+3
| | | | | | | | | | | | | | | | | | | | Since we now have a container for holding svc_serv operations, move the sv_function into it as well. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | nfsd/sunrpc: add a new svc_serv_ops struct and move sv_shutdown into itJeff Layton2015-08-101-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | In later patches we'll need to abstract out more operations on a per-service level, besides sv_shutdown and sv_function. Declare a new svc_serv_ops struct to hold these operations, and move sv_shutdown into this struct. Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Jeff Layton <jlayton@primarydata.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOADChuck Lever2015-08-101-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both commit 0380a3f375 ("svcrdma: Add a separate "max data segs" macro for svcrdma") and commit 7e5be28827bf ("svcrdma: advertise the correct max payload") are incorrect. This commit reverts both changes, restoring the server's maximum payload size to 1MB. Commit 7e5be28827bf based the server's maximum payload on the _client's_ RPCRDMA_MAX_DATA_SEGS value. That was wrong. Commit 0380a3f375 tried to fix this so that the client maximum payload size could be raised without affecting the server, but managed to confuse matters more on the server side. More importantly, limiting the advertised maximum payload size was meant to be a workaround, not the actual fix. We need to revisit https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 A Linux client on a platform with 64KB pages can overrun and crash an x86_64 NFS/RDMA server when the r/wsize is 1MB. An x86/64 Linux client seems to work fine using 1MB reads and writes when the Linux server's maximum payload size is restored to 1MB. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 Fixes: 0380a3f375 ("svcrdma: Add a separate "max data segs" macro") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | svcrdma: Remove svc_rdma_fastreg()Chuck Lever2015-07-201-1/+0
| | | | | | | | | | | | | | | | Commit 0bf4828983df ("svcrdma: refactor marshalling logic") removed the last call site for svc_rdma_fastreg(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | svcrdma: Clean up svc_rdma_get_reply_array()Chuck Lever2015-07-201-79/+2
|/ | | | | | | | | | | | | | | Kernel coding conventions frown upon having large nontrivial functions in header files, and the preference these days is to allow the compiler to make inlining decisions if possible. As these functions are re-homed into a .c file, be sure that comparisons with fields in struct rpcrdma_msg are with be32 constants. This is a refactoring change; no behavior change is intended. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* Merge tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2015-07-025-7/+56
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Stable patches: - Fix a crash in the NFSv4 file locking code. - Fix an fsync() regression, where we were failing to retry I/O in some circumstances. - Fix an infinite loop in NFSv4.0 OPEN stateid recovery - Fix a memory leak when an attempted pnfs fails. - Fix a memory leak in the backchannel code - Large hostnames were not supported correctly in NFSv4.1 - Fix a pNFS/flexfiles bug that was impeding error reporting on I/O. - Fix a couple of credential issues in pNFS/flexfiles Bugfixes + cleanups: - Open flag sanity checks in the NFSv4 atomic open codepath - More NFSv4 delegation related bugfixes - Various NFSv4.1 backchannel bugfixes and cleanups - Fix the NFS swap socket code - Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code - Fix a UDP transport deadlock issue Features: - More RDMA client transport improvements - NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles" * tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (87 commits) nfs: Remove invalid tk_pid from debug message nfs: Remove invalid NFS_ATTR_FATTR_V4_REFERRAL checking in nfs4_get_rootfh nfs: Drop bad comment in nfs41_walk_client_list() nfs: Remove unneeded micro checking of CONFIG_PROC_FS nfs: Don't setting FILE_CREATED flags always nfs: Use remove_proc_subtree() instead remove_proc_entry() nfs: Remove unused argument in nfs_server_set_fsinfo() nfs: Fix a memory leak when meeting an unsupported state protect nfs: take extra reference to fl->fl_file when running a LOCKU operation NFSv4: When returning a delegation, don't reclaim an incompatible open mode. NFSv4.2: LAYOUTSTATS is optional to implement NFSv4.2: Fix up a decoding error in layoutstats pNFS/flexfiles: Fix the reset of struct pgio_header when resending pNFS/flexfiles: Turn off layoutcommit for servers that don't need it pnfs/flexfiles: protect ktime manipulation with mirror lock nfs: provide pnfs_report_layoutstat when NFS42 is disabled nfs: verify open flags before allowing open nfs: always update creds in mirror, even when we have an already connected ds nfs: fix potential credential leak in ff_layout_update_mirror_cred pnfs/flexfiles: report layoutstat regularly ...
| * Merge tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust2015-06-161-1/+2
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS: NFSoRDMA Client Changes These patches continue to build up for improving the rsize and wsize that the NFS client uses when talking over RDMA. In addition, these patches also add in scalability enhancements and other bugfixes. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (142 commits) xprtrdma: Reduce per-transport MR allocation xprtrdma: Stack relief in fmr_op_map() xprtrdma: Split rb_lock xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy xprtrdma: Remove ->ro_reset xprtrdma: Remove unused LOCAL_INV recovery logic xprtrdma: Acquire MRs in rpcrdma_register_external() xprtrdma: Introduce an FRMR recovery workqueue xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external() xprtrdma: Introduce helpers for allocating MWs xprtrdma: Use ib_device pointer safely xprtrdma: Remove rr_func xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt xprtrdma: Warn when there are orphaned IB objects ...
| | * xprtrdma: Remove rpcrdma_ia::ri_memreg_strategyChuck Lever2015-06-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: This field is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | SUNRPC: Transport fault injectionChuck Lever2015-06-101-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It has been exceptionally useful to exercise the logic that handles local immediate errors and RDMA connection loss. To enable developers to test this regularly and repeatably, add logic to simulate connection loss every so often. Fault injection is disabled by default. It is enabled with $ sudo echo xxx > /sys/kernel/debug/sunrpc/inject_fault/disconnect where "xxx" is a large positive number of transport method calls before a disconnect. A value of several thousand is usually a good number that allows reasonable forward progress while still causing a lot of connection drops. These hooks are disabled when SUNRPC_DEBUG is turned off. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | sunrpc: turn swapper_enable/disable functions into rpc_xprt_opsJeff Layton2015-06-101-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RDMA xprts don't have a sock_xprt, but an rdma_xprt, so the xs_swapper_enable/disable functions will likely oops when fed an RDMA xprt. Turn these functions into rpc_xprt_ops so that that doesn't occur. For now the RDMA versions are no-ops that just return -EINVAL on an attempt to swapon. Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | sunrpc: make xprt->swapper an atomic_tJeff Layton2015-06-101-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split xs_swapper into enable/disable functions and eliminate the "enable" flag. Currently, it's racy if you have multiple swapon/swapoff operations running in parallel over the same xprt. Also fix it so that we only set it to a memalloc socket on a 0->1 transition and only clear it on a 1->0 transition. Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | sunrpc: keep a count of swapfiles associated with the rpc_clntJeff Layton2015-06-102-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jerome reported seeing a warning pop when working with a swapfile on NFS. The nfs_swap_activate can end up calling sk_set_memalloc while holding the rcu_read_lock and that function can sleep. To fix that, we need to take a reference to the xprt while holding the rcu_read_lock, set the socket up for swapping and then drop that reference. But, xprt_put is not exported and having NFS deal with the underlying xprt is a bit of layering violation anyway. Fix this by adding a set of activate/deactivate functions that take a rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and nfs_swap_deactivate call those. Also, add a per-rpc_clnt atomic counter to keep track of the number of active swapfiles associated with it. When the counter does a 0->1 transition, we enable swapping on the xprt, when we do a 1->0 transition we disable swapping on it. This also allows us to be a bit more selective with the RPC_TASK_SWAPPER flag. If non-swapper and swapper clnts are sharing a xprt, then we only need to flag the tasks from the swapper clnt with that flag. Acked-by: Mel Gorman <mgorman@suse.de> Reported-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | SUNRPC: Fix a backchannel raceTrond Myklebust2015-06-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to allow the server to send a new request immediately after we've replied to the previous one. Right now, there is a window between the send and the release of the old request in rpc_put_task(), where the server could send us a new backchannel RPC call, and we have no request to service it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | SUNRPC: Remove unused argument 'tk_ops' in rpc_run_bc_taskTrond Myklebust2015-06-051-2/+1
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | SUNRPC: Clean up bc_send()Chuck Lever2015-06-021-1/+0
| |/ | | | | | | | | | | | | | | | | Clean up: Merge bc_send() into bc_svc_process(). Note: even thought this touches svc.c, it is a client-side change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | svcrdma: Add a separate "max data segs macro for svcrdmaChuck Lever2015-06-041-0/+7
| | | | | | | | | | | | | | | | The server and client maximum are architecturally independent. Allow changing one without affecting the other. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAILChuck Lever2015-06-041-1/+0
| | | | | | | | | | | | | | | | | | At the 2015 LSF/MM, it was requested that memory allocation call sites that request GFP_KERNEL allocations in a loop should be annotated with __GFP_NOFAIL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | svcrdma: Keep rpcrdma_msg fields in network byte-orderChuck Lever2015-06-041-1/+1
| | | | | | | | | | | | | | | | | | Fields in struct rpcrdma_msg are __be32. Don't byte-swap these fields when decoding RPC calls and then swap them back for the reply. For the most part, they can be left alone. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | svcrdma: Remove svc_rdma_xdr_decode_deferred_req()Chuck Lever2015-06-031-1/+0
|/ | | | | | | | | svc_rdma_xdr_decode_deferred_req() indexes an array with an un-byte-swapped value off the wire. Fortunately this function isn't used anywhere, so simply remove it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* Merge tag 'nfs-rdma-for-4.1-1' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust2015-04-232-6/+7
|\ | | | | | | | | | | | | | | | | | | NFS: NFSoRDMA Client Changes This patch series creates an operation vector for each of the different memory registration modes. This should make it easier to one day increase credit limit, rsize, and wsize. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: Introduce missing well-known netidsChuck Lever2015-03-312-6/+7
| | | | | | | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | sunrpc: make debugfs file creation failure non-fatalJeff Layton2015-04-231-9/+9
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: gracefully handle the case where some dentry pointers end up NULL and be more dilligent about zeroing out dentry pointers We currently have a problem that SELinux policy is being enforced when creating debugfs files. If a debugfs file is created as a side effect of doing some syscall, then that creation can fail if the SELinux policy for that process prevents it. This seems wrong. We don't do that for files under /proc, for instance, so Bruce has proposed a patch to fix that. While discussing that patch however, Greg K.H. stated: "No kernel code should care / fail if a debugfs function fails, so please fix up the sunrpc code first." This patch converts all of the sunrpc debugfs setup code to be void return functins, and the callers to not look for errors from those functions. This should allow rpc_clnt and rpc_xprt creation to work, even if the kernel fails to create debugfs files for some reason. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Merge branch 'cleanups'Trond Myklebust2015-02-182-3/+12
|\ | | | | | | | | | | | | | | | | Merge cleanups requested by Linus. * cleanups: (3 commits) pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit nfs: Can call nfs_clear_page_commit() instead nfs: Provide and use helper functions for marking a page as unstable
| * Merge branch 'for-3.20' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2015-02-122-3/+12
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "The main change is the pNFS block server support from Christoph, which allows an NFS client connected to shared disk to do block IO to the shared disk in place of NFS reads and writes. This also requires xfs patches, which should arrive soon through the xfs tree, barring unexpected problems. Support for other filesystems is also possible if there's interest. Thanks also to Chuck Lever for continuing work to get NFS/RDMA into shape" * 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits) nfsd: default NFSv4.2 to on nfsd: pNFS block layout driver exportfs: add methods for block layout exports nfsd: add trace events nfsd: update documentation for pNFS support nfsd: implement pNFS layout recalls nfsd: implement pNFS operations nfsd: make find_any_file available outside nfs4state.c nfsd: make find/get/put file available outside nfs4state.c nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c nfsd: add fh_fsid_match helper nfsd: move nfsd_fh_match to nfsfh.h fs: add FL_LAYOUT lease type fs: track fl_owner for leases nfs: add LAYOUT_TYPE_MAX enum value nfsd: factor out a helper to decode nfstime4 values sunrpc/lockd: fix references to the BKL nfsd: fix year-2038 nfs4 state problem svcrdma: Handle additional inline content svcrdma: Move read list XDR round-up logic ...
OpenPOWER on IntegriCloud