summaryrefslogtreecommitdiffstats
path: root/fs/dlm
Commit message (Collapse)AuthorAgeFilesLines
* dlm: NULL dereference on failure in kmem_cache_create()Dan Carpenter2012-05-151-5/+3
| | | | | | | | We aren't allowed to pass NULL pointers to kmem_cache_destroy() so if both allocations fail, it leads to a NULL dereference. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: fixes for nodir modeDavid Teigland2012-05-029-162/+303
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "nodir" mode (statically assign master nodes instead of using the resource directory) has always been highly experimental, and never seriously used. This commit fixes a number of problems, making nodir much more usable. - Major change to recovery: recover all locks and restart all in-progress operations after recovery. In some cases it's not possible to know which in-progess locks to recover, so recover all. (Most require recovery in nodir mode anyway since rehashing changes most master nodes.) - Change the way nodir mode is enabled, from a command line mount arg passed through gfs2, into a sysfs file managed by dlm_controld, consistent with the other config settings. - Allow recovering MSTCPY locks on an rsb that has not yet been turned into a master copy. - Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages from a previous, aborted recovery cycle. Base this on the local recovery status not being in the state where any nodes should be sending LOCK messages for the current recovery cycle. - Hold rsb lock around dlm_purge_mstcpy_locks() because it may run concurrently with dlm_recover_master_copy(). - Maintain highbast on process-copy lkb's (in addition to the master as is usual), because the lkb can switch back and forth between being a master and being a process copy as the master node changes in recovery. - When recovering MSTCPY locks, flag rsb's that have non-empty convert or waiting queues for granting at the end of recovery. (Rename flag from LOCKS_PURGED to RECOVER_GRANT and similar for the recovery function, because it's not only resources with purged locks that need grant a grant attempt.) - Replace a couple of unnecessary assertion panics with error messages. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: improve error and debug messagesDavid Teigland2012-04-264-90/+164
| | | | | | | | | Change some existing error/debug messages to collect more useful information, and add some new error/debug messages to address recently found problems. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: avoid unnecessary search in search_rsbDavid Teigland2012-04-261-0/+3
| | | | | | | | | | If the rsb is found in the "keep" tree, but is not the right type (i.e. not MASTER), we can return immediately with the result. There's no point in going on to search the "toss" list as if we hadn't found it. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: limit rcom debug messagesDavid Teigland2012-04-262-28/+28
| | | | | | | | | Unify the checking for both types of ignored rcom messages, and replace the two log_debug statements with a single, rate limited debug message. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: fix waiter recoveryDavid Teigland2012-04-261-12/+31
| | | | | | | | | | | | An outstanding remote operation (an lkb on the "waiter" list) could sometimes miss being resent during recovery. The decision was based on the lkb_nodeid field, which could have changed during an earlier aborted recovery, so it no longer represents the actual remote destination. The lkb_wait_nodeid is always the actual remote node, so it is the best value to use. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: prevent connections during shutdownDavid Teigland2012-04-261-8/+20
| | | | | | | | | | | | | | | During lowcomms shutdown, a new connection could possibly be created, and attempt to use a workqueue that's been destroyed. Similarly, during startup, a new connection could attempt to use a workqueue that's not been set up yet. Add a global variable to indicate when new connections are allowed. Based on patch by: Christine Caulfield <ccaulfie@redhat.com> Reported-by: dann frazier <dann.frazier@canonical.com> Reviewed-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: David Teigland <teigland@redhat.com>
* Merge tag 'dlm-fixes-3.4' of ↵Linus Torvalds2012-04-231-0/+12
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm fixes from David Teigland: "This includes one short patch fixing the behavior of the QUECVT flag, which the gfs2 folks are waiting on." * tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: fix QUECVT when convert queue is empty
| * dlm: fix QUECVT when convert queue is emptyDavid Teigland2012-04-231-0/+12
| | | | | | | | | | | | | | | | The QUECVT flag should not prevent conversions from being granted immediately when the convert queue is empty. Signed-off-by: David Teigland <teigland@redhat.com>
* | simple_open: automatically convert to simple_open()Stephen Boyd2012-04-051-8/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many users of debugfs copy the implementation of default_open() when they want to support a custom read/write function op. This leads to a proliferation of the default_open() implementation across the entire tree. Now that the common implementation has been consolidated into libfs we can replace all the users of this function with simple_open(). This replacement was done with the following semantic patch: <smpl> @ open @ identifier open_f != simple_open; identifier i, f; @@ -int open_f(struct inode *i, struct file *f) -{ ( -if (i->i_private) -f->private_data = i->i_private; | -f->private_data = i->i_private; ) -return 0; -} @ has_open depends on open @ identifier fops; identifier open.open_f; @@ struct file_operations fops = { ... -.open = open_f, +.open = simple_open, ... }; </smpl> [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'dlm-3.4' of ↵Linus Torvalds2012-03-214-5/+25
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates for 3.4 from David Teigland: "This set includes one trivial fix, and one simple recovery speed up. Directory recovery can use the standard hash table to find resources rather than always searching the linear recovery list." * tag 'dlm-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: last element of dlm_local_addr[] never used dlm: fix slow rsb search in dir recovery
| * dlm: last element of dlm_local_addr[] never usedDavid Teigland2012-03-211-1/+1
| | | | | | | | | | | | | | | | The last element of dlm_local_addr[DLM_MAX_ADDR_COUNT] was not used because the loop ended at COUNT - 1. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: fix slow rsb search in dir recoveryDavid Teigland2012-03-083-4/+24
| | | | | | | | | | | | | | | | | | The function used to find an rsb during directory recovery was searching the single linear list of rsb's. This wasted a lot of time compared to using the standard hash table to find the rsb. Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: Do not allocate a fd for peeloffBenjamin Poirier2012-03-081-14/+8
|/ | | | | | | | | avoids allocating a fd that a) propagates to every kernel thread and usermodehelper b) is not properly released. References: http://article.gmane.org/gmane.linux.network.drbd/22529 Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2012-01-1014-263/+873
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: add recovery callbacks dlm: add node slots and generation dlm: move recovery barrier calls dlm: convert rsb list to rb_tree
| * dlm: add recovery callbacksDavid Teigland2012-01-048-163/+263
| | | | | | | | | | | | | | | | | | | | | | | | | | These new callbacks notify the dlm user about lock recovery. GFS2, and possibly others, need to be aware of when the dlm will be doing lock recovery for a failed lockspace member. In the past, this coordination has been done between dlm and file system daemons in userspace, which then direct their kernel counterparts. These callbacks allow the same coordination directly, and more simply. Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: add node slots and generationDavid Teigland2012-01-047-29/+480
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Slot numbers are assigned to nodes when they join the lockspace. The slot number chosen is the minimum unused value starting at 1. Once a node is assigned a slot, that slot number will not change while the node remains a lockspace member. If the node leaves and rejoins it can be assigned a new slot number. A new generation number is also added to a lockspace. It is set and incremented during each recovery along with the slot collection/assignment. The slot numbers will be passed to gfs2 which will use them as journal id's. Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: move recovery barrier callsDavid Teigland2012-01-044-27/+28
| | | | | | | | | | | | | | | | Put all the calls to recovery barriers in the same function to clarify where they each happen. Should not change any behavior. Also modify some recovery debug lines to make them consistent. Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: convert rsb list to rb_treeBob Peterson2011-11-185-55/+113
| | | | | | | | | | | | | | | | | | | | Change the linked lists to rb_tree's in the rsb hash table to speed up searches. Slow rsb searches were having a large impact on gfs2 performance due to the large number of dlm locks gfs2 uses. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
* | net: remove ipv6_addr_copy()Alexey Dobriyan2011-11-221-1/+1
|/ | | | | | | C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-3.1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2011-07-251-5/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-3.1' of git://linux-nfs.org/~bfields/linux: nfsd: don't break lease on CLAIM_DELEGATE_CUR locks: rename lock-manager ops nfsd4: update nfsv4.1 implementation notes nfsd: turn on reply cache for NFSv4 nfsd4: call nfsd4_release_compoundargs from pc_release nfsd41: Deny new lock before RECLAIM_COMPLETE done fs: locks: remove init_once nfsd41: check the size of request nfsd41: error out when client sets maxreq_sz or maxresp_sz too small nfsd4: fix file leak on open_downgrade nfsd4: remember to put RW access on stateid destruction NFSD: Added TEST_STATEID operation NFSD: added FREE_STATEID operation svcrpc: fix list-corrupting race on nfsd shutdown rpc: allow autoloading of gss mechanisms svcauth_unix.c: quiet sparse noise svcsock.c: include sunrpc.h to quiet sparse noise nfsd: Remove deprecated nfsctl system call and related code. NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
| * locks: rename lock-manager opsJ. Bruce Fields2011-07-201-5/+5
| | | | | | | | | | | | | | | | | | | | | | Both the filesystem and the lock manager can associate operations with a lock. Confusingly, one of them (fl_release_private) actually has the same name in both operation structures. It would save some confusion to give the lock-manager ops different names. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | dlm: don't limit active work itemsDavid Teigland2011-07-191-1/+4
| | | | | | | | | | | | | | | | Allow multiple workqueue items (locks with callbacks) to be processed concurrently. There should be no reason not to take advantage of this workqueue feature. Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: use workqueue for callbacksDavid Teigland2011-07-157-205/+172
| | | | | | | | | | | | | | | | | | Instead of creating our own kthread (dlm_astd) to deliver callbacks for all lockspaces, use a per-lockspace workqueue to deliver the callbacks. This eliminates complications and slowdowns from many lockspaces sharing the same thread. Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: remove deadlock debug printDavid Teigland2011-07-141-3/+0
| | | | | | | | | | | | | | gfs2 recently began using this feature heavily, creating more debug output than we want to see. Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: improve rsb searchesDavid Teigland2011-07-127-48/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | | | By pre-allocating rsb structs before searching the hash table, they can be inserted immediately. This avoids always having to repeat the search when adding the struct to hash list. This also adds space to the rsb struct for a max resource name, so an rsb allocation can be used by any request. The constant size also allows us to finally use a slab for the rsb structs. Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: keep lkbs in idrDavid Teigland2011-07-115-125/+82
| | | | | | | | | | | | | | | | This is simpler and quicker than the hash table, and avoids needing to search the hash list for every new lkid to check if it's used. Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: fix kmalloc argsDavid Teigland2011-07-111-1/+1
| | | | | | | | | | | | The gfp and size args were switched. Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: don't do pointless NULL check, use kzalloc and fix order of argumentsJesper Juhl2011-07-111-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In fs/dlm/lock.c in the dlm_scan_waiters() function there are 3 small issues: 1) There's no need to test the return value of the allocation and do a memset if is succeedes. Just use kzalloc() to obtain zeroed memory. 2) Since kfree() handles NULL pointers gracefully, the test of 'warned' against NULL before the kfree() after the loop is completely pointless. Remove it. 3) The arguments to kmalloc() (now kzalloc()) were swapped. Thanks to Dr. David Alan Gilbert for pointing this out. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: dump address of unknown nodeMasatake YAMATO2011-07-061-4/+5
| | | | | | | | | | | | | | | | When the dlm fails to make a network connection to another node, include the address of the node in the error message. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: use vmalloc for hash tablesBryn M. Reeves2011-07-011-9/+9
| | | | | | | | | | | | | | | | Allocate dlm hash tables in the vmalloc area to allow a greater maximum size without restructuring of the hash table code. Signed-off-by: Bryn M. Reeves <bmr@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
* | dlm: show addresses in configfsMasatake YAMATO2011-06-301-2/+57
|/ | | | | | | | | Display all addresses the dlm is using for the local node from the configfs file config/dlm/<cluster>/comms/<comm>/addr_list Also make the addr file write only. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
* Merge branch 'trivial' of ↵Linus Torvalds2011-05-261-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 * 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: gfs2: Drop __TIME__ usage isdn/diva: Drop __TIME__ usage atm: Drop __TIME__ usage dlm: Drop __TIME__ usage wan/pc300: Drop __TIME__ usage parport: Drop __TIME__ usage hdlcdrv: Drop __TIME__ usage baycom: Drop __TIME__ usage pmcraid: Drop __DATE__ usage edac: Drop __DATE__ usage rio: Drop __DATE__ usage scsi/wd33c93: Drop __TIME__ usage scsi/in2000: Drop __TIME__ usage aacraid: Drop __TIME__ usage media/cx231xx: Drop __TIME__ usage media/radio-maxiradio: Drop __TIME__ usage nozomi: Drop __TIME__ usage cyclades: Drop __TIME__ usage
| * dlm: Drop __TIME__ usageMichal Marek2011-05-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | The kernel already prints its build timestamp during boot, no need to repeat it in random drivers and produce different object files each time. Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Cc: cluster-devel@redhat.com Signed-off-by: Michal Marek <mmarek@suse.cz>
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-05-248-49/+219
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: make plock operation killable dlm: remove shared message stub for recovery dlm: delayed reply message warning dlm: Remove superfluous call to recalc_sigpending()
| * | dlm: make plock operation killableDavid Teigland2011-05-231-4/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow processes blocked on plock requests to be interrupted when they are killed. This leaves the problem of cleaning up the lock state in userspace. This has three parts: 1. Add a flag to unlock operations sent to userspace indicating the file is being closed. Userspace will then look for and clear any waiting plock operations that were abandoned by an interrupted process. 2. Queue an unlock-close operation (like in 1) to clean up userspace from an interrupted plock request. This is needed because the vfs will not send a cleanup-unlock if it sees no locks on the file, which it won't if the interrupted operation was the only one. 3. Do not use replies from userspace for unlock-close operations because they are unnecessary (they are just cleaning up for the process which did not make an unlock call). This also simplifies the new unlock-close generated from point 2. Signed-off-by: David Teigland <teigland@redhat.com>
| * | dlm: remove shared message stub for recoveryDavid Teigland2011-04-052-33/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kmalloc a stub message struct during recovery instead of sharing the struct in the lockspace. This leaves the lockspace stub_ms only for faking downconvert replies, where it is never modified and sharing is not a problem. Also improve the debug messages in the same recovery function. Signed-off-by: David Teigland <teigland@redhat.com>
| * | dlm: delayed reply message warningDavid Teigland2011-04-016-11/+108
| | | | | | | | | | | | | | | | | | | | | | | | Add an option (disabled by default) to print a warning message when a lock has been waiting a configurable amount of time for a reply message from another node. This is mainly for debugging. Signed-off-by: David Teigland <teigland@redhat.com>
| * | dlm: Remove superfluous call to recalc_sigpending()Matt Fleming2011-03-281-1/+0
| |/ | | | | | | | | | | | | | | recalc_sigpending() is called within sigprocmask(), so there is no need call it again after sigprocmask() has returned. Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com> Signed-off-by: David Teigland <teigland@redhat.com>
* | Fix common misspellingsLucas De Marchi2011-03-313-3/+3
|/ | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* dlm: use alloc_workqueue functionDavid Teigland2011-03-101-2/+4
| | | | | | Replaces deprecated create_singlethread_workqueue(). Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: increase default hash table sizesDavid Teigland2011-03-101-2/+2
| | | | | | | | | Make all three hash tables a consistent size of 1024 rather than 1024, 512, 256. All three tables, for resources, locks, and lock dir entries, will generally be filled to the same order of magnitude. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: record full callback stateDavid Teigland2011-03-108-222/+311
| | | | | | | | | | | | | Change how callbacks are recorded for locks. Previously, information about multiple callbacks was combined into a couple of variables that indicated what the end result should be. In some situations, we could not tell from this combined state what the exact sequence of callbacks were, and would end up either delivering the callbacks in the wrong order, or suppress redundant callbacks incorrectly. This new approach records all the data for each callback, leaving no uncertainty about what needs to be delivered. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: use single thread workqueuesDavid Teigland2011-02-111-4/+2
| | | | | | | | | | The recent commit to use cmwq for send and recv threads dcce240ead802d42b1e45ad2fcb2ed4a399cb255 introduced problems, apparently due to multiple workqueue threads. Single threads make the problems go away, so return to that until we fully understand the concurrency issues with multiple threads. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: Make DLM depend on CONFIGFS_FSNicholas Bellinger2011-01-161-2/+1
| | | | | | | | | | | | | | | | This patch fixes the following kconfig error after changing CONFIGFS_FS -> select SYSFS: fs/sysfs/Kconfig:1:error: recursive dependency detected! fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by DLM fs/dlm/Kconfig:1: symbol DLM depends on SYSFS Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Joel Becker <jlbec@evilplan.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: James Bottomley <James.Bottomley@suse.de>
* dlm: sanitize work_start() in lowcomms.cNamhyung Kim2010-12-131-9/+6
| | | | | | | | | The create_workqueue() returns NULL if failed rather than ERR_PTR(). Fix error checking and remove unnecessary variable 'error'. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: reduce cond_resched during sendBob Peterson2010-11-121-1/+9
| | | | | | | | | Calling cond_resched() after every send can unnecessarily degrade performance. Go back to an old method of scheduling after 25 messages. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: use TCP_NODELAYDavid Teigland2010-11-121-0/+10
| | | | | | Nagling doesn't help and can sometimes hurt dlm comms. Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: Use cmwq for send and receive workqueuesSteven Whitehouse2010-11-121-2/+4
| | | | | | | | | | | | | | | | So far as I can tell, there is no reason to use a single-threaded send workqueue for dlm, since it may need to send to several sockets concurrently. Both workqueues are set to WQ_MEM_RECLAIM to avoid any possible deadlocks, WQ_HIGHPRI since locking traffic is highly latency sensitive (and to avoid a priority inversion wrt GFS2's glock_workqueue) and WQ_FREEZABLE just in case someone needs to do that (even though with current cluster infrastructure, it doesn't make sense as the node will most likely land up ejected from the cluster) in the future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: David Teigland <teigland@redhat.com>
* dlm: Handle application limited situations properly.David Miller2010-11-111-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the normal regime where an application uses non-blocking I/O writes on a socket, they will handle -EAGAIN and use poll() to wait for send space. They don't actually sleep on the socket I/O write. But kernel level RPC layers that do socket I/O operations directly and key off of -EAGAIN on the write() to "try again later" don't use poll(), they instead have their own sleeping mechanism and rely upon ->sk_write_space() to trigger the wakeup. So they do effectively sleep on the write(), but this mechanism alone does not let the socket layers know what's going on. Therefore they must emulate what would have happened, otherwise TCP cannot possibly see that the connection is application window size limited. Handle this, therefore, like SUNRPC by setting SOCK_NOSPACE and bumping the ->sk_write_count as needed when we hit the send buffer limits. This should make TCP send buffer size auto-tuning and the ->sk_write_space() callback invocations actually happen. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: David Teigland <teigland@redhat.com>
OpenPOWER on IntegriCloud