op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge http://sucs.org/~rohan/git/gfs2-3.0-nmw	Linus Torvalds	2011-10-28	19	-1012/+666
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* http://sucs.org/~rohan/git/gfs2-3.0-nmw: (24 commits) GFS2: Move readahead of metadata during deallocation into its own function GFS2: Remove two unused variables GFS2: Misc fixes GFS2: rewrite fallocate code to write blocks directly GFS2: speed up delete/unlink performance for large files GFS2: Fix off-by-one in gfs2_blk2rgrpd GFS2: Clean up ->page_mkwrite GFS2: Correctly set goal block after allocation GFS2: Fix AIL flush issue during fsync GFS2: Use cached rgrp in gfs2_rlist_add() GFS2: Call do_strip() directly from recursive_scan() GFS2: Remove obsolete assert GFS2: Cache the most recently used resource group in the inode GFS2: Make resource groups "append only" during life of fs GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added GFS2: Clean up gfs2_create GFS2: Use ->dirty_inode() GFS2: Fix bug trap and journaled data fsync GFS2: Fix inode allocation error path ...
\| *	GFS2: Move readahead of metadata during deallocation into its own function	Steven Whitehouse	2011-10-21	1	-19/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the recently added readahead of the indirect pointer tree during deallocation into its own function in order that we can use it elsewhere in the future. Also this fixes the resetting of the "first" variable in the original patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Remove two unused variables	Steven Whitehouse	2011-10-21	3	-20/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The two variables being initialised in gfs2_inplace_reserve to track the file & line number of the caller are never used, so we might as well remove them. If something does go wrong, then a stack trace is probably more useful anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Misc fixes	Steven Whitehouse	2011-10-21	3	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some items picked up through automated code analysis. A few bits of unreachable code and two unchecked return values. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: rewrite fallocate code to write blocks directly	Benjamin Marzinski	2011-10-21	3	-147/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GFS2's fallocate code currently goes through the page cache. Since it's only writing to the end of the file or to holes in it, it doesn't need to, and it was causing issues on low memory environments. This patch pulls in some of Steve's block allocation work, and uses it to simply allocate the blocks for the file, and zero them out at allocation time. It provides a slight performance increase, and it dramatically simplifies the code. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: speed up delete/unlink performance for large files	Bob Peterson	2011-10-21	1	-3/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch improves the performance of delete/unlink operations in a GFS2 file system where the files are large by adding a layer of metadata read-ahead for indirect blocks. Mileage will vary, but on my system, deleting an 8.6G file dropped from 22 seconds to about 4.5 seconds. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Fix off-by-one in gfs2_blk2rgrpd	Steven Whitehouse	2011-10-21	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Bob reported: I found an off-by-one problem with how I coded this section: It should be: + else if (blk >= cur->rd_data0 + cur->rd_data) In fact, cur->rd_data0 + cur->rd_data is the start of the next rgrp (the next ri_addr), so without the "=" check it can land on the wrong rgrp. In all normal cases, this won't be a problem: you're searching for a block _within_ the rgrp, which will pass the test properly. Where it gets into trouble is if you search the rgrps for the block exactly equal to ri_addr. I don't think anything in the kernel does this, but I found a place in gfs2-utils gfs2_edit where it does. So I definitely need to fix it in libgfs2. I'd like to suggest we fix it in the kernel as well for the sake of keeping the functions similar. So this patch fixes the above mentioned off by one error as well as removing the unused parent pointer. Reported-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Clean up ->page_mkwrite	Steven Whitehouse	2011-10-21	1	-18/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch brings gfs2's ->page_mkwrite uptodate with respect to the expectations set by the VM. Also added is a check to wait if the fs is frozen, before we attempt to get a glock. This will only work on the node which initiates the freeze, but thats ok since the transaction lock will still provide the expected barrier on other nodes. The major change here is that we return a locked page now, except when we don't return a page at all (error cases). This removes the race which required rechecking the page after it was returned. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Nick Piggin <npiggin@kernel.dk>
\| *	GFS2: Correctly set goal block after allocation	Steven Whitehouse	2011-10-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new goal block should be set to the end of the newly allocated extent, not the start of it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Fix AIL flush issue during fsync	Steven Whitehouse	2011-10-21	4	-24/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unfortunately, it is not enough to just ignore locked buffers during the AIL flush from fsync. We need to be able to ignore all buffers which are locked, dirty or pinned at this stage as they might have been added subsequent to the log flush earlier in the fsync function. In addition, this means that we no longer need to rely on i_mutex to keep out writes during fsync, so we can, as a side-effect, remove that protection too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-By: Abhijith Das <adas@redhat.com>
\| *	GFS2: Use cached rgrp in gfs2_rlist_add()	Steven Whitehouse	2011-10-21	5	-11/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each block which is deallocated, requires a call to gfs2_rlist_add() and each of those calls was calling gfs2_blk2rgrpd() in order to figure out which rgrp the block belonged in. This can be speeded up by making use of the rgrp cached in the inode. We also reset this cached rgrp in case the block has changed rgrp. This should provide a big reduction in gfs2_blk2rgrpd() calls during deallocation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Call do_strip() directly from recursive_scan()	Steven Whitehouse	2011-10-21	1	-78/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The recursive_scan() function only ever takes a single "bc" argument, so we might as well just call do_strip() directly from resource_scan() rather than pass it in as an argument. Also the "data" argument is always a struct strip_mine, so we can pass that in, rather than using a void pointer. This also moves do_strip() ahead of recursive_scan() so that we don't need to add a prototype. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Remove obsolete assert	Steven Whitehouse	2011-10-21	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Given that a resource group has been locked, there is no reason why we should not be able to allocate as many blocks as are free. The al_requested parameter should really be considered as a minimum number of blocks to be available. Should this limit be overshot, there are other mechanisms which will prevent over allocation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Cache the most recently used resource group in the inode	Steven Whitehouse	2011-10-21	9	-45/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This means that after the initial allocation for any inode, the last used resource group is cached in the inode for future use. This drastically reduces the number of lookups of resource groups in the common case, and this the contention on that data structure. The allocation algorithm is the same as previously, except that we always check to see if the goal block is within the cached rgrp first before going to the rbtree to look one up. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Make resource groups "append only" during life of fs	Steven Whitehouse	2011-10-21	9	-174/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we have ruled out supporting online filesystem shrink, it is possible to make the resource group list append only during the life of a super block. This gives several benefits: Firstly, we only need to read new rindex elements as they are added rather than needing to reread the whole rindex file each time one element is added. Secondly, the rindex glock can be held for much shorter periods of time, and is completely removed from the fast path for allocations. The lock is taken in shared mode only when updating the resource groups when the first allocation occurs, and after a grow has taken place. Thirdly, this results in a reduction in code size, and everything gets a lot simpler to understand in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme	Bob Peterson	2011-10-21	8	-345/+148
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Here is an update of Bob's original rbtree patch which, in addition, also resolves the rather strange ref counting that was being done relating to the bitmap blocks. Originally we had a dual system for journaling resource groups. The metadata blocks were journaled and also the rgrp itself was added to a list. The reason for adding the rgrp to the list in the journal was so that the "repolish clones" code could be run to update the free space, and potentially send any discard requests when the log was flushed. This was done by comparing the "cloned" bitmap with what had been written back on disk during the transaction commit. Due to this, there was a requirement to hang on to the rgrps' bitmap buffers until the journal had been flushed. For that reason, there was a rather complicated set up in the ->go_lock ->go_unlock functions for rgrps involving both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference count on the buffers. However, the journal maintains a reference count on the buffers anyway, since they are being journaled as metadata buffers. So by moving the code which deals with the post-journal accounting for bitmap blocks to the metadata journaling code, we can entirely dispense with the rather strange buffer ref counting scheme and also the requirement to journal the rgrps. The net result of all this is that the ->sd_rindex_spin is left to do exactly one job, and that is to look after the rbtree or rgrps. This patch is designed to be a stepping stone towards using RCU for the rbtree of resource groups, however the reduction in the number of uses of the ->sd_rindex_spin is likely to have benefits for multi-threaded workloads, anyway. The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also be removed in future in favour of calling the functions directly where required in the code. That will allow locking of resource groups without needing to actually read them in - something that could be useful in speeding up statfs. In the mean time though it is valid to dereference ->bi_bh only when the rgrp is locked. This is basically the same rule as before, modulo the references not being valid until the following journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Benjamin Marzinski <bmarzins@redhat.com>
\| *	GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added	Steven Whitehouse	2011-10-21	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to take the inode's glock whenever the inode's size is referenced, otherwise it might not be uptodate. Even though generic_file_llseek_unlocked() doesn't implement SEEK_DATA, SEEK_HOLE directly, it does reference the inode's size in those cases, so we need to add them to the list of origins which need the glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andi Kleen <ak@linux.intel.com>
\| *	GFS2: Clean up gfs2_create	Steven Whitehouse	2011-10-21	1	-22/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we pass through knowledge of whether the creation is intended to be exclusive or not, then we can deal with that in gfs2_create_inode and remove one set of locking. Also this removes the loop in gfs2_create and simplifies the code a bit. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Use ->dirty_inode()	Steven Whitehouse	2011-10-21	9	-97/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The aim of this patch is to use the newly enhanced ->dirty_inode() super block operation to deal with atime updates, rather than piggy backing that code into ->write_inode() as is currently done. The net result is a simplification of the code in various places and a reduction of the number of gfs2_dinode_out() calls since this is now implied by ->dirty_inode(). Some of the mark_inode_dirty() calls have been moved under glocks in order to take advantage of then being able to avoid locking in ->dirty_inode() when we already have suitable locks. One consequence is that generic_write_end() now correctly deals with file size updates, so that we do not need a separate check for that afterwards. This also, indirectly, means that fdatasync should work correctly on GFS2 - the current code always syncs the metadata whether it needs to or not. Has survived testing with postmark (with and without atime) and also fsx. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Fix bug trap and journaled data fsync	Steven Whitehouse	2011-10-21	2	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Journaled data requires that a complete flush of all dirty data for the file is done, in order that the ail flush which comes after will succeed. Also the recently enhanced bug trap can trigger falsely in case an ail flush from fsync races with a page read. This updates the bug trap such that it will ignore buffers which are locked and only trigger on dirty and/or pinned buffers when the ail flush is run from fsync. The original bug trap is retained when ail flush is run from ->go_sync() Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Fix inode allocation error path	Steven Whitehouse	2011-10-21	3	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have got far enough through the inode allocation code path that an inode has already been allocated, then we must call iput to dispose of it, if an error occurs during a later part of the process. This will always be the final iput since there will be no other references to the inode. Unlike when the inode has been unlinked, its block state will be GFS2_BLKST_INODE rather than GFS2_BLKST_UNLINKED so we need to skip the test in ->evict_inode() for this one case in order to ensure that it will be deallocated correctly. This patch adds a new flag in order to ensure that this will happen correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Make atime checks more efficient	Steven Whitehouse	2011-10-21	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We do not need to start a transaction unless the atime check has proved positive. Also if we are going to flush the complete ail list anyway, we might as well skip the writeback for this specific inode's metadata, since that will be done as part of the ail writeback process in an order offering potentially more efficient I/O. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Fix bug-trap in ail flush code	Steven Whitehouse	2011-10-21	1	-4/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The assert was being tested under the wrong lock, a legacy of the original code. Also, if it does trigger, the resulting information was not always a lot of help. This moves the patch under the correct lock and also prints out more useful information in tacking down the source of the problem. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Split data write & wait in fsync	Steven Whitehouse	2011-10-21	1	-10/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that the data writing is part of fsync proper, we can split the waiting part out and do it later on. This reduces the number of waits that we do during fsync on average. There is also no need to take the i_mutex unless we are flushing metadata to disk, so we can move that to within the metadata flushing code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| *	GFS2: Clean up dir hash table reading	Steven Whitehouse	2011-10-21	1	-23/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since there is now only a single caller to gfs2_dir_read_data() and it has a number of constant arguments, we can factor those out. Also some tests relating to the inode size were being done twice. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* \|	Merge branch '3.2-without-smb2' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds	2011-10-28	22	-1015/+2229
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* '3.2-without-smb2' of git://git.samba.org/sfrench/cifs-2.6: (52 commits) Fix build break when freezer not configured Add definition for share encryption CIFS: Make cifs_push_locks send as many locks at once as possible CIFS: Send as many mandatory unlock ranges at once as possible CIFS: Implement caching mechanism for posix brlocks CIFS: Implement caching mechanism for mandatory brlocks CIFS: Fix DFS handling in cifs_get_file_info CIFS: Fix error handling in cifs_readv_complete [CIFS] Fixup trivial checkpatch warning [CIFS] Show nostrictsync and noperm mount options in /proc/mounts cifs, freezer: add wait_event_freezekillable and have cifs use it cifs: allow cifs_max_pending to be readable under /sys/module/cifs/parameters cifs: tune bdi.ra_pages in accordance with the rsize cifs: allow for larger rsize= options and change defaults cifs: convert cifs_readpages to use async reads cifs: add cifs_async_readv cifs: fix protocol definition for READ_RSP cifs: add a callback function to receive the rest of the frame cifs: break out 3rd receive phase into separate function cifs: find mid earlier in receive codepath ...
\| * \|	Add definition for share encryption	Steve French	2011-10-27	1	-7/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Samba supports a setfs info level to negotiate encrypted shares. This patch adds the defines so we recognize this info level. Later patches will add the enablement for it. Acked-by: Jeremy Allison <jra@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>
\| * \|	CIFS: Make cifs_push_locks send as many locks at once as possible	Pavel Shilovsky	2011-10-24	1	-6/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	that reduces a traffic and increases a performance. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
\| * \|	CIFS: Send as many mandatory unlock ranges at once as possible	Pavel Shilovsky	2011-10-24	3	-36/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	that reduces a traffic and increases a performance. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
\| * \|	CIFS: Implement caching mechanism for posix brlocks	Pavel Shilovsky	2011-10-24	3	-15/+147
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to handle all lock requests on the client in an exclusive oplock case. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
\| * \|	CIFS: Implement caching mechanism for mandatory brlocks	Pavel Shilovsky	2011-10-24	2	-11/+197
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have an oplock and negotiate mandatory locking style we handle all brlock requests on the client. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Acked-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
\| * \|	CIFS: Fix DFS handling in cifs_get_file_info	Pavel Shilovsky	2011-10-22	1	-7/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We should call cifs_all_info_to_fattr in rc == 0 case only. Cc: <stable@kernel.org> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
\| * \|	CIFS: Fix error handling in cifs_readv_complete	Pavel Shilovsky	2011-10-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In cifs_readv_receive we don't update rdata->result to error value after kmap'ing a page. We should kunmap the page in the no error case only. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
\| * \|	Merge branch 'cifs-3.2' of git://git.samba.org/jlayton/linux into temp-3.2-jeff	Steve French	2011-10-19	8	-366/+880
\| \|\ \
\| \| * \|	cifs, freezer: add wait_event_freezekillable and have cifs use it	Jeff Layton	2011-10-19	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CIFS currently uses wait_event_killable to put tasks to sleep while they await replies from the server. That function though does not allow the freezer to run. In many cases, the network interface may be going down anyway, in which case the reply will never come. The client then ends up blocking the computer from suspending. Fix this by adding a new wait_event_freezable variant -- wait_event_freezekillable. The idea is to combine the behavior of wait_event_killable and wait_event_freezable -- put the task to sleep and only allow it to be awoken by fatal signals, but also allow the freezer to do its job. Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: allow cifs_max_pending to be readable under /sys/module/cifs/parameters	Jeff Layton	2011-10-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: tune bdi.ra_pages in accordance with the rsize	Jeff Layton	2011-10-19	1	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tune bdi.ra_pages to be a multiple of the rsize. This prevents the VFS from asking for pages that require small reads to satisfy. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: allow for larger rsize= options and change defaults	Jeff Layton	2011-10-19	2	-50/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we cap the rsize at a value that fits in CIFSMaxBufSize. That's not needed any longer for readpages. Allow the use of larger values for readpages. cifs_iovec_read and cifs_read however are still limited to the CIFSMaxBufSize. Make sure they don't exceed that. The patch also changes the rsize defaults. The default when unix extensions are enabled is set to 1M for parity with the wsize, and there is a hard cap of ~16M. When unix extensions are not enabled, the default is set to 60k. According to MS-CIFS, Windows servers can only send a max of 60k at a time, so this is more efficient than requesting a larger size. If the user wishes however, the max can be extended up to 128k - the length of the READ_RSP header. Really old servers however require a special hack to ensure that we don't request too large a read. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: convert cifs_readpages to use async reads	Jeff Layton	2011-10-19	1	-168/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have code in place to do asynchronous reads, convert cifs_readpages to use it. The new cifs_readpages walks the page_list that gets passed in, locks and adds the pages to the pagecache and sets up cifs_readdata to handle the reads. The rest is handled by the cifs_async_readv infrastructure. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: add cifs_async_readv	Jeff Layton	2011-10-19	3	-13/+396
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	...which will allow cifs to do an asynchronous read call to the server. The caller will allocate and set up cifs_readdata for each READ_AND_X call that should be issued on the wire. The pages passed in are added to the pagecache, but not placed on the LRU list yet (as we need the page->lru to keep the pages on the list in the readdata). When cifsd identifies the mid, it will see that there is a special receive handler for the call, and use that to receive the rest of the frame. cifs_readv_receive will then marshal up a kvec array with kmapped pages from the pagecache, which eliminates one copy of the data. Once the data is received, the pages are added to the LRU list, set uptodate, and unlocked. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: fix protocol definition for READ_RSP	Jeff Layton	2011-10-19	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no pad, and it simplifies the code to remove the "Data" field. None of the existing code relies on these fields, or on the READ_RSP being a particular length. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: add a callback function to receive the rest of the frame	Jeff Layton	2011-10-19	5	-10/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to handle larger SMBs for readpages and other calls, we want to be able to read into a preallocated set of buffers. Rather than changing all of the existing code to preallocate buffers however, we instead add a receive callback function to the MID. cifsd will call this function once the mid_q_entry has been identified in order to receive the rest of the SMB. If the mid can't be identified or the receive pointer is unset, then the standard 3rd phase receive function will be called. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: break out 3rd receive phase into separate function	Jeff Layton	2011-10-19	1	-42/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the entire 3rd phase of the receive codepath into a separate function in preparation for the addition of a pluggable receive function. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: find mid earlier in receive codepath	Jeff Layton	2011-10-19	1	-15/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to receive directly into a preallocated buffer, we need to ID the mid earlier, before the bulk of the response is read. Call the mid finding routine as soon as we're able to read the mid. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: move buffer pointers into TCP_Server_Info	Jeff Layton	2011-10-19	2	-55/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have several functions that need to access these pointers. Currently that's done with a lot of double pointer passing. Instead, move them into the TCP_Server_Info and simplify the handling. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: eliminate is_multi_rsp parm to find_cifs_mid	Jeff Layton	2011-10-19	1	-20/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change find_cifs_mid to only return NULL if a mid could not be found. If we got part of a multi-part T2 response, then coalesce it and still return the mid. The caller can determine the T2 receive status from the flags in the mid. With this change, there is no need to pass a pointer to "length" as well so just pass by value. If a mid is found, then we can just mark it as malformed. If one isn't found, then the value of "length" won't change anyway. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: move mid finding into separate routine	Jeff Layton	2011-10-19	1	-47/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Begin breaking up find_cifs_mid into smaller pieces. The parts that coalesce T2 responses don't really need to be done under the GlobalMid_lock anyway. Create a new function that just finds the mid on the list, and then later takes it off the list if the entire response has been received. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: add a third receive phase to cifs_demultiplex_thread	Jeff Layton	2011-10-19	1	-7/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Have the demultiplex thread receive just enough to get to the MID, and then find it before receiving the rest. Later, we'll use this to swap in a preallocated receive buffer for some calls. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: keep a reusable kvec array for receives	Jeff Layton	2011-10-19	2	-2/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Having to continually allocate a new kvec array is expensive. Allocate one that's big enough, and only reallocate it as needed. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>
\| \| * \|	cifs: turn read_from_socket into a wrapper around a vectorized version	Jeff Layton	2011-10-19	1	-7/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Eventually we'll want to allow cifsd to read data directly into the pagecache. In order to do that we'll need a routine that can take a kvec array and pass that directly to kernel_recvmsg. Unfortunately though, the kernel's recvmsg routines modify the kvec array that gets passed in, so we need to use a copy of the kvec array and refresh that copy on each pass through the loop. Reviewed-and-Tested-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com>