op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	ocfs2: Pack vote message and response structures	Sunil Mushran	2007-09-20	1	-2/+2
\| \| \| \| \| \| \| \| \|	The ocfs2_vote_msg and ocfs2_response_msg structs needed to be packed to ensure similar sizeofs in 32-bit and 64-bit arches. Without this, we had inadvertantly broken 32/64 bit cross mounts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: Don't double set write parameters	Mark Fasheh	2007-09-20	1	-13/+3
\| \| \| \| \| \| \| \| \| \|	The target page offsets were being incorrectly set a second time in ocfs2_prepare_page_for_write(), which was causing problems on a 16k page size kernel. Additionally, ocfs2_write_failure() was incorrectly using those parameters instead of the parameters for the individual page being cleaned up. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: Fix pos/len passed to ocfs2_write_cluster	Mark Fasheh	2007-09-20	1	-1/+16
\| \| \| \| \| \| \| \|	This was broken for file systems whose cluster size is greater than page size. Pos needs to be incremented as we loop through the descriptors, and len needs to be capped to the size of a single cluster. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: Allow smaller allocations during large writes	Mark Fasheh	2007-09-20	5	-14/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ocfs2 write code loops through a page much like the block code, except that ocfs2 allocation units can be any size, including larger than page size. Typically it's equal to or larger than page size - most kernels run 4k pages, the minimum ocfs2 allocation (cluster) size. Some changes introduced during 2.6.23 changed the way writes to pages are handled, and inadvertantly broke support for > 4k page size. Instead of just writing one cluster at a time, we now handle the whole page in one pass. This means that multiple (small) seperate allocations might happen in the same pass. The allocation code howver typically optimizes by getting the maximum which was reserved. This triggered a BUG_ON in the extend code where it'd ask for a single bit (for one part of a > 4k page) and get back more than it asked for. Fix this by providing a variant of the high level allocation function which allows the caller to specify a maximum. The traditional function remains and just calls the new one with a maximum determined from the initial reservation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: Fix calculation of i_blocks during truncate	Mark Fasheh	2007-09-11	2	-1/+1
\| \| \| \| \| \| \|	We were setting i_blocks too early - before truncating any allocation. Correct things to set i_blocks after the allocation change. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	[PATCH] ocfs2: Fix a wrong cluster calculation.	tao.ma@oracle.com	2007-09-11	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	In ocfs2_alloc_write_write_ctxt, the written clusters length is calculated by the byte length only. This may cause some problems if we start to write at some position in the end of one cluster and last to a second cluster while the "len" is smaller than a cluster size. In that case, we have to write 2 clusters actually. So we have to take the start position into consideration also. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	[PATCH] ocfs2: fix mount option parsing	Tiger Yang	2007-09-11	1	-32/+37
\| \| \| \| \| \| \| \| \| \| \| \| \|	For some mount option types, ocfs2_parse_options() will try to access sb->s_fs_info to get at the ocfs2 private superblock. Unfortunately, that hasn't been allocated yet and will cause a kernel crash. Fix this by storing options in a struct which can then get pushed into the ocfs2_super once it's been allocated later. If we need more options which store to the ocfs2_super in the future, we can just fields to this struct. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: set non-default s_time_gran during mount	Mark Fasheh	2007-08-09	1	-0/+1
\| \| \| \| \| \| \|	We need to manually set this to '1' during mount, otherwise inode_setattr() will chop off the nanosecond portion of our timestamps. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: Retry sendpage() if it returns EAGAIN	Sunil Mushran	2007-08-09	1	-8/+16
\| \| \| \| \| \| \| \|	Instead of treating EAGAIN, returned from sendpage(), as an error, this patch retries the operation. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: Fix rename/extend race	Sunil Mushran	2007-08-09	1	-1/+15
\| \| \| \| \| \| \| \| \| \|	If one process is extending a file while another is renaming it, there exists a window when rename could flush the old inode's stale i_size to disk. This patch recognizes the fact that rename is only updating the old inode's ctime, so it ensures only that value is flushed to disk. Signed-off-by: Sunil Mushran <sunil.musran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	[2.6 patch] ocfs2_insert_extent(): remove dead code	Adrian Bunk	2007-08-09	1	-4/+0
\| \| \| \| \| \| \| \| \|	This patch removes some now dead code. Spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: Fix max offset calculations	Mark Fasheh	2007-08-09	1	-28/+40
\| \| \| \| \| \| \| \|	ocfs2_max_file_offset() was over-estimating the largest file size for several cases. This wasn't really a problem before, but now that we support sparse files, it needs to be more accurate. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: check ia_size limits in setattr	Mark Fasheh	2007-08-09	1	-0/+5
\| \| \| \| \| \| \|	We have to manually check the requested truncate size as the check in vmtruncate() comes too late for Ocfs2. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: Fix some casting errors related to file writes	Mark Fasheh	2007-08-09	2	-5/+5
\| \| \| \| \| \| \| \|	ocfs2_align_clusters_to_page_index() needs to cast the clusters shift to pgoff_t and ocfs2_file_buffered_write() needs loff_t when calculating destination start for memcpy. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: use s_maxbytes directly in ocfs2_change_file_space()	Mark Fasheh	2007-08-09	3	-4/+2
\| \| \| \| \| \| \| \|	There's no need to recalculate things via ocfs2_max_file_offset() as we've already done that to fill s_maxbytes, so use that instead. We can also un-export ocfs2_max_file_offset() then. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: Restrict inode changes in ocfs2_update_inode_atime()	Mark Fasheh	2007-08-09	1	-1/+18
\| \| \| \| \| \| \| \| \| \| \|	ocfs2_update_inode_atime() calls ocfs2_mark_inode_dirty() to push changes from the struct inode into the ocfs2 disk inode. The problem is, ocfs2_mark_inode_dirty() might change other fields, depending on what happened to the struct inode. Since we don't always have locking to serialize changes to other fields (like i_size, etc), just fix things up to only touch the atime field. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
*	ocfs2: bad kunmap_atomic()	Jens Axboe	2007-07-24	1	-1/+1
\| \| \| \| \| \| \| \| \|	kunmap_atomic() takes the virtual address, not the mapped page as argument. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fix some conversion overflows	Nick Piggin	2007-07-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix page index to offset conversion overflows in buffer layer, ecryptfs, and ocfs2. It would be nice to convert the whole tree to page_offset, but for now just fix the bugs. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: Remove slab destructors from kmem_cache_create().	Paul Mundt	2007-07-20	4	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
*	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2007-07-19	1	-18/+59
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: ->fallocate() support
\| *	ocfs2: ->fallocate() support	Mark Fasheh	2007-07-19	1	-18/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Plug ocfs2 into the ->fallocate() callback. This just re-uses the existing preallocation code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* \|	mm: fault feedback #1	Nick Piggin	2007-07-19	1	-16/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change ->fault prototype. We now return an int, which contains VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte. FAULT_RET_ code tells the VM whether a page was found, whether it has been locked, and potentially other things. This is not quite the way he wanted it yet, but that's changed in the next patch (which requires changes to arch code). This means we no longer set VM_CAN_INVALIDATE in the vma in order to say that a page is locked which requires filemap_nopage to go away (because we can no longer remain backward compatible without that flag), but we were going to do that anyway. struct fault_data is renamed to struct vm_fault as Linus asked. address is now a void __user * that we should firmly encourage drivers not to use without really good reason. The page is now returned via a page pointer in the vm_fault struct. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	mm: merge populate and nopage into fault (fixes nonlinear)	Nick Piggin	2007-07-19	2	-10/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	mm: fix fault vs invalidate race for linear mappings	Nick Piggin	2007-07-19	1	-0/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the race between invalidate_inode_pages and do_no_page. Andrea Arcangeli identified a subtle race between invalidation of pages from pagecache with userspace mappings, and do_no_page. The issue is that invalidation has to shoot down all mappings to the page, before it can be discarded from the pagecache. Between shooting down ptes to a particular page, and actually dropping the struct page from the pagecache, do_no_page from any process might fault on that page and establish a new mapping to the page just before it gets discarded from the pagecache. The most common case where such invalidation is used is in file truncation. This case was catered for by doing a sort of open-coded seqlock between the file's i_size, and its truncate_count. Truncation will decrease i_size, then increment truncate_count before unmapping userspace pages; do_no_page will read truncate_count, then find the page if it is within i_size, and then check truncate_count under the page table lock and back out and retry if it had subsequently been changed (ptl will serialise against unmapping, and ensure a potentially updated truncate_count is actually visible). Complexity and documentation issues aside, the locking protocol fails in the case where we would like to invalidate pagecache inside i_size. do_no_page can come in anytime and filemap_nopage is not aware of the invalidation in progress (as it is when it is outside i_size). The end result is that dangling (->mapping == NULL) pages that appear to be from a particular file may be mapped into userspace with nonsense data. Valid mappings to the same place will see a different page. Andrea implemented two working fixes, one using a real seqlock, another using a page->flags bit. He also proposed using the page lock in do_no_page, but that was initially considered too heavyweight. However, it is not a global or per-file lock, and the page cacheline is modified in do_no_page to increment _count and _mapcount anyway, so a further modification should not be a large performance hit. Scalability is not an issue. This patch implements this latter approach. ->nopage implementations return with the page locked if it is possible for their underlying file to be invalidated (in that case, they must set a special vm_flags bit to indicate so). do_no_page only unlocks the page after setting up the mapping completely. invalidation is excluded because it holds the page lock during invalidation of each page (and ensures that the page is not mapped while holding the lock). This also allows significant simplifications in do_no_page, because we have the page locked in the right place in the pagecache from the start. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	usermodehelper: Tidy up waiting	Jeremy Fitzhardinge	2007-07-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than using a tri-state integer for the wait flag in call_usermodehelper_exec, define a proper enum, and use that. I've preserved the integer values so that any callers I've missed should still work OK. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com>
*	Merge branch 'uninit-var' of ↵	Linus Torvalds	2007-07-17	1	-1/+2
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6 * 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6: arch/i386/* fs/* ipc/: mark variables with uninitialized_var() drivers/: mark variables with uninitialized_var()
\| *	arch/i386/* fs/* ipc/*: mark variables with uninitialized_var()	Jeff Garzik	2007-07-17	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mark variables with uninitialized_var() if such a warning appears, and analysis proves that the var is initialized properly on all paths it is used. Signed-off-by: Jeff Garzik <jeff@garzik.org>
* \|	Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check	Satyam Sharma	2007-07-17	1	-1/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce is_owner_or_cap() macro in fs.h, and convert over relevant users to it. This is done because we want to avoid bugs in the future where we check for only effective fsuid of the current task against a file's owning uid, without simultaneously checking for CAP_FOWNER as well, thus violating its semantics. [ XFS uses special macros and structures, and in general looked ... untouchable, so we leave it alone -- but it has been looked over. ] The (current->fsuid != inode->i_uid) check in generic_permission() and exec_permission_lite() is left alone, because those operations are covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: Al Viro <viro@ftp.linux.org.uk> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	knfsd: exportfs: add exportfs.h header	Christoph Hellwig	2007-07-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	currently the export_operation structure and helpers related to it are in fs.h. fs.h is already far too large and there are very few places needing the export bits, so split them off into a separate header. [akpm@linux-foundation.org: fix cifs build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2007-07-16	31	-980/+4231
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits) [PATCH] ocfs2: zero_user_page conversion ocfs2: Support xfs style space reservation ioctls ocfs2: support for removing file regions ocfs2: update truncate handling of partial clusters ocfs2: btree support for removal of arbirtrary extents ocfs2: Support creation of unwritten extents ocfs2: support writing of unwritten extents ocfs2: small cleanup of ocfs2_write_begin_nolock() ocfs2: btree changes for unwritten extents ocfs2: abstract btree growing calls ocfs2: use all extent block suballocators ocfs2: plug truncate into cached dealloc routines ocfs2: simplify deallocation locking ocfs2: harden buffer check during mapping of page blocks ocfs2: shared writeable mmap ocfs2: factor out write aops into nolock variants ocfs2: rework ocfs2_buffered_write_cluster() ocfs2: take ip_alloc_sem during entire truncate ocfs2: Add "preferred slot" mount option [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c ...
\| *	[PATCH] ocfs2: zero_user_page conversion	Eric Sandeen	2007-07-10	1	-11/+2
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: Support xfs style space reservation ioctls	Mark Fasheh	2007-07-10	6	-16/+216
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We re-use the RESVSP/UNRESVSP ioctls from xfs which allow the user to allocate and deallocate regions to a file without zeroing data or changing i_size. Though renamed, the structure passed in from user is identical to struct xfs_flock64. The three fields that are actually used right now are l_whence, l_start and l_len. This should get ocfs2 immediate compatibility with userspace software using the pre-existing xfs ioctls. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: support for removing file regions	Mark Fasheh	2007-07-10	4	-12/+262
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provide an internal interface for the removal of arbitrary file regions. ocfs2_remove_inode_range() takes a byte range within a file and will remove existing extents within that range. Partial clusters will be zeroed so that any read from within the region will return zeros. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: update truncate handling of partial clusters	Mark Fasheh	2007-07-10	3	-46/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The partial cluster zeroing code used during truncate usually assumes that the rightmost byte in the range to be zeroed lies on a cluster boundary. This makes sense for truncate, but punching holes might require zeroing on non-aligned rightmost boundaries. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: btree support for removal of arbirtrary extents	Mark Fasheh	2007-07-10	1	-0/+367
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add code to the btree paths to support the removal of arbitrary regions within an existing extent. With proper higher level support this can be used to "punch holes" in a file. Truncate (a special case of hole punching) could also be converted to use these methods. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: Support creation of unwritten extents	Mark Fasheh	2007-07-10	7	-28/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: support writing of unwritten extents	Mark Fasheh	2007-07-10	3	-26/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update the write code to detect when the user is asking to write to an unwritten extent. Like writing to a hole, we must zero the region between the write and the cluster boundaries. Most of the existing cluster zeroing logic can be re-used with some additional checks for the unwritten flag on extent records. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: small cleanup of ocfs2_write_begin_nolock()	Mark Fasheh	2007-07-10	1	-32/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can easily seperate out the write descriptor setup and manipulation into helper functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: btree changes for unwritten extents	Mark Fasheh	2007-07-10	6	-91/+1770
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Writes to a region marked as unwritten might result in a record split or merge. We can support splits by making minor changes to the existing insert code. Merges require left rotations which mostly re-use right rotation support functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: abstract btree growing calls	Mark Fasheh	2007-07-10	1	-45/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The top level calls and logic for growing a tree can easily be abstracted out of ocfs2_insert_extent() into a seperate function - ocfs2_grow_tree(). This allows future code to easily grow btrees when needed. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: use all extent block suballocators	Mark Fasheh	2007-07-10	2	-12/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have a method to deallocate blocks from them, each node should allocate extent blocks from their local suballocator file. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: plug truncate into cached dealloc routines	Mark Fasheh	2007-07-10	5	-94/+29
\| \| \| \| \| \| \| \|	Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: simplify deallocation locking	Mark Fasheh	2007-07-10	4	-21/+242
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Deallocation of suballocator blocks, most notably extent blocks, might involve multiple suballocator inodes. The locking for this can get extremely complicated, especially when the suballocator inodes to delete from aren't known until deep within an unrelated codepath. Implement a simple scheme for recording the blocks to be unlinked so that the actual deallocation can be done in a context which won't deadlock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: harden buffer check during mapping of page blocks	Mark Fasheh	2007-07-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't want to submit buffer_new blocks for read i/o. This actually won't happen right now because those requests during an allocating write are all nicely aligned. It's probably a good idea to provide an explicit check though. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: shared writeable mmap	Mark Fasheh	2007-07-10	4	-39/+200
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement cluster consistent shared writeable mappings using the ->page_mkwrite() callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: factor out write aops into nolock variants	Mark Fasheh	2007-07-10	1	-40/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	ocfs2_mkwrite() will want this so that it can add some mmap specific checks before asking for a write. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: rework ocfs2_buffered_write_cluster()	Mark Fasheh	2007-07-10	3	-438/+551
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use some ideas from the new-aops patch series and turn ocfs2_buffered_write_cluster() into a 2 stage operation with the caller copying data in between. The code now understands multiple cluster writes as a result of having to deal with a full page write for greater than 4k pages. This sets us up to easily call into the write path during ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: take ip_alloc_sem during entire truncate	Mark Fasheh	2007-07-10	2	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use of the alloc sem during truncate was too narrow - we want to protect the i_size change and page truncation against mmap now. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	ocfs2: Add "preferred slot" mount option	Sunil Mushran	2007-07-10	3	-5/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ocfs2 will attempt to assign the node the slot# provided in the mount option. Failure to assign the preferred slot is not an error. This small feature can be useful for automated testing. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
\| *	[KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in ↵	Shani Moideen	2007-07-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fs/ocfs2/dlm/dlmrecovery.c Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c Signed-off-by: Shani Moideen <shani.moideen@wipro.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>