op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	GFS2: dlm based recovery coordination	David Teigland	2012-01-11	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This new method of managing recovery is an alternative to the previous approach of using the userland gfs_controld. - use dlm slot numbers to assign journal id's - use dlm recovery callbacks to initiate journal recovery - use a dlm lock to determine the first node to mount fs - use a dlm lock to track journals that need recovery Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	treewide: use __printf not __attribute__((format(printf,...)))	Joe Perches	2011-10-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Standardize the style for compiler based printf format verification. Standardized the location of __printf too. Done via script and a little typing. $ grep -rPl --include=.[ch] -w "__attribute__" \| \ grep -vP "^(tools\|scripts\|include/linux/compiler-gcc.h)" \| \ xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s$\s\(\sformat\s\(\sprintf\s,\s(.+)\s,\s(.+)\s$\s\)\s\)/__printf($1, $2)/g ; print; }' [akpm@linux-foundation.org: revert arch bits] Signed-off-by: Joe Perches <joe@perches.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	GFS2: Automatically adjust glock min hold time	Bob Peterson	2011-07-15	1	-0/+6
\| \| \| \| \| \| \| \|	This patch is a performance improvement for GFS2 in a clustered environment. It makes the glock hold time self-adjusting. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Alter point of entry to glock lru list for glocks with an address_space	Steven Whitehouse	2011-04-20	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than allowing the glocks to be scheduled for possible reclaim as soon as they have exited the journal, this patch delays their entry to the list until the glocks in question are no longer in use. This means that we will rely on the vm for writeback of all dirty data and metadata from now on. When glocks are added to the lru list they should be freeable much faster since all the I/O required to free them should have already been completed. This should lead to much better I/O patterns under low memory conditions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Fix glock deallocation race	Steven Whitehouse	2011-03-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	This patch fixes a race in deallocating glocks which was introduced in the RCU glock patch. We need to ensure that the glock count is kept correct even in the case that there is a race to add a new glock into the hash table. Also, to avoid having to wait for an RCU grace period, the glock counter can be decremented before call_rcu() is called. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Use RCU for glock hash table	Steven Whitehouse	2011-01-21	1	-21/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This has a number of advantages: - Reduces contention on the hash table lock - Makes the code smaller and simpler - Should speed up glock dumps when under load - Removes ref count changing in examine_bucket - No longer need hash chain lock in glock_put() in common case There are some further changes which this enables and which we may do in the future. One is to look at using SLAB_RCU, and another is to look at using a per-cpu counter for the per-sb glock counter, since that is touched twice in the lifetime of each glock (but only used at umount time). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
*	GFS2: Remove duplicate #defines from glock.h	Steven Whitehouse	2010-11-30	1	-14/+0
\| \| \| \| \| \| \| \|	There are a number of duplicated #defines in glock.h plus one which is unused. This removes the extra definitions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Clean up of gdlm_lock function	Steven Whitehouse	2010-11-30	1	-8/+4
\| \| \| \| \| \| \| \| \| \| \| \|	The DLM never returns -EAGAIN in response to dlm_lock(), and even if it did, the test in gdlm_lock() was wrong anyway. Once that test is removed, it is possible to greatly simplify this code by simply using a "normal" error return code (0 for success). We then no longer need the LM_OUT_ASYNC return code which can be removed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: fs/gfs2/glock.h: Add __attribute__((format(printf,2,3)) to gfs2_print_dbg	Joe Perches	2010-11-30	1	-0/+2
\| \| \| \| \| \| \| \| \|	Functions that use printf formatting, especially those that use %pV, should have their uses of printf format and arguments checked by the compiler. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: fixed typo	Andrea Gelmini	2010-10-18	1	-1/+1
\| \| \| \| \| \|	Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Metadata address space clean up	Steven Whitehouse	2010-03-01	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the start of GFS2, an "extra" inode has been used to store the metadata belonging to each inode. The only reason for using this inode was to have an extra address space, the other fields were unused. This means that the memory usage was rather inefficient. The reason for keeping each inode's metadata in a separate address space is that when glocks are requested on remote nodes, we need to be able to efficiently locate the data and metadata which relating to that glock (inode) in order to sync or sync and invalidate it (depending on the remotely requested lock mode). This patch adds a new type of glock, which has in addition to its normal fields, has an address space. This applies to all inode and rgrp glocks (but to no other glock types which remain as before). As a result, we no longer need to have the second inode. This results in three major improvements: 1. A saving of approx 25% of memory used in caching inodes 2. A removal of the circular dependency between inodes and glocks 3. No confusion between "normal" and "metadata" inodes in super.c Although the first of these is the more immediately apparent, the second is just as important as it now enables a number of clean ups at umount time. Those will be the subject of future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Extend umount wait coverage to full glock lifetime	Steven Whitehouse	2010-02-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although all glocks are, by the time of the umount glock wait, scheduled for demotion, some of them haven't made it far enough through the process for the original set of waiting code to wait for them. This extends the ref count to the whole glock lifetime in order to ensure that the waiting does catch all glocks. It does make it a bit more invasive, but it seems the only sensible solution at the moment. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Remove obsolete code in quota.c	Steven Whitehouse	2009-12-03	1	-9/+0
\| \| \| \| \| \| \|	There is no point in testing for GLF_DEMOTE here, we might as well always release the glock at that point. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: remove dcache entries for remote deleted inodes	Benjamin Marzinski	2009-07-30	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	When a file is deleted from a gfs2 filesystem on one node, a dcache entry for it may still exist on other nodes in the cluster. If this happens, gfs2 will be unable to free this file on disk. Because of this, it's possible to have a gfs2 filesystem with no files on it and no free space. With this patch, when a node receives a callback notifying it that the file is being deleted on another node, it schedules a new workqueue thread to remove the file's dcache entry. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Merge lock_dlm module into GFS2	Steven Whitehouse	2009-03-24	1	-4/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	Revert "GFS2: Fix use-after-free bug on umount"	Steven Whitehouse	2009-01-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This reverts commit 78802499912f1ba31ce83a94c55b5a980f250a43. The original patch is causing problems in relation to order of operations at umount in relation to jdata files. I need to fix this a different way. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Fix use-after-free bug on umount	Steven Whitehouse	2009-01-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was a use-after-free with the GFS2 super block during umount. This patch moves almost all of the umount code from ->put_super into ->kill_sb, the only bit that cannot be moved being the glock hash clearing which has to remain as ->put_super due to umount ordering requirements. As a result its now obvious that the kfree is the final operation, whereas before it was hidden in ->put_super. Also gfs2_jindex_free is then only referenced from a single file so thats moved and marked static too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Kill two daemons with one patch	Steven Whitehouse	2009-01-05	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes the two daemons, gfs2_scand and gfs2_glockd and replaces them with a shrinker which is called from the VM. The net result is that GFS2 responds better when there is memory pressure, since it shrinks the glock cache at the same rate as the VFS shrinks the dcache and icache. There are no longer any time based criteria for shrinking glocks, they are kept until such time as the VM asks for more memory and then we demote just as many glocks as required. There are potential future changes to this code, including the possibility of sorting the glocks which are to be written back into inode number order, to get a better I/O ordering. It would be very useful to have an elevator based workqueue implementation for this, as that would automatically deal with the read I/O cases at the same time. This patch is my answer to Andrew Morton's remark, made during the initial review of GFS2, asking why GFS2 needs so many kernel threads, the answer being that it doesn't :-) This patch is a net loss of about 200 lines of code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: Fix "truncate in progress" hang	Steven Whitehouse	2009-01-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Following on from the recent clean up of gfs2_quotad, this patch moves the processing of "truncate in progress" inodes from the glock workqueue into gfs2_quotad. This fixes a hang due to the "truncate in progress" processing requiring glocks in order to complete. It might seem odd to use gfs2_quotad for this particular item, but we have to use a pre-existing thread since creating a thread implies a GFP_KERNEL memory allocation which is not allowed from the glock workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd may deadlock if used for this operation. gfs2_scand and gfs2_glockd are both scheduled for removal at some (hopefully not too distant) future point. That leaves only gfs2_quotad whose workload is generally fairly light and is easily adapted for this extra task. Also, as a result of this change, it opens the way for a future patch to make the reading of the inode's information asynchronous with respect to the glock workqueue, which is another improvement that has been on the list for some time now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	GFS2: high time to take some time over atime	Steven Whitehouse	2008-09-18	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Until now, we've used the same scheme as GFS1 for atime. This has failed since atime is a per vfsmnt flag, not a per fs flag and as such the "noatime" flag was not getting passed down to the filesystems. This patch removes all the "special casing" around atime updates and we simply use the VFS's atime code. The net result is that GFS2 will now support all the same atime related mount options of any other filesystem on a per-vfsmnt basis. We do lose the "lazy atime" updates, but we gain "relatime". We could add lazy atime to the VFS at a later date, if there is a requirement for that variant still - I suspect relatime will be enough. Also we lose about 100 lines of code after this patch has been applied, and I have a suspicion that it will speed things up a bit, even when atime is "on". So it seems like a nice clean up as well. From a user perspective, everything stays the same except the loss of the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very least, and to be honest I don't think anybody ever used it) and that a number of options which were ignored before now work correctly. Please let me know if you've got any comments. I'm pushing this out early so that you can all see what my plans are. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Remove remote lock dropping code	Steven Whitehouse	2008-06-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several reasons why this is undesirable: 1. It never happens during normal operation anyway 2. If it does happen it causes performance to be very, very poor 3. It isn't likely to solve the original problem (memory shortage on remote DLM node) it was supposed to solve 4. It uses a bunch of arbitrary constants which are unlikely to be correct for any particular situation and for which the tuning seems to be a black art. 5. In an N node cluster, only 1/N of the dropped locked will actually contribute to solving the problem on average. So all in all we are better off without it. This also makes merging the lock_dlm module into GFS2 a bit easier. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Clean up the glock core	Steven Whitehouse	2008-06-27	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements a number of cleanups to the core of the GFS2 glock code. As a result a lot of code is removed. It looks like a really big change, but actually a large part of this patch is either removing or moving existing code. There are some new bits too though, such as the new run_queue() function which is considerably streamlined. Highlights of this patch include: o Fixes a cluster coherency bug during SH -> EX lock conversions o Removes the "glmutex" code in favour of a single bit lock o Removes the ->go_xmote_bh() for inodes since it was duplicating ->go_lock() o We now only use the ->lm_lock() function for both locks and unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED) o The fast path is considerably shortly, giving performance gains especially with lock_nolock o The glock_workqueue is now used for all the callbacks from the DLM which allows us to simplify the lock_dlm module (see following patch) o The way is now open to make further changes such as eliminating the two threads (gfs2_glockd and gfs2_scand) in favour of a more efficient scheme. This patch has undergone extensive testing with various test suites so it should be pretty stable by now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
*	[GFS2] Fix a page lock / glock deadlock	Steven Whitehouse	2008-03-31	1	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've previously been using a "try lock" in readpage on the basis that it would prevent deadlocks due to the inverted lock ordering (our normal lock ordering is glock first and then page lock). Unfortunately tests have shown that this isn't enough. If the glock has a demote request queued such that run_queue() in the glock code tries to do a demote when its called under readpage then it will try and write out all the dirty pages which requires locking them. This then deadlocks with the page locked by readpage. The solution is to always require two calls into readpage. The first unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the second does the actual readpage and unlocks the glock & page as required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] make gfs2_glock_hold() static	Adrian Bunk	2008-03-31	1	-1/+0
\| \| \| \| \| \| \|	gfs2_glock_hold() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	gfs2: make gfs2_holder.gh_owner_pid be a struct pid *	Pavel Emelyanov	2008-02-07	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	The gl_owner_pid field is used to get the holder task by its pid and check whether the current is a holder, so make it in a proper manner, i.e. via the struct pid * manipulations. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[GFS2] flocks from same process trip kernel BUG at fs/gfs2/glock.c:1118!	Abhijith Das	2007-10-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new flag to the gfs2_holder structure GL_FLOCK. It is set on holders of glocks representing flocks. This flag is checked in add_to_queue() and a process is permitted to queue more than one holder onto a glock if it is set. This solves the issue of a process not being able to do multiple flocks on the same file. Through a single descriptor, a process can now promote and demote flocks. Through multiple descriptors a process can now queue multiple flocks on the same file. There's still the problem of a process deadlocking itself (because gfs2 blocking locks are not interruptible) by queueing incompatible deadlock. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Reduce number of gfs2_scand processes to one	Steven Whitehouse	2007-10-10	1	-2/+2
\| \| \| \| \| \| \| \| \|	We only need a single gfs2_scand process rather than the one per filesystem which we had previously. As a result the parameter determining the frequency of gfs2_scand runs becomes a module parameter rather than a mount parameter as it was before. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Fix deallocation issues	Abhijith Das	2007-07-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There were two issues during deallocation of unlinked inodes. The first was relating to the use of a "try" lock which in the case of the inode lock wasn't trying hard enough to deallocate in all circumstances (now changed to a normal glock) and in the case of the iopen lock didn't wait for the demotion of the shared lock before attempting to get the exclusive lock, and thereby sometimes (timing dependent) not completing the deallocation when it should have done. The second issue related to the lack of a way to invalidate dcache entries on remote nodes (now fixed by this patch) which meant that unlinks were taking a long time to return disk space to the fs. By adding some code to invalidate the dcache entries across the cluster for unlinked inodes, that is now fixed. This patch was written jointly by Abhijith Das and Steven Whitehouse. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	Detach sched.h from mm.h	Alexey Dobriyan	2007-05-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[GFS2] Red Hat bz 228540: owner references	Robert Peterson	2007-05-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In Testing the previously posted and accepted patch for https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540 I uncovered some gfs2 badness. It turns out that the current gfs2 code saves off a process pointer when glocks is taken in both the glock and glock holder structures. Those structures will persist in memory long after the process has ended; pointers to poisoned memory. This problem isn't caused by the 228540 fix; the new capability introduced by the fix just uncovered the problem. I wrote this patch that avoids saving process pointers and instead saves off the process pid. Rather than referencing the bad pointers, it now does process lookups. There is special code that makes the output nicer for printing holder information for processes that have ended. This patch also adds a stub for the new "sprint_symbol" function that exists in Andrew Morton's -mm patch set, but won't go into the base kernel until 2.6.22, since it adds functionality but doesn't fix a bug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Fix bz 224480 and cleanup glock demotion code	Steven Whitehouse	2007-05-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch prevents the printing of a warning message in cases where the fs is functioning normally by handing off responsibility for unlinked, but still open inodes, to another node for eventual deallocation. Also, there is now an improved system for ensuring that such requests to other nodes do not get lost. The callback on the iopen lock is only ever called when i_nlink == 0 and when a node is unable to deallocate it due to it still being in use on another node. When a node receives the callback therefore, it knows that i_nlink must be zero, so we mark it as such (in gfs2_drop_inode) in order that it will then attempt deallocation of the inode itself. As an additional benefit, queuing a demote request no longer requires a memory allocation. This simplifies the code for dealing with gfs2_holders as it removes one special case. There are two new fields in struct gfs2_glock. gl_demote_state is the state which the remote node has requested and gl_demote_time is the time when the request came in. Both fields are only valid when the GLF_DEMOTE flag is set in gl_flags. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540)	Robert Peterson	2007-05-01	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The attached patch resolves bz 228540. This adds the capability for gfs2 to dump gfs2 locks through the debugfs file system. This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from gfs2 because all the ioctls were stripped out. Please see the bugzilla for more history about the fix. This patch is also attached to the bugzilla record. The patch is against Steve Whitehouse's latest nmw git tree kernel (2.6.21-rc1) and has been tested on system trin-10. Signed-off-by: Robert Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Tidy up glops calls	Steven Whitehouse	2007-02-05	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch doesn't make any changes to the ordering of the various operations related to glocking, but it does tidy up the calls to the glops.c functions to make the structure more obvious. The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be made static within glock.c since they are called by every set of glock operations. The xmote_th and drop_th glock operations are then made conditional upon those two routines existing and called from the previously mentioned functions in glock.c respectively. Also it can be seen that the go_sync operation isn't needed since it can easily be replaced by calls to xmote_bh and drop_bh respectively. This results in no longer (confusingly) calling back into routines in glock.c from glops.c and also reducing the glock operations by one member. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Remove local exclusive glock mode	Steven Whitehouse	2007-02-05	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Here is a patch for GFS2 to remove the local exclusive flag. In the places it was used, mutex's are always held earlier in the call path, so it appears redundant in the LM_ST_SHARED case. Also, the GFS2 holders were setting local exclusive in any case where the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock code where the flag was tested have been replaced with tests for the lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well as globally exclusive). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Remove the "greedy" function from glock.[ch]	Steven Whitehouse	2007-02-05	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "greedy" code was an attempt to retain glocks for a minimum length of time when they relate to mmap()ed files. The current implementation of this feature is not, however, ideal in that it required allocating memory in order to do this and its overly complicated. It also misses the mark by ignoring the other I/O operations which are just as likely to suffer from the same problem. So the plan is to remove this now and then add the functionality back as part of the glock state machine at a later date (and thus take into account all the possible users of this feature) Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Clean up/speed up readdir	Steven Whitehouse	2007-02-05	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This removes the extra filldir callback which gfs2 was using to enclose an attempt at readahead for inodes during readdir. The code was too complicated and also hurts performance badly in the case that the getdents64/readdir call isn't being followed by stat() and it wasn't even getting it right all the time when it was. As a result, on my test box an "ls" of a directory containing 250000 files fell from about 7mins (freshly mounted, so nothing cached) to between about 15 to 25 seconds. When the directory content was cached, the time taken fell from about 3mins to about 4 or 5 seconds. Interestingly in the cached case, running "ls -l" once reduced the time taken for subsequent runs of "ls" to about 6 secs even without this patch. Now it turns out that there was a special case of glocks being used for prefetching the metadata, but because of the timeouts for these locks (set to 10 secs) the metadata was being timed out before it was being used and this the prefetch code was constantly trying to prefetch the same data over and over. Calling "ls -l" meant that the inodes were brought into memory and once the inodes are cached, the glocks are not disposed of until the inodes are pushed out of the cache, thus extending the lifetime of the glocks, and thus bringing down the time for subsequent runs of "ls" considerably. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Fix journal flush problem	Steven Whitehouse	2006-11-30	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a bug which resulted in poor performance due to flushing the journal too often. The code path in question was via the inode_go_sync() function in glops.c. The solution is not to flush the journal immediately when inodes are ejected from memory, but batch up the work for glockd to deal with later on. This means that glocks may now live on beyond the end of the lifetime of their inodes (but not very much longer in the normal case). Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in calculation of the number of free journal blocks. The gfs2_logd process has been altered to be more responsive to the journal filling up. We now wake it up when the number of uncommitted journal blocks has reached the threshold level rather than trying to flush directly at the end of each transaction. This again means doing fewer, but larger, log flushes in general. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Fix page lock/glock deadlock	Steven Whitehouse	2006-11-30	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	This fixes a race between the glock and the page lock encountered during truncate in gfs2_readpage and gfs2_prepare_write. The gfs2_readpages function doesn't need the same fix since it only uses a try lock anyway, so it will fail back to gfs2_readpage in the case of a potential deadlock. This bug was spotted by Russell Cattelan. Cc: Russell Cattelan <cattelan@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Remove unused GL_DUMP flag	Steven Whitehouse	2006-11-30	1	-1/+0
\| \| \| \| \| \| \| \|	There is no way to set the GL_DUMP flag, and in any case the same thing can be done with systemtap if required for debugging, so this removes it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Remove unused function from glock.c	Steven Whitehouse	2006-09-09	1	-2/+0
\| \| \| \| \| \| \|	The callback for iopen locks is unused, so this removes it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Use void * instead of typedef for locking module interface	Steven Whitehouse	2006-09-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	As requested by Jan Engelhardt, this removes the typedefs in the locking module interface and replaces them with void *. Also since we are changing the interface, I've added a few consts as well. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Remove one typedef	Steven Whitehouse	2006-09-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	This removes one of the typedefs from the locking interface. It is replaced by a forward declaration of the gfs2 superblock. The other two are not so easy to solve since in their case, they can refer to one of two possible structures. Cc: David Teigland <teigland@redhat.com> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Move glock hash table out of superblock	Steven Whitehouse	2006-09-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are several reasons why we want to do this: - Firstly its large and thus we'll scale better with multiple GFS2 fs mounted at the same time - Secondly its easier to scale its size as required (thats a plan for later patches) - Thirdly, we can use kzalloc rather than vmalloc when allocating the superblock (its now only 4888 bytes) - Fourth its all part of my plan to eventually be able to use RCU with the glock hash. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Make headers compile on their own	Steven Whitehouse	2006-09-05	1	-0/+2
\| \| \| \| \| \| \| \| \|	As per Jan Engelhardt's comments, this should make all the headers compile on their own by including and/or declaring structures early. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Change all types to uX style	Steven Whitehouse	2006-09-04	1	-3/+3
\| \| \| \| \| \| \|	This makes all fixed size types have consistent names. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Tidy up locking code	Steven Whitehouse	2006-09-04	1	-3/+3
\| \| \| \| \| \| \| \|	As per Jan Engelhardt's second email, this removes some unused code, and fixes up indenting in various places. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Update copyright, tidy up incore.h	Steven Whitehouse	2006-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this updates the copyright message to say "version" in full rather than "v.2". Also incore.h has been updated to remove forward structure declarations which are not required. The gfs2_quota_lvb structure has now had endianess annotations added to it. Also quota.c has been updated so that we now store the lvb data locally in endian independant format to avoid needing a structure in host endianess too. As a result the endianess conversions are done as required at various points and thus the conversion routines in lvb.[ch] are no longer required. I've moved the one remaining constant in lvb.h thats used into lm.h and removed the unused lvb.[ch]. I have not changed the HIF_ constants. That is left to a later patch which I hope will unify the gh_flags and gh_iflags fields of the struct gfs2_holder. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Remove unused code from glock layer	Steven Whitehouse	2006-08-30	1	-1/+0
\| \| \| \| \| \| \| \|	Remove the unused sync feature from glocks. This is currently done by calling the required functions to sync pages/blocks directly so this code isn't needed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Make glock operations const	Steven Whitehouse	2006-08-30	1	-3/+3
\| \| \| \| \| \| \|	For all the usual reasons of enforcing correctness and potentially reducing code size, this patch makes the glock operations const. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	[GFS2] Fix unlinked file handling	Steven Whitehouse	2006-06-14	1	-4/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the way we have been dealing with unlinked, but still open files. It removes all limits (other than memory for inodes, as per every other filesystem) on numbers of these which we can support on GFS2. It also means that (like other fs) its the responsibility of the last process to close the file to deallocate the storage, rather than the person who did the unlinking. Note that with GFS2, those two events might take place on different nodes. Also there are a number of other changes: o We use the Linux inode subsystem as it was intended to be used, wrt allocating GFS2 inodes o The Linux inode cache is now the point which we use for local enforcement of only holding one copy of the inode in core at once (previous to this we used the glock layer). o We no longer use the unlinked "special" file. We just ignore it completely. This makes unlinking more efficient. o We now use the 4th block allocation state. The previously unused state is used to track unlinked but still open inodes. o gfs2_inoded is no longer needed o Several fields are now no longer needed (and removed) from the in core struct gfs2_inode o Several fields are no longer needed (and removed) from the in core superblock There are a number of future possible optimisations and clean ups which have been made possible by this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>