op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message	Author	Age	Files	Lines
*	GFS2: Resolve inode eviction and ail list interaction bug	Steven Whitehouse	2011-07-14	4	-8/+36
\|	This patch contains a few misc fixes which resolve a recently reported issue. This patch has been a real team effort and has received a lot of testing. The first issue is that the ail lock needs to be held over a few more operations. The lock that's added into gfs2_releasepage() may possibly be a candidate for replacing with RCU at some future point, but at this stage we've gone for the obvious fix. The second issue is that gfs2_write_inode() can end up taking a glock recursively when called from gfs2_evict_inode() via the syncing code, so it needs a guard added. The third issue is that we either need to not truncate the metadata pages of inodes which have zero link count, but which we cannot deallocate due to them still being in use by other nodes, or we need to ensure that those pages have all made it through the journal and ail lists first. This patch takes the former approach, but the latter has also been tested and there is nothing to choose between them performance-wise. So again, we could revise that decision in the future. Also, the inode eviction process is now better documented. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Reported-by: Barry J. Marson <bmarson@redhat.com> Reported-by: David Teigland <teigland@redhat.com>
*	GFS2: Fix race during filesystem mount	Steven Whitehouse	2011-07-12	3	-1/+11
\|	There is a potential race during filesystem mounting which has recently been reported. It occurs when the userland gfs_controld is able to process requests fast enough that it tries to use the sysfs interface before the lock module is properly initialised. This is a pretty unusual case as normally the lock module initialisation is very quick compared with gfs_controld. This patch adds an interruptible completion which is used to ensure that userland will wait for the initialisation of the lock module to complete. There are other potential solutions to this problem, but this is the quickest at this stage and has been tested both with and without mount.gfs2 present in the system. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: David Booher <dbooher@adams.net>
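The fix above relies on the kernel's completion pattern: one side signals "initialisation done", the other blocks until it has. A rough userspace analogue, assuming POSIX threads, can be built from a mutex and a condition variable. The demo_* names are hypothetical, and the interruptible (signal-aware) behaviour of the kernel version is not modelled here.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel completion. */
struct demo_completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void demo_init_completion(struct demo_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Called by the initialising side once the lock module is ready. */
static void demo_complete(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Called by the requesting side (e.g. a sysfs handler) before it touches
 * any lock-module state. */
static void demo_wait_for_completion(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}
```

Because the waiter checks `done` under the mutex, it does not matter whether the initialiser finishes before or after the waiter arrives; either ordering ends with the waiter proceeding.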
*	GFS2: force a log flush when invalidating the rindex glock	Benjamin Marzinski	2011-07-12	1	-1/+3
\|	Right now, there is nothing that forces the log to get flushed when a node drops its rindex glock so that another node can grow the filesystem. If the log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the following way. A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is dropped so the other node can grow the filesystem. When the node reacquires the rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being removed from the list by gfs2_log_flush(). This code simply forces a log flush when the rindex glock is invalidated, solving the problem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes	Linus Torvalds	2011-06-07	1	-2/+7
\|\	* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Processes waiting on inode glock that no processes are holding
\| *	GFS2: Processes waiting on inode glock that no processes are holding	Bob Peterson	2011-05-25	1	-2/+7
\| \|	This patch fixes a race in the GFS2 glock state machine that may result in lockups. The symptom is that all nodes but one will hang, waiting for a particular glock. All the holder records will have the "W" (Waiting) bit set. The other node will typically have the glock stuck in Exclusive mode (EX) with no holder records, but the dinode will be cached. In other words, an entry with "I:" will appear in the glock dump for that glock, but nothing else. The race has to do with the glock "Pending Demote" bit, which can be set, then immediately reset, thus losing the fact that another node needs the glock. The sequence of events is:
\| \|	1. Something schedules the glock workqueue (e.g. a glock request from the fs).
\| \|	2. The glock workqueue gets to the point between the test of the reply pending bit and the spin lock:
\| \|	       if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
\| \|	               finish_xmote(gl, gl->gl_reply);
\| \|	               drop_ref = 1;
\| \|	       }
\| \|	       down_read(&gfs2_umount_flush_sem);   <---- i.e. here
\| \|	       spin_lock(&gl->gl_spin);
\| \|	3. In comes (a) the reply to our EX lock request, setting GLF_REPLY_PENDING, and (b) the demote request, which sets GLF_PENDING_DEMOTE.
\| \|	4. The following test is executed:
\| \|	       if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
\| \|	           gl->gl_state != LM_ST_UNLOCKED &&
\| \|	           gl->gl_demote_state != LM_ST_EXCLUSIVE) {
\| \|	   This clears the pending demote flag, and gl->gl_demote_state is not equal to exclusive. However, because the reply from the DLM arrived after we checked the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked. So although we cleared the GLF_PENDING_DEMOTE flag, we neither set the GLF_DEMOTE flag nor reinstated the GLF_PENDING_DEMOTE flag.
\| \|	The patch closes the timing window by only transitioning the "Pending demote" bit to the "demote" flag once we know the other conditions (not unlocked and not exclusive) are met. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
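The ordering problem and the idea behind its fix can be modelled in a single thread by running both versions of the test against a glock whose DLM reply has not yet been processed. This is a hypothetical sketch using C11 atomics and demo_* names; the real code uses the kernel's bitops on gl->gl_flags, and the "fixed" function below only mirrors the rule stated in the message (move the pending bit to the demote bit only once the other conditions hold).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits and lock states, loosely named after the glock
 * fields in the message; these are not the kernel definitions. */
#define DEMO_PENDING_DEMOTE 0x1u
#define DEMO_DEMOTE         0x2u

enum demo_state { DEMO_ST_UNLOCKED, DEMO_ST_SHARED, DEMO_ST_EXCLUSIVE };

struct demo_glock {
    atomic_uint flags;
    enum demo_state state;        /* current lock state */
    enum demo_state demote_state; /* state another node asked for */
};

static bool demo_test_and_clear(atomic_uint *flags, unsigned int bit)
{
    return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/* Racy version: the bit is cleared before the state checks, so if the
 * reply has not been processed yet (state still UNLOCKED) the demote
 * request is silently lost. */
static void demo_work_racy(struct demo_glock *gl)
{
    if (demo_test_and_clear(&gl->flags, DEMO_PENDING_DEMOTE) &&
        gl->state != DEMO_ST_UNLOCKED &&
        gl->demote_state != DEMO_ST_EXCLUSIVE)
        atomic_fetch_or(&gl->flags, DEMO_DEMOTE);
}

/* Fixed version: only move PENDING_DEMOTE to DEMOTE once the other
 * conditions hold; otherwise the pending bit survives for a later run. */
static void demo_work_fixed(struct demo_glock *gl)
{
    if ((atomic_load(&gl->flags) & DEMO_PENDING_DEMOTE) &&
        gl->state != DEMO_ST_UNLOCKED &&
        gl->demote_state != DEMO_ST_EXCLUSIVE) {
        atomic_fetch_and(&gl->flags, ~DEMO_PENDING_DEMOTE);
        atomic_fetch_or(&gl->flags, DEMO_DEMOTE);
    }
}
```

With the racy version the flags end up empty after step 4, which is exactly the "lost demote" state described above. With the fixed version the pending bit is still set, so the next workqueue run (after the reply has moved the state to EX) converts it into a demote.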
* \|	Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6	Linus Torvalds	2011-05-26	1	-1/+1
\|\ \	* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: gfs2: Drop __TIME__ usage isdn/diva: Drop __TIME__ usage atm: Drop __TIME__ usage dlm: Drop __TIME__ usage wan/pc300: Drop __TIME__ usage parport: Drop __TIME__ usage hdlcdrv: Drop __TIME__ usage baycom: Drop __TIME__ usage pmcraid: Drop __DATE__ usage edac: Drop __DATE__ usage rio: Drop __DATE__ usage scsi/wd33c93: Drop __TIME__ usage scsi/in2000: Drop __TIME__ usage aacraid: Drop __TIME__ usage media/cx231xx: Drop __TIME__ usage media/radio-maxiradio: Drop __TIME__ usage nozomi: Drop __TIME__ usage cyclades: Drop __TIME__ usage
\| * \|	gfs2: Drop __TIME__ usage	Michal Marek	2011-05-26	1	-1/+1
\| \| \|	The kernel already prints its build timestamp during boot, so there is no need to repeat it in random drivers and produce different object files each time. Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: cluster-devel@redhat.com Signed-off-by: Michal Marek <mmarek@suse.cz>
* \| \|	vmscan: change shrinker API by passing shrink_control struct	Ying Han	2011-05-25	3	-7/+14
\| \|/ \|/\|	Change each shrinker's API by consolidating the existing parameters into a shrink_control struct. This will simplify adding further features without touching each shrinker's file. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
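The consolidation above is a parameter-object refactor: the callback takes one struct instead of a growing list of arguments. A minimal sketch with hypothetical demo_* names and only two fields, assumed here to correspond to an allocation mask and a scan count (the real layout lives in include/linux/shrinker.h for the kernel version in question):

```c
/* Hypothetical parameter object: adding a field later does not change
 * the signature of every shrinker callback. */
struct demo_shrink_control {
    unsigned int gfp_mask;     /* allocation context of the caller */
    unsigned long nr_to_scan;  /* 0 means "only report the cache size" */
};

struct demo_cache {
    unsigned long nr_objects;  /* reclaimable objects currently cached */
};

/* Old style would be: demo_shrink_old(cache, nr_to_scan, gfp_mask).
 * New style: one struct carries all of the request parameters. */
static unsigned long demo_shrink(struct demo_cache *c,
                                 const struct demo_shrink_control *sc)
{
    unsigned long n = sc->nr_to_scan;

    if (n > c->nr_objects)
        n = c->nr_objects;
    c->nr_objects -= n;        /* "free" up to nr_to_scan objects */
    return c->nr_objects;      /* report what remains */
}
```

A caller that only wants the cache size passes nr_to_scan = 0; a later field (for example a node id) could be added to the struct without editing every shrinker implementation.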
* \|	GFS2: Wait properly when flushing the ail list	Steven Whitehouse	2011-05-21	1	-3/+26
\| \|	The ail flush code has always relied upon log flushing to prevent it from spinning needlessly. This fixes it to wait on the last I/O request submitted (we don't need to wait for all of it) instead of either spinning with io_schedule or sleeping. As a result cpu usage of gfs2_logd is much reduced with certain workloads. Reported-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* \|	GFS2: Wipe directory hash table metadata when deallocating a directory	Steven Whitehouse	2011-05-21	2	-0/+6
\| \|	The deallocation code for directories in GFS2 is largely divided into two parts. The first part deallocates any directory leaf blocks and marks the directory as being a regular file when that is complete. The second stage was identical to deallocating regular files. Regular files have their data blocks in a different address space to directories, and thus what would have been normal data blocks in a regular file (the hash table in a GFS2 directory) were deallocated correctly. However, a reference to these blocks was left in the journal (assuming of course that some previous activity had resulted in those blocks being in the journal or ail list). This patch uses the i_depth as a test of whether the inode is an exhash directory (we cannot test the inode type as that has already been changed to a regular file at this stage in deallocation). The original issue was reported by Chris Hertel as an issue he encountered running bonnie++. Reported-by: Christopher R. Hertel <crh@samba.org> Cc: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw	Linus Torvalds	2011-05-20	25	-2014/+1847
\|\ \	* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (32 commits) GFS2: Move all locking inside the inode creation function GFS2: Clean up symlink creation GFS2: Clean up mkdir GFS2: Use UUID field in generic superblock GFS2: Rename ops_inode.c to inode.c GFS2: Inode.c is empty now, remove it GFS2: Move final part of inode.c into super.c GFS2: Move most of the remaining inode.c into ops_inode.c GFS2: Move gfs2_refresh_inode() and friends into glops.c GFS2: Remove gfs2_dinode_print() function GFS2: When adding a new dir entry, inc link count if it is a subdir GFS2: Make gfs2_dir_del update link count when required GFS2: Don't use gfs2_change_nlink in link syscall GFS2: Don't use a try lock when promoting to a higher mode GFS2: Double check link count under glock GFS2: Improve bug trap code in ->releasepage() GFS2: Fix ail list traversal GFS2: make sure fallocate bytes is a multiple of blksize GFS2: Add an AIL writeback tracepoint GFS2: Make writeback more responsive to system conditions ...
\| * \|	GFS2: Move all locking inside the inode creation function	Steven Whitehouse	2011-05-13	1	-132/+52
\| \| \|	Now that there are no longer any exceptions to the normal inode creation code path, we can move the parts of the locking code which were duplicated in mkdir/mknod/create/symlink into the inode create function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Clean up symlink creation	Steven Whitehouse	2011-05-13	2	-39/+29
\| \| \|	This moves the symlink specific parts of inode creation into the function where we initialise the rest of the dinode. As a result we have one less place where we need to look up the inode's buffer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Clean up mkdir	Steven Whitehouse	2011-05-13	1	-44/+33
\| \| \|	This moves the initialisation of the directory into the inode creation functions to avoid having to duplicate the lookup of the inode's buffer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Use UUID field in generic superblock	Steven Whitehouse	2011-05-10	3	-22/+17
\| \| \|	The VFS superblock structure now has a UUID field, so we can use that in preference to the UUID field in the GFS2 superblock now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Rename ops_inode.c to inode.c	Steven Whitehouse	2011-05-10	2	-1/+1
\| \| \|	This is the final part of the ops_inode.c/inode.c reordering. We are left with a single file called inode.c which now contains all the inode operations, as expected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Inode.c is empty now, remove it	Steven Whitehouse	2011-05-10	2	-39/+1
\| \| \|	Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Move final part of inode.c into super.c	Steven Whitehouse	2011-05-09	2	-36/+36
\| \| \|	Now inode.c is empty. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Move most of the remaining inode.c into ops_inode.c	Steven Whitehouse	2011-05-09	2	-711/+711
\| \| \|	This is in preparation to remove inode.c and rename ops_inode.c to inode.c. Also most of the functions which were left in inode.c relate to the creation and lookup of inodes. I'm intending to work on consolidating some of that code, and it's easier when it's all in one place. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Move gfs2_refresh_inode() and friends into glops.c	Steven Whitehouse	2011-05-09	2	-117/+113
\| \| \|	Eventually there will only be a single caller of this code, so let's move it where it can be made static at some future date. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Remove gfs2_dinode_print() function	Steven Whitehouse	2011-05-09	4	-28/+3
\| \| \|	This function was intended for debugging purposes, but it is not very useful. If we want to know what is on disk then all we need is a block number and gfs2_edit can give us much better information about what is there. Otherwise, if we are interested in what is stored in the in-core inode, it doesn't help us out there either. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: When adding a new dir entry, inc link count if it is a subdir	Steven Whitehouse	2011-05-09	5	-60/+8
\| \| \|	This adds an increment of the link count when we add a new directory entry, if that entry is itself a directory. This means that we no longer need separate code to perform this operation. Now that both adding and removing directory entries automatically update the parent directory's link count if required, that makes the code shorter and simpler than before. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Make gfs2_dir_del update link count when required	Steven Whitehouse	2011-05-09	3	-157/+71
\| \| \|	When we remove an entry from a directory, we can save ourselves some trouble if we know the type of the entry in question, since if it is itself a directory, we can update the link count of the parent at the same time as removing the directory entry. In addition this patch also merges the rmdir and unlink code which was almost identical anyway. This eliminates the calls to remove the . and .. directory entries on each rmdir (not needed since the directory will be deallocated, anyway) which was the only thing preventing passing the dentry to gfs2_dir_del(). The passing of the dentry rather than just the name allows us to figure out the type of the entry which is being removed, and thus adjust the link count when required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Don't use gfs2_change_nlink in link syscall	Steven Whitehouse	2011-05-09	1	-2/+13
\| \| \|	There are three users of gfs2_change_nlink which add to the link count. Two of these are about to be removed in later patches, which means that once that happens there will be no callers left, allowing removal of that function, also in a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Don't use a try lock when promoting to a higher mode	Steven Whitehouse	2011-05-05	1	-5/+0
\| \| \|	Previously we marked all locks being promoted to a higher mode with the try flag to avoid any potential deadlock issues. The DLM is able to detect these and report them in a way that GFS2 can deal with correctly. So we can just request the required mode and wait for a response without needing to perform this check. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Double check link count under glock	Steven Whitehouse	2011-05-05	2	-8/+50
\| \| \|	To avoid any possible races relating to the link count, we need to recheck it under the inode's glock in all cases where it matters. Also to ensure we never get any nasty surprises, this patch also ensures that once the link count has hit zero it can never be elevated by rereading in data from disk. The only place we cannot provide a proper solution is in rename in the case where we are removing a target inode and we discover that the target inode has been already unlinked on another node. The race window is very small, and we return EAGAIN in this case to indicate what has happened. The proper solution would be to move the lookup parts of rename from the vfs into library calls which the fs could call directly, but that is potentially a very big job and this fix should cover most cases for now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
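The rule that a zero link count is never raised again by rereading the dinode amounts to a one-way guard in the refresh path. A minimal sketch with hypothetical names; the real GFS2 refresh code copies many more fields and this only illustrates the link-count rule:

```c
#include <stdint.h>

struct demo_inode {
    uint32_t nlink;   /* in-core link count */
};

/* Copy the on-disk link count into the in-core inode, except that once
 * the in-core count has reached zero it is never raised again. */
static void demo_refresh_nlink(struct demo_inode *ip, uint32_t disk_nlink)
{
    if (ip->nlink == 0)
        return;                 /* unlinked: stale disk data must not revive it */
    ip->nlink = disk_nlink;
}
```

Decreases (including to zero) still propagate from disk; only the transition from zero back to non-zero is blocked.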
\| * \|	GFS2: Improve bug trap code in ->releasepage()	Steven Whitehouse	2011-05-03	1	-2/+6
\| \| \|	If the buffer is dirty or pinned, then as well as printing a warning, we should also refuse to release the page in question. Currently this can occur if there is a race between mmap()ed writers and O_DIRECT on the same file. With the addition of ->launder_page() in the future, we should be able to close this gap. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Fix ail list traversal	Steven Whitehouse	2011-05-03	1	-6/+11
\| \| \|	In the recent patches to update the AIL list code, I managed to forget that the ail list lock got dropped, even though I added a comment specifically to remind myself :( Reported-by: Barry Marson <bmarson@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: make sure fallocate bytes is a multiple of blksize	Benjamin Marzinski	2011-05-03	1	-2/+8
\| \| \|	The GFS2 fallocate code chooses a target size for allocating chunks of space. Whenever it can't find any resource groups with enough space free, it halves its target. Since this target is in bytes, eventually it will no longer be a multiple of blksize. As long as there is more space available in the resource group than the target, this isn't a problem, since gfs2 will use the actual space available, which is always a multiple of blksize. However, when gfs couldn't fallocate a bigger chunk than the target, it was using the non-blksize aligned number. This caused a BUG in later code that required blksize aligned offsets. GFS2 now ensures that bytes is always a multiple of blksize. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
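The alignment problem is easy to see numerically: halving 12288 bytes gives 6144, which is not a multiple of a 4096-byte block. One way to keep the target aligned is to mask it down after every halving, with a floor of one block. This is a sketch with hypothetical helper names, assuming blksize is a power of two; it is not the actual patch.

```c
#include <stdint.h>

/* Hypothetical helper: round a byte count down to a multiple of blksize,
 * never returning less than one block. Assumes blksize is a power of two,
 * which is what makes the mask trick valid. */
static uint64_t demo_align_to_block(uint64_t bytes, uint64_t blksize)
{
    uint64_t mask = ~(blksize - 1);

    bytes &= mask;
    return bytes ? bytes : blksize;
}

/* Hypothetical retry step: halve the target after an allocation attempt
 * fails, then re-align it. Halving alone breaks alignment, e.g.
 * 12288 / 2 = 6144, which is not a multiple of 4096. */
static uint64_t demo_next_target(uint64_t bytes, uint64_t blksize)
{
    return demo_align_to_block(bytes >> 1, blksize);
}
```

With a 4096-byte block, a target of 12288 steps down to 4096 and then stays at 4096 rather than shrinking to 2048 or 0.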
\| * \|	GFS2: Add an AIL writeback tracepoint	Steven Whitehouse	2011-04-20	2	-0/+30
\| \| \|	Add a tracepoint for monitoring writeback of the AIL. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Make writeback more responsive to system conditions	Steven Whitehouse	2011-04-20	8	-90/+98
\| \| \|	This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Optimise glock lru and end of life inodes	Steven Whitehouse	2011-04-20	7	-89/+119
\| \| \|	The GLF_LRU flag introduced in the previous patch can be used to check if a glock is on the lru list when a new holder is queued and if so remove it, without having first to get the lru_lock. The main purpose of this patch however is to optimise the glocks left over when an inode at end of life is being evicted. Previously such glocks were left with the GLF_LFLUSH flag set, so that when reclaimed, each one required a log flush. This patch resets the GLF_LFLUSH flag when there is nothing left to flush thus preventing later log flushes as glocks are reused or demoted. In order to do this, we need to keep track of the number of revokes which are outstanding, and also to clear the GLF_LFLUSH bit after a log commit when only revokes have been processed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Improve tracing support (adds two flags)	Steven Whitehouse	2011-04-20	4	-6/+21
\| \| \|	This adds support for two new flags. One keeps track of whether the glock is on the LRU list or not. The other isn't really a flag as such, but an indication of whether the glock has an attached object or not. This indication is reported without any locking, which is ok since we do not dereference the object pointer but merely report whether it is NULL or not. Also, this fixes one place where a tracepoint was missing, which was at the point we remove deallocated blocks from the journal. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Clean up fsync()	Steven Whitehouse	2011-04-20	3	-40/+57
\| \| \|	This patch is designed to clean up GFS2's fsync implementation and ensure that it really does get everything on disk. Since ->write_inode() has been updated, we can call that via the vfs library function sync_inode_metadata() and the only remaining thing that has to be done is to ensure that we get any revoke records in the log after the inode has been written back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Remove unused macro	Steven Whitehouse	2011-04-20	1	-2/+0
\| \| \|	The buffer_in_io() macro has been unused for some time, so remove it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Alter point of entry to glock lru list for glocks with an address_space	Steven Whitehouse	2011-04-20	5	-27/+23
\| \| \|	Rather than allowing the glocks to be scheduled for possible reclaim as soon as they have exited the journal, this patch delays their entry to the list until the glocks in question are no longer in use. This means that we will rely on the vm for writeback of all dirty data and metadata from now on. When glocks are added to the lru list they should be freeable much faster since all the I/O required to free them should have already been completed. This should lead to much better I/O patterns under low memory conditions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Use filemap_fdatawrite() to write back the AIL	Steven Whitehouse	2011-04-20	1	-10/+5
\| \| \|	In order to ensure that the mapping stats (and thus the bdi) are correctly updated, this patch changes the AIL writeback to use the filemap_fdatawrite() function. This helps prevent stalls in balance_dirty_pages() due to large amounts of dirty metadata when there is little or no dirty data around. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Make ->write_inode() really write	Steven Whitehouse	2011-04-20	1	-6/+13
\| \| \|	The GFS2 ->write_inode function should be more aggressive at writing back to the filesystem. This adopts the XFS system of returning -EAGAIN when the writeback has not been completely done. Also, we now kick off in-place writeback when called with WB_SYNC_NONE, but we only wait for it and flush the log when WB_SYNC_ALL is requested. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
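The WB_SYNC_NONE / WB_SYNC_ALL split described above can be illustrated with a toy model of the control flow: always start writeback, but only wait and flush the log for a synchronous request, returning -EAGAIN otherwise. Everything here (the demo_* types and the boolean "in flight" and "log flushed" fields) is hypothetical and stands in for real I/O.

```c
#include <errno.h>
#include <stdbool.h>

enum demo_sync_mode { DEMO_WB_SYNC_NONE, DEMO_WB_SYNC_ALL };

struct demo_inode {
    bool dirty;         /* in-core changes not yet submitted */
    bool io_in_flight;  /* writeback submitted but not complete */
    bool log_flushed;   /* journal flushed for this inode */
};

static int demo_write_inode(struct demo_inode *ip, enum demo_sync_mode mode)
{
    if (ip->dirty) {                /* always kick off in-place writeback */
        ip->dirty = false;
        ip->io_in_flight = true;
    }
    if (mode == DEMO_WB_SYNC_ALL) {
        ip->io_in_flight = false;   /* stands in for waiting on the I/O */
        ip->log_flushed = true;     /* ...and for flushing the log */
        return 0;
    }
    /* WB_SYNC_NONE: don't wait; report that work is still outstanding */
    return ip->io_in_flight ? -EAGAIN : 0;
}
```

A non-blocking caller therefore learns that writeback is incomplete without being stalled, while a synchronous caller gets the full wait-and-flush behaviour.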
\| * \|	GFS2: move function foreach_leaf to gfs2_dir_exhash_dealloc	Bob Peterson	2011-04-20	1	-81/+65
\| \| \|	The previous patches made function gfs2_dir_exhash_dealloc do nothing but call function foreach_leaf. This patch simplifies the code by moving the entire function foreach_leaf into gfs2_dir_exhash_dealloc. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: pass leaf_bh into leaf_dealloc	Bob Peterson	2011-04-20	1	-11/+24
\| \| \|	Function foreach_leaf used to look up the leaf block address and get a buffer_head. Then it would call leaf_dealloc which did the same lookup. This patch combines the two operations by making foreach_leaf pass the leaf bh to leaf_dealloc. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Combine transaction from gfs2_dir_exhash_dealloc	Bob Peterson	2011-04-20	1	-35/+14
\| \| \|	At the end of function gfs2_dir_exhash_dealloc, it was setting the dinode type to "file" to prevent directory corruption in case of a crash. It was doing so in its own journal transaction. This patch makes the change occur when the last call is made to leaf_dealloc, since it needs to rewrite the directory dinode at that time anyway. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: remove *leaf_call_t and simplify leaf_dealloc	Bob Peterson	2011-04-20	1	-8/+6
\| \| \|	Since foreach_leaf is only called with leaf_dealloc as its only possible call function, we can simplify the code by making it call leaf_dealloc directly. This simplifies the code and eliminates the need for leaf_call_t, the generic call method. This is a first small step in simplifying the directory leaf deallocation code. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
\| * \|	GFS2: Dump better debug info if a bitmap inconsistency is detected	Bob Peterson	2011-04-20	1	-4/+15
\| \| \|	On rare occasions we encounter gfs2 problems where an invalid bitmap state transition is attempted. For example, trying to "unlink" a free block. In these cases, there is really no useful information logged to debug the problem. This patch adds more debug details that should allow us to more closely examine the problem and possibly solve it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* \| \|	add hlist_bl_lock/unlock helpers	Christoph Hellwig	2011-04-25	1	-4/+2
\|/ /	Now that the whole dcache_hash_bucket crap is gone, go all the way and also remove the weird locking layering violations for locking the hash buckets. Add hlist_bl_lock/unlock helpers to move the locking into the list abstraction instead of requiring each caller to open code it. After all allowing for the bit locks is the whole point of these helpers over the plain hlist variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
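The "bit lock" these helpers wrap uses bit 0 of the list head's first-node pointer as a spinlock; in the kernel, hlist_bl_lock()/hlist_bl_unlock() are built on bit_spin_lock() over that bit. The idea can be sketched in portable C11, assuming (as list_bl.h does) that node pointers are at least 2-byte aligned so bit 0 is otherwise always clear. The demo_* types are a hypothetical simplification, not the kernel's list_bl.h.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified bit-locked list: the head word holds the first
 * node's address, and bit 0 (always clear in an aligned pointer) doubles
 * as a spinlock. */
struct demo_bl_node {
    struct demo_bl_node *next;
    int key;
};

struct demo_bl_head {
    _Atomic uintptr_t first;
};

#define DEMO_BL_LOCK ((uintptr_t)1)

static void demo_bl_lock(struct demo_bl_head *h)
{
    /* Spin until our fetch_or is the one that set the bit. */
    while (atomic_fetch_or_explicit(&h->first, DEMO_BL_LOCK,
                                    memory_order_acquire) & DEMO_BL_LOCK)
        ;
}

static void demo_bl_unlock(struct demo_bl_head *h)
{
    atomic_fetch_and_explicit(&h->first, ~DEMO_BL_LOCK, memory_order_release);
}

/* Strip the lock bit to recover the real pointer. */
static struct demo_bl_node *demo_bl_first(struct demo_bl_head *h)
{
    return (struct demo_bl_node *)(atomic_load(&h->first) & ~DEMO_BL_LOCK);
}

/* Caller must hold the lock; the lock bit is preserved in the new head. */
static void demo_bl_add_head(struct demo_bl_head *h, struct demo_bl_node *n)
{
    n->next = demo_bl_first(h);
    atomic_store(&h->first, (uintptr_t)n | DEMO_BL_LOCK);
}
```

Keeping the lock inside the head word is what lets each hash bucket stay a single pointer in size, which is why the helpers belong in the list abstraction rather than in every caller.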
* \|	GFS2: filesystem hang caused by incorrect lock order	Bob Peterson	2011-04-18	6	-21/+55
\| \|	This patch fixes a deadlock in GFS2 where two processes are trying to reclaim an unlinked dinode: One holds the inode glock and calls gfs2_lookup_by_inum trying to look up the inode, which it can't, due to I_FREEING. The other has set I_FREEING from vfs and is at the beginning of gfs2_delete_inode waiting for the glock, which is held by the first. The solution is to add a new non_block parameter to the gfs2_iget function that causes it to return -ENOENT if the inode is being freed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* \|	GFS2: Don't try to deallocate unlinked inodes when mounted ro	Steven Whitehouse	2011-04-18	2	-2/+7
\| \|	This adds a couple of missing tests to prevent read-only nodes from attempting to deallocate unlinked inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Michel Andre de la Porte <madelaporte@ubi.com>
* \|	GFS2: directly write blocks past i_size	Benjamin Marzinski	2011-04-18	1	-10/+48
\| \|	GFS2 was relying on the writepage code to write out the zeroed data for fallocate. However, with FALLOC_FL_KEEP_SIZE set, this may be past i_size. If it is, it will be ignored. To work around this, gfs2 now calls write_dirty_buffer directly on the buffer_heads when FALLOC_FL_KEEP_SIZE is set, and it's writing past i_size. This version is just a cleanup of my last version. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* \|	GFS2: write_end error path fails to unlock transaction lock	Bob Peterson	2011-04-18	1	-1/+1
\| \|	I did an audit of gfs2's transaction glock for bugzilla bug 658619 and ran across this: In function gfs2_write_end, in the unlikely event that gfs2_meta_inode_buffer returns an error, the code may forget to unlock the transaction lock because the "failed" label appears after the call to function gfs2_trans_end. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
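The bug class here is a cleanup label placed after the unlock it is meant to reach. A minimal sketch of the pattern, with hypothetical demo_* functions standing in for the transaction calls (this is not the actual gfs2_write_end() code):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transaction lock. */
struct demo_trans {
    int held;   /* 1 while the transaction lock is held */
};

static void demo_trans_begin(struct demo_trans *t) { t->held = 1; }
static void demo_trans_end(struct demo_trans *t)   { t->held = 0; }

/* Buggy shape: the error path jumps to a label placed after
 * demo_trans_end(), so the lock leaks when the lookup fails. */
static int demo_write_end_buggy(struct demo_trans *t, bool lookup_fails)
{
    int ret = 0;

    demo_trans_begin(t);
    if (lookup_fails) {
        ret = -EIO;
        goto failed;
    }
    /* ... copy data and update the dinode ... */
    demo_trans_end(t);
failed:
    return ret;
}

/* Fixed shape: every exit after demo_trans_begin() runs demo_trans_end(). */
static int demo_write_end_fixed(struct demo_trans *t, bool lookup_fails)
{
    int ret = 0;

    demo_trans_begin(t);
    if (lookup_fails) {
        ret = -EIO;
        goto out_end;
    }
    /* ... copy data and update the dinode ... */
out_end:
    demo_trans_end(t);
    return ret;
}
```

The usual kernel convention is to order cleanup labels in reverse order of acquisition, so that jumping to a label releases exactly what has been taken so far.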
* \|	Fix common misspellings	Lucas De Marchi	2011-03-31	3	-3/+3
\|/	Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
*	Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds	2011-03-24	4	-13/+9
\|\	* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}