summaryrefslogtreecommitdiffstats
path: root/fs/gfs2/trans.c
Commit message (Collapse)AuthorAgeFilesLines
* GFS2: Use ->writepages for ordered writesSteven Whitehouse2013-01-291-23/+18
| | | | | | | | | | | | | | | | | | | | Instead of using a list of buffers to write ahead of the journal flush, this now uses a list of inodes and calls ->writepages via filemap_fdatawrite() in order to achieve the same thing. For most use cases this results in a shorter ordered write list, as well as much larger i/os being issued. The ordered write list is sorted by inode number before writing in order to retain the disk block ordering between inodes as per the previous code. The previous ordered write code used to conflict in its assumptions about how to write out the disk blocks with mpage_writepages() so that with this updated version we can also use mpage_writepages() for GFS2's ordered write, writepages implementation. So we will also send larger i/os from writeback too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Merge gfs2_attach_bufdata() into trans.cSteven Whitehouse2013-01-291-11/+26
| | | | | | | | | | | | | | The locking in gfs2_attach_bufdata() was type specific (data/meta) which made the function rather confusing. This patch moves the core of gfs2_attach_bufdata() into trans.c renaming it gfs2_alloc_bufdata() and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta() As a result all of the locking related to adding data and metadata to the journal is now in these two functions. This should help to clarify what is going on, and give us some opportunities to simplify in some cases. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Copy gfs2_trans_add_bh into new data/meta functionsSteven Whitehouse2013-01-291-10/+82
| | | | | | | | | | | | This patch copies the body of gfs2_trans_add_bh into the two newly added gfs2_trans_add_data and gfs2_trans_add_meta functions. We can then move the .lo_add functions from lops.c into trans.c and call them directly. As a result of this, we no longer need to use the .lo_add functions at all, so that is removed from the log operations structure. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Split gfs2_trans_add_bh() into twoSteven Whitehouse2013-01-291-1/+11
| | | | | | | | | | | | | | There is little common content in gfs2_trans_add_bh() between the data and meta classes by the time that the functions which it calls are taken into account. The intent here is to split this into two separate functions. Stage one is to introduce gfs2_trans_add_data() and gfs2_trans_add_meta() and update the callers accordingly. Later patches will then pull in the content of gfs2_trans_add_bh() and its dependent functions in order to clean up the code in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Merge revoke adding functionsSteven Whitehouse2013-01-291-1/+9
| | | | | | | This moves the lo_add function for revokes into trans.c, removing a function call and making the code easier to read. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Test bufdata with buffer locked and gfs2_log_lock heldBenjamin Marzinski2012-11-071-0/+8
| | | | | | | | | | | | | | In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the buffer without having the gfs2_log_lock held. It was then assuming it would stay attached for the rest of the function. However, without either the log lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any time. This patch moves the locking before the test. If there isn't a bd already attached, gfs2 can safely allocate one and attach it before locking. There is no way that the newly allocated bd could be on the ail list, and thus no way for __gfs2_ail_flush() to detach it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* gfs2: Convert to new freezing mechanismJan Kara2012-07-311-0/+4
| | | | | | | | | | | | | We update gfs2_page_mkwrite() to use new freeze protection and the transaction code to use freeze protection while the transaction is running. That is needed to stop iput() of unlinked file from modifying the filesystem. The rest is handled by the generic code. CC: cluster-devel@redhat.com CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* GFS2: eliminate log elements and simplifyBob Peterson2012-05-021-6/+6
| | | | | | | | | | | This patch eliminates the gfs2_log_element data structure and rolls its two components into the gfs2_bufdata. This makes the code easier to understand and makes it easier to migrate to a rbtree to keep the list sorted. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove bd_list_trSteven Whitehouse2012-04-241-13/+19
| | | | | | | | | | | | | | | This is another clean up in the logging code. This per-transaction list was largely unused. Its main function was to ensure that the number of buffers in a transaction was correct, however that counter was only used to check the number of buffers in the bd_list_tr, plus an assert at the end of each transaction. With the assert now changed to use the calculated buffer counts, we can remove both bd_list_tr and its associated counter. This should make the code easier to understand as well as shrinking a couple of structures. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count schemeBob Peterson2011-10-211-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is an update of Bob's original rbtree patch which, in addition, also resolves the rather strange ref counting that was being done relating to the bitmap blocks. Originally we had a dual system for journaling resource groups. The metadata blocks were journaled and also the rgrp itself was added to a list. The reason for adding the rgrp to the list in the journal was so that the "repolish clones" code could be run to update the free space, and potentially send any discard requests when the log was flushed. This was done by comparing the "cloned" bitmap with what had been written back on disk during the transaction commit. Due to this, there was a requirement to hang on to the rgrps' bitmap buffers until the journal had been flushed. For that reason, there was a rather complicated set up in the ->go_lock ->go_unlock functions for rgrps involving both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference count on the buffers. However, the journal maintains a reference count on the buffers anyway, since they are being journaled as metadata buffers. So by moving the code which deals with the post-journal accounting for bitmap blocks to the metadata journaling code, we can entirely dispense with the rather strange buffer ref counting scheme and also the requirement to journal the rgrps. The net result of all this is that the ->sd_rindex_spin is left to do exactly one job, and that is to look after the rbtree or rgrps. This patch is designed to be a stepping stone towards using RCU for the rbtree of resource groups, however the reduction in the number of uses of the ->sd_rindex_spin is likely to have benefits for multi-threaded workloads, anyway. The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also be removed in future in favour of calling the functions directly where required in the code. That will allow locking of resource groups without needing to actually read them in - something that could be useful in speeding up statfs. In the mean time though it is valid to dereference ->bi_bh only when the rgrp is locked. This is basically the same rule as before, modulo the references not being valid until the following journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Benjamin Marzinski <bmarzins@redhat.com>
* GFS2: Various gfs2_logd improvementsBenjamin Marzinski2010-05-051-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains various tweaks to how log flushes and active item writeback work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits for gfs2_logd to do the log flushing. Multiple functions were rewritten to remove the need to call gfs2_log_lock(). Instead of using one test to see if gfs2_logd had work to do, there are now seperate tests to check if there are two many buffers in the incore log or if there are two many items on the active items list. This patch is a port of a patch Steve Whitehouse wrote about a year ago, with some minor changes. Since gfs2_ail1_start always submits all the active items, it no longer needs to keep track of the first ai submitted, so this has been removed. In gfs2_log_reserve(), the order of the calls to prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has been switched. If it called wake_up first there was a small window for a race, where logd could run and return before gfs2_log_reserve was ready to get woken up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve() would be left waiting for gfs2_logd to eventualy run because it timed out. Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times out, and flushes the log, can now be set on mount with ar_commit. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Move journal live test at transaction startSteven Whitehouse2009-05-131-6/+3
| | | | | | | | | There seems little point grabbing the transaction glock only to have to release it again if the journal isn't live. This moves the test earlier to avoid grabbing the lock when we don't need it in the first place. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix deadlock on journal flushSteven Whitehouse2009-03-241-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a deadlock when the journal is flushed and there are dirty inodes other than the one which caused the journal flush. Originally the journal flushing code was trying to obtain the transaction glock while running the flush code for an inode glock. We no longer require the transaction glock at this point in time since we know that any attempt to get the transaction glock from another node will result in a journal flush. So if we are flushing the journal, we can be sure that the transaction lock is still cached from when the transaction was started. By inlining a version of gfs2_trans_begin() (minus the bit which gets the transaction glock) we can avoid the deadlock problems caused if there is a demote request queued up on the transaction glock. In addition I've also moved the umount rwsem so that it covers the glock workqueue, since it all demotions are done by this workqueue now. That fixes a bug on umount which I came across while fixing the original problem. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Merge lock_dlm module into GFS2Steven Whitehouse2009-03-241-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Update gfs2_trans_add_unrevoke to accept extentsSteven Whitehouse2008-03-311-15/+10
| | | | | | | | | | | | | | | | By adding an extra argument to gfs2_trans_add_unrevoke we can now specify an extent length of blocks to unrevoke. This means that we only need to make one pass through the list for each extent rather than each block. Currently the only extent length which is used is 1, but that will change in the future. Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta since its the only difference between this and gfs2_alloc_data which is left. This will allow a future patch to merge these two functions into one (i.e. one call to allocate both data and metadata in a single extent in the future). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Don't add glocks to the journalSteven Whitehouse2008-01-251-5/+0
| | | | | | | | | | | The only reason for adding glocks to the journal was to keep track of which locks required a log flush prior to release. We add a flag to the glock to allow this check to be made in a simpler way. This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64) and means that we can avoid extra work during the journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Clean up gfs2_trans_add_revoke()Steven Whitehouse2007-10-101-5/+5
| | | | | | | | | | | | | | | | | The following alters gfs2_trans_add_revoke() to take a struct gfs2_bufdata as an argument. This eliminates the memory allocation which was previously required by making use of the already existing struct gfs2_bufdata. It makes some sanity checks to ensure that the gfs2_bufdata has been removed from all the lists before its recycled as a revoke structure. This saves one memory allocation and one free per revoke structure. Also as a result, and to simplify the locking, since there is no longer any blocking code in gfs2_trans_add_revoke() we must hold the log lock whenever this function is called. This reduces the amount of times we take and unlock the log lock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Use slab operations for all gfs2_bufdata allocationsSteven Whitehouse2007-10-101-2/+2
| | | | | | | | | | | The old revoke structure was allocated using kalloc/kfree but there is a slab cache for gfs2_bufdata, so we should use that now that the structures have been converted. This is part two of the patch series to merge the revoke and gfs2_bufdata structures. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Replace revoke structure with bufdata structureSteven Whitehouse2007-10-101-10/+10
| | | | | | | | | | | | | | Both the revoke structure and the bufdata structure are quite similar. They are basically small tags which are put on lists. In addition to which the revoke structure is always allocated when there is a bufdata structure which is (or can be) freed. As such it should be possible to reduce the number of frees and allocations by using the same structure for both purposes. This patch is the first step along that path. It replaces existing uses of the revoke structure with the bufdata structure. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Export lm_interface to kernel headersFabio Massimo Di Nitto2006-09-191-1/+1
| | | | | | | | | | | | | | | lm_interface.h has a few out of the tree clients such as GFS1 and userland tools. Right now, these clients keeps a copy of the file in their build tree that can go out of sync. Move lm_interface.h to include/linux, export it to userland and clean up fs/gfs2 to use the new location. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Change all types to uX styleSteven Whitehouse2006-09-041-3/+3
| | | | | | | This makes all fixed size types have consistent names. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Update copyright, tidy up incore.hSteven Whitehouse2006-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this updates the copyright message to say "version" in full rather than "v.2". Also incore.h has been updated to remove forward structure declarations which are not required. The gfs2_quota_lvb structure has now had endianess annotations added to it. Also quota.c has been updated so that we now store the lvb data locally in endian independant format to avoid needing a structure in host endianess too. As a result the endianess conversions are done as required at various points and thus the conversion routines in lvb.[ch] are no longer required. I've moved the one remaining constant in lvb.h thats used into lm.h and removed the unused lvb.[ch]. I have not changed the HIF_ constants. That is left to a later patch which I hope will unify the gh_flags and gh_iflags fields of the struct gfs2_holder. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Update copyright date to 2006Steven Whitehouse2006-05-181-1/+1
| | | | Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Remove semaphore.h from C filesSteven Whitehouse2006-05-181-1/+0
| | | | | | | We no longer use semaphores, everything has been converted to mutex or rwsem, so we don't need to include this header any more. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Remove GL_NEVER_RECURSE flagSteven Whitehouse2006-04-261-2/+1
| | | | | | | There is no point in keeping this flag since recursion is not now allowed for any glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Update journal accounting code.Steven Whitehouse2006-04-111-10/+3
| | | | | | | | | | | | | | | | | | | | | A small update to the journaling code to change the way that the "extra" blocks are accounted for in the journal. These are used at a rate of one per 503 metadata blocks or one per 251 journaled data blocks (or just one if the total number of journaled blocks in the transaction is smaller). Since we are using them at two different rates the old method of accounting for them no longer works and we count them up as required. Since the "per transaction" accounting can't handle this (there is no fixed number of header blocks per transaction) we have to account for it in the general journal code. We now require that each transaction reserves more blocks than it actually needs to take account of the possible extra blocks. Also a final fix to dir.c to ensure that all ref counts are handled correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Fix a ref count bug and other clean upsSteven Whitehouse2006-04-071-1/+1
| | | | | | | | | | | | | | | | | | | This fixes a ref count bug that sometimes showed up a umount time (causing it to hang) but it otherwise mostly harmless. At the same time there are some clean ups including making the log operations structures const, moving a memory allocation so that its not done in the fast path of checking to see if there is an outstanding transaction related to a particular glock. Removes the sd_log_wrap varaible which was updated, but never actually used anywhere. Updates the gfs2 ioctl() to run without the kernel lock (which it never needed anyway). Removes the "invalidate inodes" loop from GFS2's put_super routine. This is done in kill super anyway so we don't need to do it here. The loop was also bogus in that if there are any inodes "stuck" at this point its a bug and we need to know about it rather than hide it by hanging forever. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Add missing {} in trans.cSteven Whitehouse2006-03-301-1/+2
| | | | | | | A conditional had missing {} around the two following statements. Now added. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Update debugging codeSteven Whitehouse2006-03-291-18/+14
| | | | | | | | | | | | Update the debugging code in trans.c and at the same time improve the debugging code for gfs2_holders. The new code should be pretty fast during the normal case and provide just as much information in case of errors (or more). One small function from glock.c has moved to glock.h as a static inline so that its return address won't get in the way of the debugging. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Update locking in log.cSteven Whitehouse2006-03-291-2/+2
| | | | | | | | | | | Replace the lock_for_trans()/lock_for_flush() functions with an rwsem. In fact the sd_log_flush_lock becomes an rwsem (the write part of it) and is extended slightly to cover everything that the lock_for_flush() used to cover. The read part of the lock is instead of lock_for_trans(). This corrects the races in the original code and reduces the code size. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Fix some bugsSteven Whitehouse2006-03-011-7/+6
| | | | | | | | | Fix a bug I introduced earlier with a kfree() and usage of a structure in the wrong order. Also try and get the counts of the journaled data buffers "more correct". Still some work to do in this area though. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Remove uneeded memory allocationSteven Whitehouse2006-03-011-18/+12
| | | | | | | For every filesystem operation where we need a transaction, we now make one less memory allocation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Macros removal in gfs2.hSteven Whitehouse2006-02-271-7/+12
| | | | | | | | | | | | | | As suggested by Pekka Enberg <penberg@cs.helsinki.fi>. The DIV_RU macro is renamed DIV_ROUND_UP and and moved to kernel.h The other macros are gone from gfs2.h as (although not requested by Pekka Enberg) are a number of included header file which are now included individually. The inode number comparison function is now an inline function. The DT2IF and IF2DT may be addressed in a future patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Use mutices rather than semaphoresSteven Whitehouse2006-02-211-2/+2
| | | | | | | | As well as a number of minor bug fixes, this patch changes GFS to use mutices rather than semaphores. This results in better information in case there are any locking problems. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Make journaled data files identical to normal files on diskSteven Whitehouse2006-02-081-18/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a very large patch, with a few still to be resolved issues so you might want to check out the previous head of the tree since this is known to be unstable. Fixes for the various bugs will be forthcoming shortly. This patch removes the special data format which has been used up till now for journaled data files. Directories still retain the old format so that they will remain on disk compatible with earlier releases. As a result you can now do the following with journaled data files: 1) mmap them 2) export them over NFS 3) convert to/from normal files whenever you want to (the zero length restriction is gone) In addition the level at which GFS' locking is done has changed for all files (since they all now use the page cache) such that the locking is done at the page cache level rather than the level of the fs operations. This should mean that things like loopback mounts and other things which touch the page cache directly should now work. Current known issues: 1. There is a lock mode inversion problem related to the resource group hold function which needs to be resolved. 2. Any significant amount of I/O causes an oops with an offset of hex 320 (NULL pointer dereference) which appears to be related to a journaled data buffer appearing on a list where it shouldn't be. 3. Direct I/O writes are disabled for the time being (will reappear later) 4. There is probably a deadlock between the page lock and GFS' locks under certain combinations of mmap and fs operation I/O. 5. Issue relating to ref counting on internally used inodes causes a hang on umount (discovered before this patch, and not fixed by it) 6. One part of the directory metadata is different from GFS1 and will need to be resolved before next release. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Remove gfs2_databuf in favour of gfs2_bufdata structureSteven Whitehouse2006-01-181-9/+9
| | | | | | | | | | | | | Removing the gfs2_databuf structure and using gfs2_bufdata instead is a step towards allowing journaling of data without requiring the metadata header on each journaled block. The idea is to merge the code paths for ordered data with that of journaled data, with the log operations in lops.c tacking account of the different types of buffers as they are presented to it. Largely the code path for metadata will be similar too, but obviously through a different set of log operations. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] Make the new argument to gfs2_trans_add_bh() actually do somethingSteven Whitehouse2006-01-181-1/+1
| | | | | | | Passes the flag through to ensure that the correct log operations are invoked when the flag is set. Signed-off-by: Steven Whitehouse: <swhiteho@redhat.com>
* [GFS2] Add an additional argument to gfs2_trans_add_bh()Steven Whitehouse2006-01-181-1/+2
| | | | | | | | | This adds an extra argument to gfs2_trans_add_bh() to indicate whether the bh being added to the transaction is metadata or data. Its currently unused since all existing callers set it to 1 (metadata) but following patches will make use of it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* [GFS2] The core of GFS2David Teigland2006-01-161-0/+214
This patch contains all the core files for GFS2. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
OpenPOWER on IntegriCloud