summaryrefslogtreecommitdiffstats
path: root/fs/gfs2
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-blockLinus Torvalds2014-01-302-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_ve series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...
| * block: Abstract out bvec iteratorKent Overstreet2013-11-232-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
* | Merge branch 'for-linus' of ↵Linus Torvalds2014-01-284-214/+62
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus assorted cleanups and fixes all over the place... There will be another pile later this week" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits) __dentry_path() fixes vfs: Remove second variable named error in __dentry_path vfs: Is mounted should be testing mnt_ns for NULL or error. Fix race when checking i_size on direct i/o read hfsplus: remove can_set_xattr nfsd: use get_acl and ->set_acl fs: remove generic_acl nfs: use generic posix ACL infrastructure for v3 Posix ACLs gfs2: use generic posix ACL infrastructure jfs: use generic posix ACL infrastructure xfs: use generic posix ACL infrastructure reiserfs: use generic posix ACL infrastructure ocfs2: use generic posix ACL infrastructure jffs2: use generic posix ACL infrastructure hfsplus: use generic posix ACL infrastructure f2fs: use generic posix ACL infrastructure ext2/3/4: use generic posix ACL infrastructure btrfs: use generic posix ACL infrastructure fs: make posix_acl_create more useful fs: make posix_acl_chmod more useful ...
| * | gfs2: use generic posix ACL infrastructureChristoph Hellwig2014-01-254-214/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | This contains some major refactoring for the create path so that inodes are created with the right mode to start with instead of fixing it up later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs: make posix_acl_create more usefulChristoph Hellwig2014-01-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the current posix_acl_created to __posix_acl_create and add a fully featured helper to set up the ACLs on file creation that uses get_acl(). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs: make posix_acl_chmod more usefulChristoph Hellwig2014-01-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the current posix_acl_chmod to __posix_acl_chmod and add a fully featured ACL chmod helper that uses the ->set_acl inode operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | GFS2: revert "GFS2: d_splice_alias() can't return error"J. Bruce Fields2014-01-181-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 0d0d110720d7960b77c03c9f2597faaff4b484ae asserts that "d_splice_alias() can't return error unless it was given an IS_ERR(inode)". That was true of the implementation of d_splice_alias, but this is really a problem with d_splice_alias: at a minimum it should be able to return -ELOOP in the case where inserting the given dentry would cause a directory loop. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Small cleanupBob Peterson2014-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is a small cleanup to function gfs2_rgrp_go_lock so that it uses rgd instead of its more complicated twin. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Don't use ENOBUFS when ENOMEM is the correct error codeSteven Whitehouse2014-01-168-31/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Al Viro has tactfully pointed out that we are using the incorrect error code in some cases. This patch fixes that, and also removes the (unused) return value for glock dumping. > * gfs2_iget() - ENOBUFS instead of ENOMEM. ENOBUFS is > "No buffer space available (POSIX.1 (XSI STREAMS option))" and since > we don't support STREAMS it's probably fair game, but... what the hell? Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>
* | | GFS2: Fix kbuild test robot reported warningSteven Whitehouse2014-01-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Well I don't get the same warning locally as the kbuild robot, but I guess this should fix the problem, anyway. Here is the warning: head: 2d9e72303d538024627fb1fe2cbde48aec12acc0 commit: ee2411a8db49a21bc55dc124e1b434ba194c8903 [19/20] GFS2: Clean up quota slot allocation config: make ARCH=powerpc allmodconfig All error/warnings: fs/gfs2/quota.c: In function 'gfs2_quota_init': >> fs/gfs2/quota.c:1246:3: error: implicit declaration of function '__vmalloc' [-Werror=implicit-function-declaration] sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL); ^ >> fs/gfs2/quota.c:1246:24: warning: assignment makes pointer from integer without a cast [enabled by default] sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS, PAGE_KERNEL); ^ fs/gfs2/quota.c: In function 'gfs2_quota_cleanup': >> fs/gfs2/quota.c:1361:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] vfree(sdp->sd_quota_bitmap); Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Move quota bitmap operations under their own lockSteven Whitehouse2014-01-143-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gradually, the global qd_lock is being used for less and less. After this patch it will only be used for the per super block list whose purpose is to allow syncing of changes back to the master quota file from the local quota changes file. Fixing up that process to make it more efficient will be the subject of a later patch, however this patch removes another barrier to doing that. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
* | | GFS2: Clean up quota slot allocationSteven Whitehouse2014-01-142-73/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quota slot allocation has historically used a vector of pages and a set of homegrown find/test/set/clear bit functions. Since the size of the bitmap is likely to be based on the default qc file size, thats a couple of pages at most. So we ought to be able to allocate that as a single chunk, with a vmalloc fallback, just in case of memory fragmentation. We are then able to use the kernel's own find/test/set/clear bit functions, rather than rolling our own. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
* | | GFS2: Only run logd and quota when mounted read/writeSteven Whitehouse2014-01-143-68/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While investigating a rather strange bit of code in the quota clean up function, I spotted that the reason for its existence was that when remounting read only, we were not stopping the quotad thread, and thus it was possible for it to still have a reference to some of the quotas in that case. This patch moves the logd and quota thread start and stop into the make_fs_rw/ro functions, so that we now stop those threads when mounted read only. This means that quotad will always be stopped before we call the quota clean up function, and we can thus dispose of the (rather hackish) code that waits for it to give up its reference on the quotas. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
* | | GFS2: Use RCU/hlist_bl based hash for quotasSteven Whitehouse2014-01-144-48/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this patch, GFS2 kept all the quotas for each super block in a single linked list. This is rather slow when there are large numbers of quotas. This patch introduces a hlist_bl based hash table, similar to the one used for glocks. The initial look up of the quota is now lockless in the case where it is already cached, although we still have to take the per quota spinlock in order to bump the ref count. Either way though, this is a big improvement on what was there before. The qd_lock and the per super block list is preserved, for the time being. However it is intended that since this is no longer used for its original role, it should be possible to shrink the number of items on that list in due course and remove the requirement to take qd_lock in qd_get. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | | GFS2: No need to invalidate pages for a dio readSteven Whitehouse2014-01-141-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We recently fixed the writeback of pages prior to performing direct i/o, however the initial fix was perhaps a bit heavy handed. There is no need to invalidate pages if the direct i/o is only a read, since they will be identical to what has been flushed to disk anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Add initialization for address space in super blockSteven Whitehouse2014-01-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Spotted by Andy Price. This should fix the odd messages from lockdep caused by 70d4ee94b370c5ef54d0870600f16bd92d18013c Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Price <anprice@redhat.com>
* | | GFS2: Add hints to directory leaf blocksSteven Whitehouse2014-01-081-3/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds four new fields to directory leaf blocks. The intent is not to use them in the kernel itself, although perhaps we may be able to use them as hints at some later date, but instead to provide more information for debug/fsck use. One new field adds a pointer to the inode to which the leaf belongs. This can be useful if the pointer to the leaf block has become corrupt, as it will allow us to know which inode this block should be associated with. This field is set when the leaf is created and never changed over its lifetime. The second field is a "distance from the hash table" field. The meaning is as follows: 0 = An old leaf in which this value has not been set 1 = This leaf is pointed to directly from the hash table 2+ = This leaf is part of a chain, pointed to by another leaf block, the value gives the position in the chain. The third and fourth fields combine to give a time stamp of the most recent directory insertion or deletion from this leaf block. The time stamp is not updated when a new leaf block is chained from the current one. The code is currently written such that the timestamp on the dir inode will match that of the leaf block for the most recent insertion/deletion. For backwards compatibility, any of these new fields which is zero should be considered to be "unknown". Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: For exhash conversion, only one block is neededSteven Whitehouse2014-01-081-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For most cases, only a single new block is needed when we reach the point of converting from stuffed to exhash directory. The exception being when the file name is so long that it will not fit within the new leaf block. So this patch adds a simple test for that situation so that we do not need to request the full reservation size in this case. Potentially we could calculate more accurately the value to use in other cases too, but that is much more complicated to do and it is doubtful that the benefit would outweigh the extra cost in code complexity. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Increase i_writecount during gfs2_setattr_chownBob Peterson2014-01-071-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch calls get_write_access in function gfs2_setattr_chown, which merely increases inode->i_writecount for the duration of the function. That will ensure that any file closes won't delete the inode's multi-block reservation while the function is running. It also ensures that a multi-block reservation exists when needed for quota change operations during the chown. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Remember directory insert pointSteven Whitehouse2014-01-063-16/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we look to see if there is enough space to add a dir entry without allocation, we have then been repeating the same search later when we do the actual insertion. This patch caches the details of the location in the gfs2_diradd structure, so that we do not have to repeat the search. This will provide a performance improvement which will be greater as the size of the directory increases. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Consolidate transaction blocks calculation for dir addSteven Whitehouse2014-01-061-12/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are three cases where we need to calculate the number of blocks to reserve in a transaction involving linking an inode into a directory. The one in rename is a bit more complicated, but the basis of it is the same as for link and create. So it makes sense to move this calculation into a single function rather than repeating it three times. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Add directory addition info structureSteven Whitehouse2014-01-063-33/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intent is that this structure will hold the information required when adding entries to a directory (linking). To start with, it will contain only the number of blocks which are required to link the new entry into the directory. The current calculation returns either 0 or the maximim number of blocks that can ever be requested by such a transaction. The intent is that in a later patch, we can update the dir code to calculate this value more accurately. In addition further patches will also add further fields to the new structure to increase its utility. In addition this patch fixes a bug where the link used during inode creation was adding requesting too many blocks in some cases. This is harmless unless the fs is close to being full. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Use only a single address space for rgrpsSteven Whitehouse2014-01-035-8/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this patch, GFS2 had one address space for each rgrp, stored in the glock. This patch changes them to use a single address space in the super block. This therefore saves (sizeof(struct address_space) * nr_of_rgrps) bytes of memory and for large filesystems, that can be significant. It would be nice to be able to do something similar and merge the inode metadata address space into the same global address space. However, that is rather more complicated as the on-disk location doesn't have a 1:1 mapping with the inodes in general. So while it could be done, it will be a more complicated operation as it requires changing a lot more code paths. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Use range based functions for rgrp sync/invalidationSteven Whitehouse2014-01-033-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each rgrp header is represented as a single extent on disk, so we can calculate the position within the address space, since we are using address spaces mapped 1:1 to the disk. This means that it is possible to use the range based versions of filemap_fdatawrite/wait and for invalidating the page cache. Our eventual intent is to then be able to merge the address spaces used for rgrps into a single address space, rather than to have one for each glock, saving memory and reducing complexity. Since during umount, the rgrp structures are disposed of before the glocks, we need to store the extent information in the glock so that is is available for a final invalidation. This patch uses a field which is otherwise unused in rgrp glocks to do that, so that we do not have to expand the size of a glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Remove test which is always trueSteven Whitehouse2014-01-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since gfs2_inplace_reserve() is always called with a valid alloc parms structure, there is no need to test for this within the function itself - and in any case, after we've all ready dereferenced it anyway. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Remove gfs2_quota_change_host structureSteven Whitehouse2014-01-031-25/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is only one place this is used, when reading in the quota changes at mount time. It is not really required and much simpler to just convert the fields from the on-disk structure as required. There should be no functional change as a result of this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Clean up releasepageSteven Whitehouse2014-01-031-13/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For historical reasons, we drop and retake the log lock in ->releasepage() however, since there is no reason why we cannot hold the log lock over the whole function, this allows some simplification. In particular, pinning a buffer is only ever done under the log lock, so it is possible here to remove the test for pinned buffers in the second loop, since it is impossible for that to happen (it is also tested in the first loop). As a result, two tests made later in the second loop become constants and can also be reduced to the only possible branch. So the net result is to remove various bits of unreachable code and make this more readable. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Implement a "rgrp has no extents longer than X" schemeBob Peterson2014-01-033-6/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the preceding patch, we started accepting block reservations smaller than the ideal size, which requires a lot more parsing of the bitmaps. To reduce the amount of bitmap searching, this patch implements a scheme whereby each rgrp keeps track of the point at this multi-block reservations will fail. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: Drop inadequate rgrps from the reservation treeBob Peterson2014-01-031-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is just basically a resend of a patch I posted earlier. It didn't change from its original, except in diff offsets, etc: This patch fixes a bug in the GFS2 block allocation code. The problem starts if a process already has a multi-block reservation, but for some reason, another process disqualifies it from further allocations. For example, the other process might set on the GFS2_RDF_ERROR bit. The process holding the reservation jumps to label skip_rgrp, but that label comes after the code that removes the reservation from the tree. Therefore, the no longer usable reservation is not removed from the rgrp's reservations tree; it's lost. Eventually, the lost reservation causes the count of reserved blocks to get off, and eventually that causes a BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free) to trigger. This patch moves the call to after label skip_rgrp so that the disqualified reservation is properly removed from the tree, thus keeping the rgrp rd_reserved count sane. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | GFS2: If requested is too large, use the largest extent in the rgrpBob Peterson2014-01-031-15/+49
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is a second try at a patch I posted earlier, which also implements suggestions Steve made: Before this patch, GFS2 would keep searching through all the rgrps until it found one that had a chunk of free blocks big enough to satisfy the size hint, which is based on the file write size, regardless of whether the chunk was big enough to perform the write. However, when doing big writes there may not be a large enough chunk of free blocks in any rgrp, due to file system fragmentation. The largest chunk may be big enough to satisfy the write request, but it may not meet the ideal reservation size from the "size hint". The writes would slow to a crawl because every write would search every rgrp, then finally give up and default to a single-block write. In my case, performance would drop from 425MB/s to 18KB/s, or 24000 times slower. This patch basically makes it so that if we can't find a contiguous chunk of blocks big enough to satisfy the sizehint, we'll use the largest chunk of blocks we found that will still contain the write. It does so by keeping track of the largest run of blocks within the rgrp. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | GFS2: Fix unsafe dereference in dump_holder()Tetsuo Handa2014-01-021-0/+2
| | | | | | | | | | | | | | | | GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that RCU read lock is held when using task_struct returned from pid_task(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | GFS2: Wait for async DIO in glock state changesSteven Whitehouse2013-12-201-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to wait for any outstanding DIO to complete in a couple of situations. Firstly, in case we are changing out of deferred mode (in inode_go_sync) where GLF_DIRTY will not be set. That call could be prefixed with a test for gl_state == LM_ST_DEFERRED but it doesn't seem worth it bearing in mind that the test for outstanding DIO is very quick anyway, in the usual case that there is none. The second case is in inode_go_lock which will catch the cases where we have a cached EX lock, but where we grant deferred locks against it so that there is no glock state transistion. We only need to wait if the state is not deferred, since DIO is valid anyway in that state. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | GFS2: Fix incorrect invalidation for DIO/buffered I/OSteven Whitehouse2013-12-201-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In patch 209806aba9d540dde3db0a5ce72307f85f33468f we allowed local deferred locks to be granted against a cached exclusive lock. That opened up a corner case which this patch now fixes. The solution to the problem is to check whether we have cached pages each time we do direct I/O and if so to unmap, flush and invalidate those pages. Since the glock state machine normally does that for us, mostly the code will be a no-op. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | GFS2: Fix slab memory leak in gfs2_bufdataBob Peterson2013-12-131-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a slab memory leak that sometimes can occur for files with a very short lifespan. The problem occurs when a dinode is deleted before it has gotten to the journal properly. In the leak scenario, the bd object is pinned for journal committment (queued to the metadata buffers queue: sd_log_le_buf) but is subsequently unpinned and dequeued before it finds its way to the ail or the revoke queue. In this rare circumstance, the bd object needs to be freed from slab memory, or it is forgotten. We have to be very careful how we do it, though, because multiple processes can call gfs2_remove_from_journal. In order to avoid double-frees, only the process that does the unpinning is allowed to free the bd. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | GFS2: Fix use-after-free race when calling gfs2_remove_from_ailBob Peterson2013-12-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function gfs2_remove_from_ail drops the reference on the bh via brelse. This patch fixes a race condition whereby bh is deferenced after the brelse when setting bd->bd_blkno = bh->b_blocknr; Under certain rare circumstances, bh might be gone or reused, and bd->bd_blkno is set to whatever that memory happens to be, which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails the test "bd->bd_blkno >= blkno" which causes it to never be freed. The end result is that the bd is never freed from the bufdata cache, which results in this error: slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | GFS2: don't hold s_umount over blkdev_putSteven Whitehouse2013-12-131-1/+11
|/ | | | | | | | | | | | | This is a GFS2 version of Tejun's patch: 4f331f01b9c43bf001d3ffee578a97a1e0633eac vfs: don't hold s_umount over close_bdev_exclusive() call In this case its blkdev_put itself that is the issue and this patch uses the same solution of dropping and retaking s_umount. Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix ref count bug relating to atomic_openSteven Whitehouse2013-11-211-1/+4
| | | | | | | | | In the case that atomic_open calls finish_no_open() with the dentry that was supplied to gfs2_atomic_open() an extra reference count is required. This patch fixes that issue preventing a bug trap triggering at umount time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: fix potential NULL pointer dereferenceMichal Nazarewicz2013-11-211-1/+2
| | | | | | | | | | | | | | Commit [e66cf1610: GFS2: Use lockref for glocks] replaced call: atomic_read(&gi->gl->gl_ref) == 0 with: __lockref_is_dead(&gl->gl_lockref) therefore changing how gl is accessed, from gi->gl to plan gl. However, gl can be a NULL pointer, and so gi->gl needs to be used instead (which is guaranteed not to be NULL because fo the while loop checking that condition). Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* gfs2: endianness misannotationsAl Viro2013-11-153-19/+16
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-11-131-8/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "All kinds of stuff this time around; some more notable parts: - RCU'd vfsmounts handling - new primitives for coredump handling - files_lock is gone - Bruce's delegations handling series - exportfs fixes plus misc stuff all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits) ecryptfs: ->f_op is never NULL locks: break delegations on any attribute modification locks: break delegations on link locks: break delegations on rename locks: helper functions for delegation breaking locks: break delegations on unlink namei: minor vfs_unlink cleanup locks: implement delegations locks: introduce new FL_DELEG lock flag vfs: take i_mutex on renamed file vfs: rename I_MUTEX_QUOTA now that it's not used for quotas vfs: don't use PARENT/CHILD lock classes for non-directories vfs: pull ext4's double-i_mutex-locking into common code exportfs: fix quadratic behavior in filehandle lookup exportfs: better variable name exportfs: move most of reconnect_path to helper function exportfs: eliminate unused "noprogress" counter exportfs: stop retrying once we race with rename/remove exportfs: clear DISCONNECTED on all parents sooner exportfs: more detailed comment for path_reconnect ...
| * new helper: kfree_put_link()Al Viro2013-10-241-8/+1
| | | | | | | | | | | | duplicated to hell and back... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'gfs2-merge-window' of ↵Linus Torvalds2013-11-1119-339/+448
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw Pull gfs2 updates from Steven Whitehouse: "The main feature of interest this time is quota updates. There are some clean ups and some patches to use the new generic lru list code. There is still plenty of scope for some further changes in due course - faster lookups of quota structures is very much on the todo list. Also, a start has been made towards the more tricky issue of using the generic lru code with glocks, but that will have to be completed in a subsequent merge window. The other, more minor feature, is that there have been a number of performance patches which relate to block allocation. In particular they will improve performance when the disk is nearly full" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Use generic list_lru for quota GFS2: Rename quota qd_lru_lock qd_lock GFS2: Use reflink for quota data cache GFS2: Use lockref for glocks GFS2: Protect quota sync generation GFS2: Inline qd_trylock into gfs2_quota_unlock GFS2: Make two similar quota code fragments into a function GFS2: Remove obsolete quota tunable GFS2: Move gfs2_icbit_munge into quota.c GFS2: Speed up starting point selection for block allocation GFS2: Add allocation parameters structure GFS2: Clean up reservation removal GFS2: fix dentry leaks GFS2: new function gfs2_rbm_incr GFS2: Introduce rbm field bii GFS2: Do not reset flags on active reservations GFS2: introduce bi_blocks for optimization GFS2: optimize rbm_from_block wrt bi_start GFS2: d_splice_alias() can't return error
| * GFS2: Use generic list_lru for quotaSteven Whitehouse2013-11-044-66/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By using the generic list_lru code, we can now separate the per sb quota list locking from the lru locking. The lru lock is made into the inner-most lock. As a result of this new lock order, we may occasionally see items on the per-sb quota list which are "dead" so that the two places where we traverse that list are updated to take account of that. As a result of this patch, the gfs2 quota shrinker is now NUMA zone aware, and we are also laying the foundations for further improvments in due course. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>
| * GFS2: Rename quota qd_lru_lock qd_lockSteven Whitehouse2013-11-041-35/+35
| | | | | | | | | | | | | | | | | | | | This is a straight forward rename which is in preparation for introducing the generic list_lru infrastructure in the following patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>
| * GFS2: Use reflink for quota data cacheSteven Whitehouse2013-11-042-15/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds reflink support to the quota data cache. It looks a bit strange because we still don't have a sensible split in the lookup by id and the lru list. That is coming in later patches though. The intent here is just to swap the current ref count for reflinks in all cases with as little as possible other change. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>
| * GFS2: Use lockref for glocksSteven Whitehouse2013-10-154-49/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently glocks have an atomic reference count and also a spinlock which covers various internal fields, such as the state. This intent of this patch is to replace the spinlock and the atomic reference count with a lockref structure. This contains a spinlock which we can continue to use as before, and a reference counter which is used in conjuction with the spinlock to replace the previous atomic counter. As a result of this there are some new rules for reference counting on glocks. We need to distinguish between reference count changes under gl_spin (which are now just increment or decrement of the new counter, provided the count cannot hit zero) and those which are outside of gl_spin, but which now take gl_spin internally. The conversion is relatively straight forward. There is probably some further clean up which can be done, but the priority at this stage is to make the change in as simple a manner as possible. A consequence of this change is that the reference count is being decoupled from the lru list processing. This should allow future adoption of the lru_list code with glocks in due course. The reason for using the "dead" state and not just relying on 0 being the "invalid state" is so that in due course 0 ref counts can be allowable. The intent is to eventually be able to remove the ref count changes which are currently hidden away in state_change(). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Protect quota sync generationSteven Whitehouse2013-10-043-2/+6
| | | | | | | | | | | | | | | | | | | | Now that gfs2_quota_sync can be potentially called from multiple threads, we should protect this bit of code, and the sync generation number in particular in order to ensure that there are no races when syncing quotas. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
| * GFS2: Inline qd_trylock into gfs2_quota_unlockSteven Whitehouse2013-10-041-25/+20
| | | | | | | | | | | | | | | | | | | | The function qd_trylock was not a trylock despite its name and can be inlined into gfs2_quota_unlock in order to make the code a bit clearer. There should be no functional change as a result of this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
| * GFS2: Make two similar quota code fragments into a functionSteven Whitehouse2013-10-041-34/+26
| | | | | | | | | | | | | | | | | | | | There should be no functional change bar the removal of a test of the MS_READONLY flag which would never be reachable. This merges the common code from qd_fish and qd_trylock into a single function and calls it from both those places. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
| * GFS2: Remove obsolete quota tunableSteven Whitehouse2013-10-044-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need for a paramater which relates to the internals of quota to be exposed to users. The only possible use would be to turn it up so large that the memory allocation fails. So lets remove it and set it to a sensible value which ensures that we don't ask for multipage allocations. Currently the size of struct gfs2_holder means that the caluclated value is identical to the previous default value, so there should be no functional change. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
OpenPOWER on IntegriCloud