summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* ceph: do not include cap/dentry releases in replayed messagesSage Weil2010-07-162-0/+9
| | | | | | | | | Strip the cap and dentry releases from replayed messages. They can cause the shared state to get out of sync because they were generated (with the request message) earlier, and no longer reflect the current client state. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: reuse request message when replaying against recovering mdsSage Weil2010-07-161-5/+22
| | | | | | | | | | | | | | | | Replayed rename operations (after an mds failure/recovery) were broken because the request paths were regenerated from the dentry names, which get mangled when d_move() is called. Instead, resend the previous request message when replaying completed operations. Just make sure the REPLAY flag is set and the target ino is filled in. This fixes problems with workloads doing renames when the MDS restarts, where the rename operation appears to succeed, but on mds restart then fails (leading to client confusion, app breakage, etc.). Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix creation of ipv6 socketsSage Weil2010-07-091-3/+5
| | | | | | Use the address family from the peer address instead of assuming IPv4. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix parsing of ipv6 addressesSage Weil2010-07-091-6/+19
| | | | | | | Check for brackets around the ipv6 address to avoid ambiguity with the port number. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix printing of ipv6 addrsSage Weil2010-07-081-18/+6
| | | | | | | | The buffer was too small. Make it bigger, use snprintf(), put brackets around the ipv6 address to avoid mixing it up with the :port, and use the ever-so-handy %pI[46] formats. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add kfree() to error pathDan Carpenter2010-07-081-0/+1
| | | | | | | We leak a "pi" on this error path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix leak of mon authorizerSage Weil2010-07-051-0/+3
| | | | | | Fix leak of a struct ceph_buffer on umount. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix message revocationSage Weil2010-07-051-7/+7
| | | | | | | | | | | | | A message can be on a queue (pending or sent), or out_msg (sending), or both. We were assuming that if it's not on a queue it couldn't be out_msg, but that was false in the case of lossy connections like the OSD. Fix ceph_con_revoke() to treat these cases independently. Also, fix the out_kvec_is_message check to only trigger if we are currently sending _this_ message. This fixes a GPF in tcp_sendpage, triggered by OSD restarts. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix crush device 'out' threshold to 1.0, not 0.1Sage Weil2010-07-051-1/+1
| | | | | | | Fix a typo that made any OSD weighted between 0.1 and 1.0 effectively weighted as 1.0 (fully in). Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix caps usage accounting for import (non-reserved) caseSage Weil2010-06-291-2/+8
| | | | | | | We need to increase the total and used counters when allocating a new cap in the non-reserved (cap import) case. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: only release clean, unused caps with mds requestsSage Weil2010-06-291-5/+6
| | | | | | | | | | | We can drop caps with an mds request. Ensure we only drop unused AND clean caps, since the MDS doesn't support cap writeback in that context, nor do we track it. If caps are dirty, and the MDS needs them back, we it will revoke and we will flush in the normal fashion. This fixes a possibly loss of metadata. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix crush CHOOSE_LEAF when type is already a leafSage Weil2010-06-241-13/+25
| | | | | | | We may not recurse for CHOOSE_LEAF if we start with a leaf node. When that happens, the out2 vector needs to be filled in with the result. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix crush recursionSage Weil2010-06-241-0/+1
| | | | | | | There was a longstanding problem with recursion through intervening bucket types on complex hierarchies. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix caps debugfs entryYehuda Sadeh2010-06-241-1/+1
| | | | | | | The ceph client structure was not set correctly. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: delay umount until all mds requests drop inode+dentry refsSage Weil2010-06-211-0/+6
| | | | | | | | | | | | | | | | | This fixes a race between handle_reply finishing an mds request, signalling completion, and then dropping the request structing and its dentry+inode refs, and pre_umount function waiting for requests to finish before letting the vfs tear down the dcache. If umount was delayed waiting for mds requests, we could race and BUG in shrink_dcache_for_umount_subtree because of a slow dput. This delays umount until the msgr queue flushes, which means handle_reply will exit and will have dropped the ceph_mds_request struct. I'm assuming the VFS has already ensured that its calls have all completed and those request refs have thus been dropped as well (I haven't seen that race, at least). Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulateSage Weil2010-06-211-7/+12
| | | | | | | Handle a splice_dentry failure (due to a d_materialize_unique error) without crashing. (Also, report the error code.) Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix crush map update decodingSage Weil2010-06-171-0/+1
| | | | | | | If the incremental osdmap has a new crush map, advance the position after decoding so that we can parse the rest of the osdmap properly. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix message memory leak, uninitialized variableSage Weil2010-06-131-0/+2
| | | | | | | | | We need to properly initialize skip, as not all alloc_msg op instances set it. Also, BUG if someone says skip but also allocates a message. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix map handler error pathSage Weil2010-06-131-1/+2
| | | | | | Don't leak message if we receive an unexpected message type. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: some endianity fixesYehuda Sadeh2010-06-133-3/+4
| | | | | | | Fix some problems that came up with sparse. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2010-06-119-17/+41
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: The file argument for fsync() is never null Btrfs: handle ERR_PTR from posix_acl_from_xattr() Btrfs: avoid BUG when dropping root and reference in same transaction Btrfs: prohibit a operation of changing acl's mask when noacl mount option used Btrfs: should add a permission check for setfacl Btrfs: btrfs_lookup_dir_item() can return ERR_PTR Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs Btrfs: unwind after btrfs_start_transaction() errors Btrfs: btrfs_iget() returns ERR_PTR Btrfs: handle kzalloc() failure in open_ctree() Btrfs: handle error returns from btrfs_lookup_dir_item() Btrfs: Fix BUG_ON for fs converted from extN Btrfs: Fix null dereference in relocation.c Btrfs: fix remap_file_pages error Btrfs: uninitialized data is check_path_shared() Btrfs: fix fallocate regression Btrfs: fix loop device on top of btrfs
| * Btrfs: The file argument for fsync() is never nullDan Carpenter2010-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | The "file" argument for fsync is never null so we can remove this check. What drew my attention here is that 7ea8085910e: "drop unused dentry argument to ->fsync" introduced an unconditional dereference at the start of the function and that generated a smatch warning. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: handle ERR_PTR from posix_acl_from_xattr()Dan Carpenter2010-06-111-0/+2
| | | | | | | | | | | | | | | | posix_acl_from_xattr() returns both ERR_PTRs and null, but it's OK to pass null values to set_cached_acl() Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: avoid BUG when dropping root and reference in same transactionSage Weil2010-06-111-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If btrfs_ioctl_snap_destroy() deletes a snapshot but finishes with end_transaction(), the cleaner kthread may come in and drop the root in the same transaction. If that's the case, the root's refs still == 1 in the tree when btrfs_del_root() deletes the item, because commit_fs_roots() hasn't updated it yet (that happens during the commit). This wasn't a problem before only because btrfs_ioctl_snap_destroy() would commit the transaction before dropping the dentry reference, so the dead root wouldn't get queued up until after the fs root item was updated in the btree. Since it is not an error to drop the root reference and the root in the same transaction, just drop the BUG_ON() in btrfs_del_root(). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: prohibit a operation of changing acl's mask when noacl mount option usedShi Weihua2010-06-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when used Posix File System Test Suite(pjd-fstest) to test btrfs, some cases about setfacl failed when noacl mount option used. I simplified used commands in pjd-fstest, and the following steps can reproduce it. ------------------------ # cd btrfs-part/ # mkdir aaa # setfacl -m m::rw aaa <- successed, but not expected by pjd-fstest. ------------------------ I checked ext3, a warning message occured, like as: setfacl: aaa/: Operation not supported Certainly, it's expected by pjd-fstest. So, i compared acl.c of btrfs and ext3. Based on that, a patch created. Fortunately, it works. Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: should add a permission check for setfaclShi Weihua2010-06-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On btrfs, do the following ------------------ # su user1 # cd btrfs-part/ # touch aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rw- group::rw- other::r-- # su user2 # cd btrfs-part/ # setfacl -m u::rwx aaa # getfacl aaa # file: aaa # owner: user1 # group: user1 user::rwx <- successed to setfacl group::rw- other::r-- ------------------ but we should prohibit it that user2 changing user1's acl. In fact, on ext3 and other fs, a message occurs: setfacl: aaa: Operation not permitted This patch fixed it. Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: btrfs_lookup_dir_item() can return ERR_PTRDan Carpenter2010-06-111-1/+1
| | | | | | | | | | | | | | btrfs_lookup_dir_item() can return either ERR_PTRs or null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRsDan Carpenter2010-06-111-0/+4
| | | | | | | | | | | | | | | | | | btrfs_read_fs_root_no_name() returns ERR_PTRs on error so I added a check for that. It's not clear to me if it can also return NULL pointers or not so I left the original NULL pointer check as is. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: unwind after btrfs_start_transaction() errorsDan Carpenter2010-06-111-1/+1
| | | | | | | | | | | | | | | | | | This was added by a22285a6a3: "Btrfs: Integrate metadata reservation with start_transaction". If we goto out here then we skip all the unwinding and there are locks still held etc. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: btrfs_iget() returns ERR_PTRDan Carpenter2010-06-111-2/+2
| | | | | | | | | | | | | | btrfs_iget() returns an ERR_PTR() on failure and not null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: handle kzalloc() failure in open_ctree()Dan Carpenter2010-06-111-2/+5
| | | | | | | | | | | | | | Unwind and return -ENOMEM if the allocation fails here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: handle error returns from btrfs_lookup_dir_item()Dan Carpenter2010-06-111-0/+2
| | | | | | | | | | | | | | | | If btrfs_lookup_dir_item() fails, we should can just let the mount fail with an error. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Fix BUG_ON for fs converted from extNYan, Zheng2010-06-111-1/+2
| | | | | | | | | | | | | | | | Tree blocks can live in data block groups in FS converted from extN. So it's easy to trigger the BUG_ON. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Fix null dereference in relocation.cYan, Zheng2010-06-111-3/+4
| | | | | | | | | | | | | | | | Fix a potential null dereference in relocation.c Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Acked-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix remap_file_pages errorMiao Xie2010-06-111-1/+8
| | | | | | | | | | | | | | | | when we use remap_file_pages() to remap a file, remap_file_pages always return error. It is because btrfs didn't set VM_CAN_NONLINEAR for vma. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: uninitialized data is check_path_shared()Dan Carpenter2010-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | refs can be used with uninitialized data if btrfs_lookup_extent_info() fails on the first pass through the loop. In the original code if that happens then check_path_shared() probably returns 1, this patch changes it to return 1 for safety. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix fallocate regressionJosef Bacik2010-06-111-1/+1
| | | | | | | | | | | | | | | | | | Seems that when btrfs_fallocate was converted to use the new ENOSPC stuff we dropped passing the mode to the function that actually does the preallocation. This breaks anybody who wants to use FALLOC_FL_KEEP_SIZE. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix loop device on top of btrfsMiao Xie2010-06-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We cannot use the loop device which has been connected to a file in the btrf The reproduce steps is following: # dd if=/dev/zero of=vdev0 bs=1M count=1024 # losetup /dev/loop0 vdev0 # mkfs.btrfs /dev/loop0 ... failed to zero device start -5 The reason is that the btrfs don't implement either ->write_begin or ->write the VFS API, so we fix it by setting ->write to do_sync_write(). Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2010-06-116-51/+84
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: try to send partial cap release on cap message on missing inode ceph: release cap on import if we don't have the inode ceph: fix misleading/incorrect debug message ceph: fix atomic64_t initialization on ia64 ceph: fix lease revocation when seq doesn't match ceph: fix f_namelen reported by statfs ceph: fix memory leak in statfs ceph: fix d_subdirs ordering problem
| * | ceph: try to send partial cap release on cap message on missing inodeSage Weil2010-06-103-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have enough memory to allocate a new cap release message, do so, so that we can send a partial release message immediately. This keeps us from making the MDS wait when the cap release it needs is in a partially full release message. If we fail because of ENOMEM, oh well, they'll just have to wait a bit longer. Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: release cap on import if we don't have the inodeSage Weil2010-06-103-38/+61
| | | | | | | | | | | | | | | | | | | | | | | | If we get an IMPORT that give us a cap, but we don't have the inode, queue a release (and try to send it immediately) so that the MDS doesn't get stuck waiting for us. Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: fix misleading/incorrect debug messageSage Weil2010-06-101-1/+1
| | | | | | | | | | | | | | | | | | Nothing is released here: the caps message is simply ignored in this case. Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: fix atomic64_t initialization on ia64Jeff Mahoney2010-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: fix lease revocation when seq doesn't matchSage Weil2010-06-041-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | If the client revokes a lease with a higher seq than what we have, keep the mds's seq, so that it honors our release. Otherwise, we can hang indefinitely. Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: fix f_namelen reported by statfsSage Weil2010-06-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX. That disagrees with ceph_lookup behavior (which checks against NAME_MAX), and also makes the pjd posix test suite spit out ugly errors because with can't clean up its temporary files. Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: fix memory leak in statfsYehuda Sadeh2010-06-011-0/+2
| | | | | | | | | | | | | | | | | | | | | Freeing the statfs request structure when required. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: fix d_subdirs ordering problemHenry C Chang2010-06-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We misused list_move_tail() to order the dentry in d_subdirs. This will screw up the d_subdirs order. This bug can be reliably reproduced by: 1. mount ceph fs. 2. on ceph fs, git clone git://ceph.newdream.net/git/ceph.git 3. Run autogen.sh in ceph directory. (Note: Errors only occur at the first time you run autogen.sh.) Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | | pipe: fix check in "set size" fcntlMiklos Szeredi2010-06-101-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As it stands this check compares the number of pages to the page size. This makes no sense and makes the fcntl fail in almost any sane case. Fix it by checking if nr_pages is not zero (it can become zero only if arg is too big and round_pipe_size() overflows). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | pipe: fix pipe buffer resizingMiklos Szeredi2010-06-101-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pipe_set_size() needs to copy pipe bufs from the old circular buffer to the new. The current code gets this wrong in multiple ways, resulting in oops. Test program is available here: http://www.kernel.org/pub/linux/kernel/people/mszeredi/piperesize/ Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | block: remove duplicate BUG_ON() in bd_finish_claiming()Jens Axboe2010-06-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | We do the same BUG_ON() just a line later when calling into __bd_abort_claiming(). Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
OpenPOWER on IntegriCloud