summaryrefslogtreecommitdiffstats
path: root/fs/ceph
Commit message (Collapse)AuthorAgeFilesLines
* ceph: do not set r_old_dentry_dir on link()Sage Weil2014-04-031-2/+1
| | | | | | | | | | | This is racy--we do not know whather d_parent has changed out from underneath us because i_mutex is not held on the source inode's directory. Also, taking this reference is useless. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: do not assume r_old_dentry[_dir] always set togetherSage Weil2014-04-032-4/+6
| | | | | | | | | Do not assume that r_old_dentry implies that r_old_dentry_dir is also true. Separate out the ref cleanup and make the debugs dump behave when it is NULL. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: do not chain inode updates to parent fsyncSage Weil2014-04-034-17/+5
| | | | | | | | | The fsync(dirfd) only covers namespace operations, not inode updates. We do not need to cover setattr variants or O_TRUNC. Reported-by: Al Viro <viro@xeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: avoid useless ceph_get_dentry_parent_inode() in ceph_rename()Sage Weil2014-04-031-1/+2
| | | | | | | | This is just old_dir; no reason to abuse the dcache pointers. Reported-by: Al Viro <viro.zeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: let MDS adjust readdir 'frag'Yan, Zheng2014-04-031-3/+0
| | | | | | | | | | | | If readdir 'frag' is adjusted, readdir 'offset' should be reset. Otherwise some dentries may be lost when readdir and fragmenting directory happen at the some. Another way to fix this issue is let MDS adjust readdir 'frag'. The code that handles MDS reply reset the readdir 'offset' if the readdir reply is different than the requested one. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: fix reset_readdir()Yan, Zheng2014-04-031-3/+6
| | | | | | | | When changing readdir postion, fi->next_offset should be set to 0 if the new postion is not in the first dirfrag. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <elder@linaro.org>
* ceph: fix ceph_dir_llseek()Yan, Zheng2014-04-032-7/+7
| | | | | | | | | | | | | Comparing offset with inode->i_sb->s_maxbytes doesn't make sense for directory. For a fragmented directory, offset (frag_t, off) can be larger than inode->i_sb->s_maxbytes. At the very beginning of ceph_dir_llseek(), local variable old_offset is initialized to parameter offset. This doesn't make sense neither. Old_offset should be ceph_make_fpos(fi->frag, fi->next_offset). Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Alex Elder <elder@linaro.org>
* ceph: fix __dcache_readdir()Yan, Zheng2014-02-171-1/+9
| | | | | | | | | | | | If directory is fragmented, readdir() read its dirfrags one by one. After reading all dirfrags, the corresponding dentries are sorted in (frag_t, off) order in the dcache. If dentries of a directory are all cached, __dcache_readdir() can use the cached dentries to satisfy readdir syscall. But when checking if a given dentry is after the position of readdir, __dcache_readdir() compares numerical value of frag_t directly. This is wrong, it should use ceph_frag_compare(). Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: add acl, noacl options for cephfs mountSage Weil2014-02-171-4/+28
| | | | | | | | | Make the 'acl' option dependent on having ACL support compiled in. Make the 'noacl' option work even without it so that one can always ask it to be off and not error out on mount when it is not supported. Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: make ceph_forget_all_cached_acls() static inlineGuangliang Zhao2014-02-172-6/+6
| | | | | | Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: add missing init_acl() for mkdir() and atomic_open()Yan, Zheng2014-02-172-5/+9
| | | | Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: fix ceph_set_acl()Yan, Zheng2014-02-171-5/+1
| | | | | | | | | If acl is equivalent to file mode permission bits, ceph_set_acl() needs to remove any existing acl xattr. Use __ceph_setxattr() to handle both setting and removing acl xattr cases, it doesn't return -ENODATA when there is no acl xattr. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: fix ceph_removexattr()Yan, Zheng2014-02-171-1/+1
| | | | Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: remove xattr when null value is given to setxattr()Yan, Zheng2014-02-171-2/+14
| | | | | | | For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE to distinguish null value case from the zero-length value case. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: properly handle XATTR_CREATE and XATTR_REPLACEYan, Zheng2014-02-171-12/+26
| | | | | | | return -EEXIST if XATTR_CREATE is set and xattr alread exists. return -ENODATA if XATTR_REPLACE is set but xattr does not exist. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: fix missing dput in ceph_set_aclSage Weil2014-01-311-3/+6
| | | | | | | | | | | | | Add matching dput() for d_find_alias(). Move d_find_alias() down a bit at Julia's suggestion. [ Introduced by commit 72466d0b92e0: "ceph: fix posix ACL hooks" ] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ceph: simplify ceph_{get,init}_aclChristoph Hellwig2014-01-301-40/+16
| | | | | | | | | | | - ->get_acl only gets called after we checked for a cached ACL, so no need to call get_cached_acl again. - no need to check IS_POSIXACL in ->get_acl, without that it should never get set as all the callers that set it already have the check. - you should be able to use the full posix_acl_create in CEPH Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: remove duplicate declaration of ceph_setattrPeter Rosin2014-01-301-1/+0
| | | | | Signed-off-by: Peter Rosin <peda@lysator.liu.se> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: fix posix ACL hooksSage Weil2014-01-294-5/+10
| | | | | | | | | | | | | | | The merge of commit 7221fe4c2ed7 ("ceph: add acl for cephfs") raced with upstream changes in the generic POSIX ACL code (eg commit 2aeccbe957d0 "fs: add generic xattr_acl handlers" and others). Some of the fallout was fixed in commit 4db658ea0ca ("ceph: Fix up after semantic merge conflict"), but it was incomplete: the set_acl inode_operation wasn't getting set, and the prototype needed to be adjusted a bit (it doesn't take a dentry anymore). Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ceph: Fix up after semantic merge conflictLinus Torvalds2014-01-284-108/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous ceph-client merge resulted in ceph not even building, because there was a merge conflict that wasn't visible as an actual data conflict: commit 7221fe4c2ed7 ("ceph: add acl for cephfs") added support for POSIX ACL's into Ceph, but unluckily we also had the VFS tree change a lot of the POSIX ACL helper functions to be much more helpful to filesystems (see for example commits 2aeccbe957d0 "fs: add generic xattr_acl handlers", 5bf3258fd2ac "fs: make posix_acl_chmod more useful" and 37bc15392a23 "fs: make posix_acl_create more useful") The reason this conflict wasn't obvious was many-fold: because it was a semantic conflict rather than a data conflict, it wasn't visible in the git merge as a conflict. And because the VFS tree hadn't been in linux-next, people hadn't become aware of it that way. And because I was at jury duty this morning, I was using my laptop and as a result not doing constant "allmodconfig" builds. Anyway, this fixes the build and generally removes a fair chunk of the Ceph POSIX ACL support code, since the improved helpers seem to match really well for Ceph too. But I don't actually have any way to *test* the end result, and I was really hoping for some ACK's for this. Oh, well. Not compiling certainly doesn't make things easier to test, so I'm committing this without the acks after having waited for four hours... Plus it's what I would have done for the merge had I noticed the semantic conflict.. Reported-by: Dave Jones <davej@redhat.com> Cc: Sage Weil <sage@inktank.com> Cc: Guangliang Zhao <lucienchao@gmail.com> Cc: Li Wang <li.wang@ubuntykylin.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ceph: cast PAGE_SIZE to size_t in ceph_sync_write()Ilya Dryomov2014-01-281-1/+1
| | | | | | | | | Use min_t(size_t, ...) instead of plain min(), which does strict type checking, to avoid compile warning on i386. Cc: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: fix dout() compile warnings in ceph_filemap_fault()Ilya Dryomov2014-01-281-3/+3
| | | | | | | | | | | PAGE_CACHE_SIZE is unsigned long on all architectures, however size_t is either unsigned int or unsigned long. Rather than change format strings, cast PAGE_CACHE_SIZE to size_t to be in line with dout()s in ceph_page_mkwrite(). Cc: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* libceph: replace ceph_calc_ceph_pg() with ceph_oloc_oid_to_pg()Ilya Dryomov2014-01-271-2/+6
| | | | | | | | Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename it to ceph_oloc_oid_to_pg() to make its purpose more clear. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: add imported caps when handling cap export messageYan, Zheng2014-01-213-82/+146
| | | | | | | | | | | Version 3 cap export message includes information about the imported caps. It allows us to add the imported caps if the corresponding cap import message still hasn't been received. This allow us to handle situation that the importer MDS crashes and the cap import message is missing. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: add open export target session helperYan, Zheng2014-01-212-15/+38
| | | | Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: remove exported caps when handling cap import messageYan, Zheng2014-01-211-27/+52
| | | | | | | | | | | Version 3 cap import message includes the ID of the exported caps. It allow us to remove the exported caps if we still haven't received the corresponding cap export message. We remove the exported caps because they are stale, keeping them can compromise consistence. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: handle session flush messageYan, Zheng2014-01-212-0/+21
| | | | Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: check inode caps in ceph_d_revalidateYan, Zheng2014-01-213-3/+21
| | | | | | | | | Some inodes in readdir reply may have no caps. Getattr mds request for these inodes can return -ESTALE. The fix is consider dentry that links to inode with no caps as invalid. Invalid dentry causes a lookup request to send to the mds, the MDS will send caps back. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: handle -ESTALE replyYan, Zheng2014-01-211-20/+11
| | | | | | | | | Send requests that operate on path to directory's auth MDS if mode == USE_AUTH_MDS. Always retry using the auth MDS if got -ESTALE reply from non-auth MDS. Also clean up the code that handles auth MDS change. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: fix trim capsYan, Zheng2014-01-211-6/+11
| | | | | | | | - don't trim auth cap if there are flusing caps - don't trim auth cap if any 'write' cap is wanted - allow trimming non-auth cap even if the inode is dirty Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: fix cache revoke raceYan, Zheng2014-01-213-4/+8
| | | | | | | | | | | | handle following sequence of events: - non-auth MDS revokes Fc cap. queue invalidate work - auth MDS issues Fc cap through request reply. i_rdcache_gen gets increased. - invalidate work runs. it finds i_rdcache_revoking != i_rdcache_gen, so it does nothing. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: use ceph_seq_cmp() to compare migrate_seqYan, Zheng2014-01-211-1/+1
| | | | Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: handle cap export race in try_flush_caps()Yan, Zheng2014-01-211-8/+8
| | | | | | auth cap may change after releasing the i_ceph_lock Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: trivial comment fixJ. Bruce Fields2014-01-161-3/+3
| | | | | | | | "disconnected" is too easily confused with "DCACHE_DISCONNECTED". I think "unhashed" is the more precise term here. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Reviewed-by: Sage Weil <sage@inktank.com>
* libceph: all features fields must be u64Ilya Dryomov2013-12-312-9/+9
| | | | | | | | | In preparation for ceph_features.h update, change all features fields from unsigned int/u32 to u64. (ceph.git has ~40 feature bits at this point.) Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph fscache: Uncaching no data page from fscache in readpage()Li Wang2013-12-311-0/+1
| | | | | | | | | Currently, if one new page allocated into fscache in readpage(), however, with no data read into due to error encountered during reading from OSDs, the slot in fscache is not uncached. This patch fixes this. Signed-off-by: Li Wang <liwang@ubuntukylin.com> Reviewed-by: Milosz Tanski <milosz@adfin.com>
* ceph fscache: Introduce a routine for uncaching single no data page from fscacheLi Wang2013-12-311-0/+13
| | | | | Signed-off-by: Li Wang <liwang@ubuntukylin.com> Reviewed-by: Milosz Tanski <milosz@adfin.com>
* ceph: add acl for cephfsGuangliang Zhao2013-12-319-13/+451
| | | | | | Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: Li Wang <li.wang@ubuntykylin.com> Reviewed-by: Zheng Yan <zheng.z.yan@intel.com>
* ceph: check caps in filemap_fault and page_mkwriteYan, Zheng2013-12-311-12/+77
| | | | | | | | | | Adds cap check to the page fault handler. The check prevents page fault handler from adding new page to the page cache while Fcb caps are being revoked. This solves Fc revoking hang in multiple clients mmap IO workload. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
* fs: ceph: new helper: file_inode(file)Libo Chen2013-12-131-1/+1
| | | | | Signed-off-by: Libo Chen <clbchenlibo.chen@huawei.com> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: Clean up if error occurred in finish_read()Li Wang2013-12-131-0/+3
| | | | | | | | Clean up if error occurred rather than going through normal process Signed-off-by: Li Wang <liwang@ubuntukylin.com> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: implement readv/preadv for sync operationmajianpeng2013-12-131-46/+116
| | | | | | | | For readv/preadv sync-operatoin, ceph only do the first iov. Now implement this. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
* ceph: Implement writev/pwritev for sync operation.majianpeng2013-12-131-80/+193
| | | | | | | | | | | | | For writev/pwritev sync-operatoin, ceph only do the first iov. I divided the write-sync-operation into two functions. One for direct-write, other for none-direct-sync-write. This is because for none-direct-sync-write we can merge iovs to one. But for direct-write, we can't merge iovs. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: drop unconnected inodesYan, Zheng2013-12-133-0/+12
| | | | | | | | Positve dentry and corresponding inode are always accompanied in MDS reply. So no need to keep inode in the cache after dropping all its aliases. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
* ceph: Avoid data inconsistency due to d-cache aliasing in readpage()Li Wang2013-12-131-2/+6
| | | | | | | | | | If the length of data to be read in readpage() is exactly PAGE_CACHE_SIZE, the original code does not flush d-cache for data consistency after finishing reading. This patches fixes this. Signed-off-by: Li Wang <liwang@ubuntukylin.com> Signed-off-by: Sage Weil <sage@inktank.com>
* ceph: initialize inode before instantiating dentryYan, Zheng2013-12-131-78/+58
| | | | | | | | | | | commit b18825a7c8 (Put a small type field into struct dentry::d_flags) put a type field into struct dentry::d_flags. __d_instantiate() set the field by checking inode->i_mode. So we should initialize inode before instantiating dentry when handling mds reply. Fixes: http://tracker.ceph.com/issues/6930 Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
* Merge branch 'for-linus-bugs' of ↵Linus Torvalds2013-11-268-41/+121
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph bug-fixes from Sage Weil: "These include a couple fixes to the new fscache code that went in during the last cycle (which will need to go stable@ shortly as well), a couple client-side directory fragmentation fixes, a fix for a race in the cap release queuing path, and a couple race fixes in the request abort and resend code. Obviously some of this could have gone into 3.12 final, but I preferred to overtest rather than send things in for a late -rc, and then my travel schedule intervened" * 'for-linus-bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: allocate non-zero page to fscache in readpage() ceph: wake up 'safe' waiters when unregistering request ceph: cleanup aborted requests when re-sending requests. ceph: handle race between cap reconnect and cap release ceph: set caps count after composing cap reconnect message ceph: queue cap release in __ceph_remove_cap() ceph: handle frag mismatch between readdir request and reply ceph: remove outdated frag information ceph: hung on ceph fscache invalidate in some cases
| * ceph: allocate non-zero page to fscache in readpage()Li Wang2013-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | ceph_osdc_readpages() returns number of bytes read, currently, the code only allocate full-zero page into fscache, this patch fixes this. Signed-off-by: Li Wang <liwang@ubuntukylin.com> Reviewed-by: Milosz Tanski <milosz@adfin.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: wake up 'safe' waiters when unregistering requestYan, Zheng2013-11-231-1/+2
| | | | | | | | | | | | | | | | We also need to wake up 'safe' waiters if error occurs or request aborted. Otherwise sync(2)/fsync(2) may hang forever. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
| * ceph: cleanup aborted requests when re-sending requests.Yan, Zheng2013-11-231-1/+4
| | | | | | | | | | | | | | | | | | | | Aborted requests usually get cleared when the reply is received. If MDS crashes, no reply will be received. So we need to cleanup aborted requests when re-sending requests. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
OpenPOWER on IntegriCloud