Commit message (Author, Age, Files, Lines)
* jfs: Switch to generic xattr handlers (Andreas Gruenbacher, 2016-05-12, 5 files, -116/+84)
    This is mostly the same as on other filesystems except for attribute names with an "os2." prefix: for those, the prefix is not stored on disk, and on-disk attribute names without a prefix have "os2." added.

    As on several other filesystems, the underlying function for setting/removing xattrs (__jfs_setxattr) removes attributes when the value is NULL, so the set xattr handlers will work as expected.

    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
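The "os2." name mapping described above can be sketched in user-space C. This is only an illustration, not the jfs code: `to_disk_name`, `to_user_name`, and the list of known namespace prefixes are hypothetical names and assumptions introduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OS2_PREFIX "os2."

/* Assumption: these namespaces are stored on disk with their prefix intact. */
static const char *const known_prefixes[] = {
    "user.", "trusted.", "security.", "system.",
};

static int has_known_prefix(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(known_prefixes) / sizeof(known_prefixes[0]); i++)
        if (strncmp(name, known_prefixes[i], strlen(known_prefixes[i])) == 0)
            return 1;
    return 0;
}

/* User-visible name -> on-disk name: strip "os2.", keep everything else. */
const char *to_disk_name(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, OS2_PREFIX, strlen(OS2_PREFIX)) == 0)
        return name + strlen(OS2_PREFIX);
    return name;
}

/*
 * On-disk name -> user-visible name: add "os2." when the stored name has
 * no known prefix. Returns 0 on success, -1 if buf is too small.
 */
int to_user_name(const char *disk_name, char *buf, size_t len)
{
    const char *prefix;
    int n;

    assert(disk_name != NULL && buf != NULL);
    prefix = has_known_prefix(disk_name) ? "" : OS2_PREFIX;
    n = snprintf(buf, len, "%s%s", prefix, disk_name);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With this mapping, "os2.icon" would be stored as "icon" and listed again as "os2.icon", while "user.foo" passes through unchanged in both directions.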
* jfs: Clean up xattr name mapping (Andreas Gruenbacher, 2016-05-12, 1 file, -55/+25)
    Instead of stripping "os2." prefixes in __jfs_setxattr, make callers strip them, as __jfs_getxattr already does. With that change, use the same name mapping function in jfs_{get,set,remove}xattr.

    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* gfs2: Switch to generic xattr handlers (Al Viro, 2016-05-12, 4 files, -88/+99)
    Switch to the generic xattr handlers and take the necessary glocks at the layer below. The following are the new xattr "entry points"; they are called with the glock held already in the following cases:

      gfs2_xattr_get: From SELinux, during lookups.
      gfs2_xattr_set: The glock is never held.
      gfs2_get_acl: From gfs2_create_inode -> posix_acl_create and gfs2_setattr -> posix_acl_chmod.
      gfs2_set_acl: From gfs2_setattr -> posix_acl_chmod.

    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ceph: kill __ceph_removexattr() (Yan, Zheng, 2016-04-23, 1 file, -126/+0)
    When removing an xattr, generic_removexattr() calls __ceph_setxattr() with a NULL value and the XATTR_REPLACE flag. __ceph_removexattr() is not used any more.

    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ceph: Switch to generic xattr handlers (Andreas Gruenbacher, 2016-04-23, 4 files, -52/+38)
    Add a catch-all xattr handler at the end of ceph_xattr_handlers. Check for valid attribute names there, and remove those checks from __ceph_{get,set,remove}xattr instead. No "system.*" xattrs need to be handled by the catch-all handler anymore.

    The set xattr handler is called with a NULL value to indicate that the attribute should be removed; __ceph_setxattr already handles that case correctly (ceph_set_acl could already be calling __ceph_setxattr with a NULL value).

    Move the check for snapshots from ceph_{set,remove}xattr into __ceph_{set,remove}xattr. With that, ceph_{get,set,remove}xattr can be replaced with the generic iops.

    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ceph: Get rid of d_find_alias in ceph_set_acl (Andreas Gruenbacher, 2016-04-23, 4 files, -33/+29)
    Create a variant of ceph_setattr that takes an inode instead of a dentry. Change __ceph_setxattr (and also __ceph_removexattr) to take an inode instead of a dentry. Use those in ceph_set_acl so that we no longer need a dentry there.

    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ->getxattr(): pass dentry and inode as separate arguments (Al Viro, 2016-04-11, 34 files, -85/+94)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* xattr_handler: pass dentry and inode as separate arguments of ->get() (Al Viro, 2016-04-10, 31 files, -114/+113)
    ... and do not assume they are already attached to each other

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reiserfs: switch to generic_{get,set,remove}xattr() (Al Viro, 2016-04-10, 7 files, -98/+31)
    reiserfs_xattr_[sg]et() will fail with -EOPNOTSUPP for V1 inodes anyway, and all reiserfs instances of ->[sg]et() call it and so does ->set_acl().

    Checks for name length in the instances had been bogus; they should've been "bugger off if it's _exactly_ the prefix" (as generic would do on its own) and not "bugger off if it's shorter than the prefix" - that can't happen.

    xattr_full_name() is needed to adjust for the fact that generic instances will skip the prefix in the name passed to ->[gs]et(); reiserfs homegrown analogues didn't.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
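The point about the name-length checks can be illustrated with a small user-space sketch. The helper name `xattr_suffix` and the error reporting through an out-parameter are hypothetical, and the choice of error codes is an assumption; only the check itself mirrors the message: once the prefix has matched, the name cannot be shorter than the prefix, so the only case left to reject is a name that is exactly the prefix.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Return the part of `name` after `prefix`, or NULL with *err set.
 * A name that does not start with the prefix belongs to another handler.
 * A name that is exactly the prefix has an empty suffix and is rejected.
 */
const char *xattr_suffix(const char *name, const char *prefix, int *err)
{
    size_t plen;

    assert(name != NULL && prefix != NULL && err != NULL);
    plen = strlen(prefix);

    if (strncmp(name, prefix, plen) != 0) {
        *err = -EOPNOTSUPP;   /* not our namespace */
        return NULL;
    }
    if (name[plen] == '\0') {
        *err = -EINVAL;       /* exactly the prefix: nothing to look up */
        return NULL;
    }
    *err = 0;
    return name + plen;       /* handler sees the name without the prefix */
}
```

Because the returned name has the prefix skipped, a filesystem that stores full names needs to put the prefix back, which is the role the message gives to xattr_full_name().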
* cifs: kill more bogus checks in ->...xattr() methods (Al Viro, 2016-04-10, 1 file, -36/+6)
    none of that stuff can ever be called for NULL or negative dentry.

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't bother with ->d_inode->i_sb - it's always equal to ->d_sb (Al Viro, 2016-04-10, 30 files, -50/+41)
    ... and neither can ever be NULL

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* posix_acl: Unexport acl_by_type and make it static (Andreas Gruenbacher, 2016-03-31, 2 files, -3/+1)
    acl_by_type(inode, type) returns a pointer to either inode->i_acl or inode->i_default_acl depending on type. This is useful in fs/posix_acl.c, but should never have been visible outside that file.

    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* posix_acl: Inode acl caching fixes (Andreas Gruenbacher, 2016-03-31, 17 files, -78/+138)
    When get_acl() is called for an inode whose ACL is not cached yet, the get_acl inode operation is called to fetch the ACL from the filesystem. The inode operation is responsible for updating the cached acl with set_cached_acl(). This is done without locking at the VFS level, so another task can call set_cached_acl() or forget_cached_acl() before the get_acl inode operation gets to calling set_cached_acl(), and then get_acl's call to set_cached_acl() results in caching an outdated ACL.

    Prevent this from happening by setting the cached ACL pointer to a task-specific sentinel value before calling the get_acl inode operation. Move the responsibility for updating the cached ACL from the get_acl inode operations to get_acl(). There, only set the cached ACL if the sentinel value hasn't changed.

    The sentinel values are chosen to have odd values. Likewise, the value of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have an even value (ACLs are aligned in memory). This makes it possible to distinguish uncached ACL values from ACL objects.

    In addition, switch from guarding inode->i_acl and inode->i_default_acl updates by the inode->i_lock spinlock to using xchg() and cmpxchg().

    Filesystems that do not want ACLs returned from their get_acl inode operations to be cached must call forget_cached_acl() to prevent the VFS from doing so.

    (Patch written by Al Viro and Andreas Gruenbacher.)

    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
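The sentinel scheme described in this message can be sketched in user-space C with C11 atomics standing in for the kernel's cmpxchg(). Everything below is an assumption made for illustration: `struct acl`, `get_acl_cached`, `make_sentinel`, `struct fetch_ctx`, and `demo_fetch` are hypothetical, and reference counting of ACL objects is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct acl { int entries; };            /* stand-in for a real ACL object */

/* Odd values can never be pointers to (aligned) ACL objects. */
#define ACL_NOT_CACHED ((struct acl *)(uintptr_t)-1)

static int is_uncached(struct acl *p)
{
    return ((uintptr_t)p & 1) != 0;
}

/* Task-specific sentinel: an odd value derived from a per-task token. */
static struct acl *make_sentinel(uintptr_t task_token)
{
    return (struct acl *)(task_token | 1);
}

void forget_cached_acl(_Atomic(struct acl *) *slot)
{
    atomic_store(slot, ACL_NOT_CACHED);
}

typedef struct acl *(*fetch_fn)(void *ctx);

struct acl *get_acl_cached(_Atomic(struct acl *) *slot, uintptr_t task_token,
                           fetch_fn fetch, void *ctx)
{
    struct acl *cur, *sentinel, *expected, *acl;

    assert(slot != NULL && fetch != NULL);
    cur = atomic_load(slot);
    if (!is_uncached(cur))
        return cur;                     /* cached; NULL means "no ACL" */

    /* Claim the slot. If this fails we still fetch, we just won't cache. */
    sentinel = make_sentinel(task_token);
    expected = ACL_NOT_CACHED;
    (void)atomic_compare_exchange_strong(slot, &expected, sentinel);

    acl = fetch(ctx);                   /* the "get_acl inode operation" */

    /* Cache the result only if nobody touched the slot in the meantime. */
    expected = sentinel;
    (void)atomic_compare_exchange_strong(slot, &expected, acl);
    return acl;
}

/* Demo fetcher: optionally simulates a concurrent invalidation. */
struct fetch_ctx {
    struct acl *result;
    _Atomic(struct acl *) *race_slot;   /* non-NULL: invalidate during fetch */
};

struct acl *demo_fetch(void *p)
{
    struct fetch_ctx *ctx = p;

    if (ctx->race_slot)
        forget_cached_acl(ctx->race_slot);
    return ctx->result;
}
```

When `demo_fetch` invalidates the slot during the fetch, the final compare-and-exchange no longer sees the caller's sentinel, so the possibly stale result is returned to the caller but not cached.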
* jfs: Remove unnecessary code in jfs_get_acl (Andreas Gruenbacher, 2016-03-28, 1 file, -4/+0)
    The get_acl inode operation is called only when no ACL is cached. It makes no sense to check for a cached ACL as the first thing inside such inode operations.

    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reiserfs_cache_default_acl(): use get_acl() (Al Viro, 2016-03-28, 1 file, -1/+1)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Linux 4.6-rc1 [tag: v4.6-rc1] (Linus Torvalds, 2016-03-26, 1 file, -2/+2)
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds, 2016-03-26, 22 files, -519/+811)
|\
| |   Pull Ceph updates from Sage Weil:
| |    "There is quite a bit here, including some overdue refactoring and cleanup on the mon_client and osd_client code from Ilya, scattered writeback support for CephFS and a pile of bug fixes from Zheng, and a few random cleanups and fixes from others"
| |
| |   [ I already decided not to pull this because of it having been rebased recently, but ended up changing my mind after all. Next time I'll really hold people to it. Oh well. - Linus ]
| |
| |   * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
| |     libceph: use KMEM_CACHE macro
| |     ceph: use kmem_cache_zalloc
| |     rbd: use KMEM_CACHE macro
| |     ceph: use lookup request to revalidate dentry
| |     ceph: kill ceph_get_dentry_parent_inode()
| |     ceph: fix security xattr deadlock
| |     ceph: don't request vxattrs from MDS
| |     ceph: fix mounting same fs multiple times
| |     ceph: remove unnecessary NULL check
| |     ceph: avoid updating directory inode's i_size accidentally
| |     ceph: fix race during filling readdir cache
| |     libceph: use sizeof_footer() more
| |     ceph: kill ceph_empty_snapc
| |     ceph: fix a wrong comparison
| |     ceph: replace CURRENT_TIME by current_fs_time()
| |     ceph: scattered page writeback
| |     libceph: add helper that duplicates last extent operation
| |     libceph: enable large, variable-sized OSD requests
| |     libceph: osdc->req_mempool should be backed by a slab pool
| |     libceph: make r_request msg_size calculation clearer
| |     ...
| * libceph: use KMEM_CACHE macro (Geliang Tang, 2016-03-25, 1 file, -8/+2)
|     Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.
|
|     Signed-off-by: Geliang Tang <geliangtang@163.com>
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: use kmem_cache_zalloc (Geliang Tang, 2016-03-25, 2 files, -2/+2)
|     Use kmem_cache_zalloc() instead of kmem_cache_alloc() with the __GFP_ZERO flag.
|
|     Signed-off-by: Geliang Tang <geliangtang@163.com>
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: use KMEM_CACHE macro (Geliang Tang, 2016-03-25, 1 file, -8/+2)
|     Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.
|
|     Signed-off-by: Geliang Tang <geliangtang@163.com>
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: use lookup request to revalidate dentry (Yan, Zheng, 2016-03-25, 2 files, -0/+35)
|     If a dentry has no lease, ceph_d_revalidate() previously returned 0. This causes the VFS to invalidate the dentry and create a new dentry for later lookup. Invalidating a dentry also detaches any mount points underneath it. So a mount point inside cephfs can disappear mysteriously (even if the mount point is not modified by other hosts).
|
|     The fix is to use a lookup request to revalidate a dentry without a lease. This partly solves the disappearing mount point issue (as long as the mount point is not modified by other hosts).
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: kill ceph_get_dentry_parent_inode() (Yan, Zheng, 2016-03-25, 2 files, -20/+5)
|     use vfs helper dget_parent() instead
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix security xattr deadlock (Yan, Zheng, 2016-03-25, 8 files, -11/+125)
|     When security is enabled, the security module can call the filesystem's getxattr/setxattr callbacks during d_instantiate(). For cephfs, d_instantiate() is usually called by the MDS dispatch thread while handling an MDS reply. If the MDS reply does not include xattrs and the corresponding caps, getxattr/setxattr needs to send a new request to the MDS and wait for the reply. This makes the MDS dispatch thread sleep, so nobody handles later MDS replies.
|
|     The fix is to make sure lookup/atomic_open replies include xattrs and the corresponding caps, so getxattr can be handled by cached xattrs. This requires some modification to both the MDS and the request message. (Client tells MDS what caps it wants; MDS encodes proper caps in the reply.)
|
|     The Smack security module may call setxattr during d_instantiate(). Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL to us. So just make setxattr return an error when called by the MDS dispatch thread.
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: don't request vxattrs from MDS (Yan, Zheng, 2016-03-25, 1 file, -2/+4)
|     It's useless because the MDS reply does not carry any vxattr.
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix mounting same fs multiple times (Yan, Zheng, 2016-03-25, 1 file, -18/+15)
|     Now __ceph_open_session() only accepts a closed client. An opened client will trigger BUG_ON().
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: remove unnecessary NULL check (Yan, Zheng, 2016-03-25, 1 file, -2/+2)
|     If page->mapping is NULL, the releasepage() callback does not get called. Remove the unnecessary NULL check to make the static code analysis tool happy.
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: avoid updating directory inode's i_size accidentally (Yan, Zheng, 2016-03-25, 1 file, -0/+4)
|     Directory inode's i_size is used by readdir cache.
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix race during filling readdir cache (Yan, Zheng, 2016-03-25, 1 file, -2/+7)
|     Readdir cache uses the page cache to save dentry pointers. When adding dentry pointers to the middle of a page, we need to make sure the page already exists. Otherwise the beginning part of the page will contain invalid pointers.
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: use sizeof_footer() more (Ilya Dryomov, 2016-03-25, 1 file, -16/+3)
|     Don't open-code sizeof_footer() in read_partial_message() and ceph_msg_revoke(). Also, after switching to sizeof_footer(), it's now possible to use con_out_kvec_add() in prepare_write_message_footer().
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|     Reviewed-by: Alex Elder <elder@linaro.org>
| * ceph: kill ceph_empty_snapc (Ilya Dryomov, 2016-03-25, 4 files, -34/+6)
|     ceph_empty_snapc->num_snaps == 0 at all times. Passing such a snapc to ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only for sizing the request message.
|
|     Further, in all four cases the subsequent ceph_osdc_build_request() is passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps and making ceph_empty_snapc entirely useless. The two cases where it actually mattered were removed in commits 860560904962 ("ceph: avoid sending unnessesary FLUSHSNAP message") and 23078637e054 ("ceph: fix queuing inode to mdsdir's snaprealm").
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|     Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix a wrong comparison (Anton Protopopov, 2016-03-25, 1 file, -1/+1)
|     A negative value rc was compared to the positive value ENOENT in the finish_read() function.
|
|     Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: replace CURRENT_TIME by current_fs_time() (Deepa Dinamani, 2016-03-25, 4 files, -6/+6)
|     CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead.
|
|     Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: scattered page writeback (Yan, Zheng, 2016-03-25, 1 file, -109/+196)
|     This patch makes ceph_writepages_start() try to use a single OSD request to write all dirty pages within a stripe unit. When a nonconsecutive dirty page is found, ceph_writepages_start() tries to add a new write operation to the existing OSD request. If it succeeds, it uses the new operation to write back the dirty page.
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: add helper that duplicates last extent operation (Yan, Zheng, 2016-03-25, 2 files, -0/+24)
|     This helper duplicates the last extent operation in an OSD request, then adjusts the new extent operation's offset and length. The helper is for scattered page writeback, which adds nonconsecutive dirty pages to a single OSD request.
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: enable large, variable-sized OSD requests (Ilya Dryomov, 2016-03-25, 3 files, -19/+32)
|     Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number.
|
|     r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
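The flexible array member technique mentioned here can be sketched in user-space C. The types and the `request_alloc` helper are hypothetical and much simpler than struct ceph_osd_request; `calloc()` stands in for both the slab cache and kmalloc() paths described in the message.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_OPS 16                       /* upper bound named in the message */

struct op {
    unsigned int opcode;
    unsigned long long offset;
    unsigned long long length;
};

struct request {
    unsigned int num_ops;
    struct op ops[];                     /* flexible array member, sized at allocation */
};

/* Allocate a zeroed request with room for exactly num_ops operations. */
struct request *request_alloc(unsigned int num_ops)
{
    struct request *req;

    assert(num_ops >= 1 && num_ops <= MAX_OPS);
    req = calloc(1, sizeof(*req) + num_ops * sizeof(req->ops[0]));
    if (req)
        req->num_ops = num_ops;
    return req;
}
```

The benefit over a fixed `ops[3]` array is that small requests stay small while the occasional large request pays only for the operations it actually carries; the caller releases the request with `free()`.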
| * libceph: osdc->req_mempool should be backed by a slab pool (Ilya Dryomov, 2016-03-25, 1 file, -2/+2)
|     ceph_osd_request_cache was introduced a long time ago. Also, osd_req is about to get a flexible array member, which ceph_osd_request_cache is going to be aware of.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: make r_request msg_size calculation clearer (Ilya Dryomov, 2016-03-25, 1 file, -10/+11)
|     Although msg_size is calculated correctly, the terms are grouped in a misleading way - snaps appears to not have room for a u32 length. Move calculation closer to its use and regroup terms.
|
|     No functional change.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op (Yan, Zheng, 2016-03-25, 3 files, -5/+6)
|     This avoids defining a large array of r_reply_op_{len,result} in struct ceph_osd_request.
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: rename ceph_osd_req_op::payload_len to indata_len (Ilya Dryomov, 2016-03-25, 2 files, -7/+7)
|     Follow userspace nomenclature on this - the next commit adds outdata_len.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: remove useless BUG_ON (Yan, Zheng, 2016-03-25, 1 file, -2/+0)
|     ceph_osdc_start_request() never returns -EOLDSNAP
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: don't enable rbytes mount option by default (Yan, Zheng, 2016-03-25, 2 files, -4/+3)
|     When the rbytes mount option is enabled, a directory's size is its recursive size. The recursive size is not updated instantly. This can cause the directory size to change between successive stat(1) calls.
|
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: encode ctime in cap message (Yan, Zheng, 2016-03-25, 1 file, -4/+7)
|     Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: behave in mon_fault() if cur_mon < 0 (Ilya Dryomov, 2016-03-25, 1 file, -14/+9)
|     This can happen if __close_session() in ceph_monc_stop() races with a connection reset. We need to ignore such faults, otherwise it's likely we would take !hunting, call __schedule_delayed() and end up with delayed_work() executing on invalid memory, among other things.
|
|     The (two!) con->private tests are useless, as nothing ever clears con->private. Nuke them.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: reschedule tick in mon_fault() (Ilya Dryomov, 2016-03-25, 1 file, -4/+4)
|     Doing __schedule_delayed() in the hunting branch is pointless, as the tick will have already been scheduled by then.
|
|     What we need to do instead is *reschedule* it in the !hunting branch, after reopen_session() changes hunt_mult, which affects the delay. This helps with spacing out connection attempts and avoiding things like two back-to-back attempts followed by a longer period of waiting around.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: introduce and switch to reopen_session() (Ilya Dryomov, 2016-03-25, 1 file, -17/+16)
|     hunting is now set in __open_session() and cleared in finish_hunting(), instead of all around. The "session lost" message is printed not only on connection resets, but also on keepalive timeouts.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: monc hunt rate is 3s with backoff up to 30s (Ilya Dryomov, 2016-03-25, 3 files, -9/+22)
|     Unless we are in the process of setting up a client (i.e. connecting to the monitor cluster for the first time), apply a backoff: every time we want to reopen a session, increase our timeout by a multiple (currently 2); when we complete the connection, reduce that multiplier by 50%.
|
|     Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: monc ping rate is 10s (Ilya Dryomov, 2016-03-25, 3 files, -9/+5)
|     Split ping interval and ping timeout: ping interval is 10s; keepalive timeout is 30s.
|
|     Make monc_ping_timeout a constant while at it - it's not actually exported as a mount option (and the rest of tick-related settings won't be either), so it's got no place in ceph_options.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: pick a different monitor when reconnecting (Ilya Dryomov, 2016-03-25, 1 file, -28/+57)
|     Don't try to reconnect to the same monitor when we fail to establish a session within a timeout or it's lost.
|
|     For that, pick_new_mon() needs to see the old value of cur_mon, so don't clear it in __close_session() - all calls to __close_session() but one are followed by __open_session() anyway. __open_session() is only called when a new session needs to be established, so the "already open?" branch, which is now in the way, is simply dropped.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: revamp subs code, switch to SUBSCRIBE2 protocol (Ilya Dryomov, 2016-03-25, 8 files, -95/+174)
|     It is currently hard-coded in the mon_client that mdsmap and monmap subs are continuous, while osdmap sub is always "onetime". To better handle full clusters/pools in the osd_client, we need to be able to issue continuous osdmap subs. Revamp subs code to allow us to specify for each sub whether it should be continuous or not.
|
|     Although not strictly required for the above, switch to SUBSCRIBE2 protocol while at it, eliminating the ambiguity between a request for "every map since X" and a request for "just the latest" when we don't have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now required - it's been supported since pre-argonaut (2010).
|
|     Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling it before we validate the epoch and successfully install the new map can mess up mon_client sub state.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: decouple hunting and subs management (Ilya Dryomov, 2016-03-25, 1 file, -9/+22)
|     Coupling hunting state with subscribe state is not a good idea. Clear hunting when we complete the authentication handshake.
|
|     Signed-off-by: Ilya Dryomov <idryomov@gmail.com>