summaryrefslogtreecommitdiffstats
path: root/fs/ceph/super.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of ↵Linus Torvalds2012-01-131-1/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: ensure prealloc_blob is in place when removing xattr rbd: initialize snap_rwsem in rbd_add() ceph: enable/disable dentry complete flags via mount option vfs: export symbol d_find_any_alias() ceph: always initialize the dentry in open_root_dentry() libceph: remove useless return value for osd_client __send_request() ceph: avoid iput() while holding spinlock in ceph_dir_fsync ceph: avoid useless dget/dput in encode_fh ceph: dereference pointer after checking for NULL crush: fix force for non-root TAKE ceph: remove unnecessary d_fsdata conditional checks ceph: Use kmemdup rather than duplicating its implementation Fix up conflicts in fs/ceph/super.c (d_alloc_root() failure handling vs always initialize the dentry in open_root_dentry)
| * ceph: enable/disable dentry complete flags via mount optionSage Weil2012-01-121-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | Enable/disable use of the dentry dir 'complete' flag via a mount option. This lets the admin control whether ceph uses the dcache to satisfy negative lookups or readdir when it has the entire directory contents in its cache. This is purely a performance optimization; correctness is guaranteed whether it is enabled or not. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: always initialize the dentry in open_root_dentry()Alex Elder2012-01-111-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When open_root_dentry() gets a dentry via d_obtain_alias() it does not get initialized. If the dentry obtained came from the cache, this is OK. But if not, the result is an improperly initialized dentry. To fix this, call ceph_init_dentry() regardless of which path produced the dentry. That function returns immediately for a dentry that is already initialized, it is safe to use either way. (Credit to Sage, who suggested this fix.) Signed-off-by: Alex Elder <aelder@sgi.com>
* | ceph: d_alloc_root() may failAl Viro2012-01-091-4/+11
| | | | | | | | | | | | ... and ceph_init_dentry(NULL) will oops Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | vfs: switch ->show_options() to struct dentry *Al Viro2012-01-061-3/+3
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ceph: fix rasize reporting by ceph_show_optionsSage Weil2011-12-021-1/+1
| | | | | | | Fix typo. Reported-by: mowang da <whooya.xxl@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: initialize root dentrySage Weil2011-11-111-2/+4
| | | | | | | | | | | Set up d_fsdata on the root dentry. This fixes a NULL pointer dereference in ceph_d_prune on umount. It also means we can eventually strip out all of the conditional checks on d_fsdata because it is now set unconditionally (prior to setting up the d_ops). Fix the ceph_d_prune debug print while we're here. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph/super.c: quiet sparse noiseH Hartley Sweeten2011-11-051-2/+2
| | | | | | | | | | | | Quiet the sparse noise: warning: symbol 'create_fs_client' was not declared. Should it be static? warning: symbol 'destroy_fs_client' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Sage Weil <sage@newdream.net> ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: replace leading spaces with tabsNoah Watkins2011-10-251-20/+20
| | | | | | | Trivial formatting fix. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: create messenger with clientSage Weil2011-10-251-3/+6
| | | | | | | This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: rename rsize -> rasizeSage Weil2011-10-251-3/+11
| | | | | | It controls readahead. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix memory leakNoah Watkins2011-08-221-2/+2
| | | | | | | | kfree does not clean up indirect allocations in ceph_fs_client and ceph_options (e.g. snapdir_name). Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: set up readahead size when rsize is not passedYehuda Sadeh2011-07-261-0/+4
| | | | | | | This should improve the default read performance, as without it readahead is practically disabled. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: report f_bfree based on kb_avail rather than diffing.Greg Farnum2011-07-261-2/+1
| | | | | Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
* ceph: Move secret key parsing earlier.Tommi Virtanen2011-03-291-1/+1
| | | | | | | | | This makes the base64 logic be contained in mount option parsing, and prepares us for replacing the homebew key management with the kernel key retention service. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: move readahead default to fs/ceph from libcephSage Weil2011-03-211-2/+2
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add ino32 mount optionYehuda Sadeh2011-03-211-0/+5
| | | | | | | The ino32 mount option forces the ceph fs to report 32 bit ino values. This is useful for 64 bit kernels with 32 bit userspace. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
* ceph: fix cap_wanted_delay_{min,max} mount option initializationSage Weil2011-01-191-0/+2
| | | | | | | These were initialized to 0 instead of the default, fallout from the RBD refactor in 3d14c5d2b6e15c21d8e5467dc62d33127c23a644. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fsc->*_wq's aren't used in memory reclaim pathTejun Heo2011-01-121-3/+7
| | | | | | | | | | fsc->*_wq's aren't depended upon during memory reclaim. Convert to alloc_workqueue() w/o WQ_MEM_RECLAIM. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: implement DIRLAYOUTHASH feature to get dir layout from MDSSage Weil2011-01-121-1/+2
| | | | | | | | | | | | | | | This implements the DIRLAYOUTHASH protocol feature, which passes the dir layout over the wire from the MDS. This gives the client knowledge of the correct hash function to use for mapping dentries among dir fragments. Note that if this feature is _not_ present on the client but is on the MDS, the client may misdirect requests. This will result in a forward and degrade performance. It may also result in inaccurate NFS filehandle generation, which will prevent fh resolution when the inode is not present in the client cache and the parent directories have been fragmented. Signed-off-by: Sage Weil <sage@newdream.net>
* convert cephAl Viro2010-10-291-23/+27
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ceph: factor out libceph from Ceph file systemYehuda Sadeh2010-10-201-682/+472
| | | | | | | | | | | | | | | | | | | | | | This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do not ignore osd_idle_ttl mount optionSage Weil2010-08-031-0/+3
| | | | | | Actually apply the mount option to the mount_args struct. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: make ->sync_fs not wait if wait==0Sage Weil2010-08-011-4/+13
| | | | | | | The ->sync_fs() super op only needs to wait if wait is true. Otherwise, just get some dirty cap writeback started. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use %pU to print uuid (fsid)Sage Weil2010-08-011-6/+6
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: clean up fsid mount optionSage Weil2010-08-011-13/+39
| | | | | | | Specify the fsid mount option in hex, not via the major/minor u64 hackery we had before. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: remove unused 'monport' mount optionSage Weil2010-08-011-2/+0
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do caps accounting per mds_clientYehuda Sadeh2010-08-011-6/+0
| | | | | | | | | Caps related accounting is now being done per mds client instead of just being global. This prepares ground work for a later revision of the caps preallocated reservation list. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix atomic64_t initialization on ia64Jeff Mahoney2010-06-101-1/+1
| | | | | | | | bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix f_namelen reported by statfsSage Weil2010-06-011-1/+1
| | | | | | | | | We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX. That disagrees with ceph_lookup behavior (which checks against NAME_MAX), and also makes the pjd posix test suite spit out ugly errors because with can't clean up its temporary files. Signed-off-by: Sage Weil <sage@newdream.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2010-05-301-2/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: clean up on forwarded aborted mds request ceph: fix leak of osd authorizer ceph: close out mds, osd connections before stopping auth ceph: make lease code DN specific fs/ceph: Use ERR_CAST ceph: renew auth tickets before they expire ceph: do not resend mon requests on auth ticket renewal ceph: removed duplicated #includes ceph: avoid possible null dereference ceph: make mds requests killable, not interruptible sched: add wait_for_completion_killable_timeout
| * ceph: close out mds, osd connections before stopping authSage Weil2010-05-291-1/+9
| | | | | | | | | | | | | | | | | | The auth module (part of the mon_client) is needed to free any ceph_authorizer(s) used by the mds and osd connections. Flush the msgr workqueue before stopping monc to ensure that the destroy_authorizer auth op is available when those connections are closed out. Signed-off-by: Sage Weil <sage@newdream.net>
| * fs/ceph: Use ERR_CASTJulia Lawall2010-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of the returned value is the same as the type of the enclosing function. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'for-linus' of ↵Linus Torvalds2010-05-241-44/+81
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (59 commits) ceph: reuse mon subscribe message instead of allocated anew ceph: avoid resending queued message to monitor ceph: Storage class should be before const qualifier ceph: all allocation functions should get gfp_mask ceph: specify max_bytes on readdir replies ceph: cleanup pool op strings ceph: Use kzalloc ceph: use common helper for aborted dir request invalidation ceph: cope with out of order (unsafe after safe) mds reply ceph: save peer feature bits in connection structure ceph: resync headers with userland ceph: use ceph. prefix for virtual xattrs ceph: throw out dirty caps metadata, data on session teardown ceph: attempt mds reconnect if mds closes our session ceph: clean up send_mds_reconnect interface ceph: wait for mds OPEN reply to indicate reconnect success ceph: only send cap releases when mds is OPEN|HUNG ceph: dicard cap releases on mds restart ceph: make mon client statfs handling more generic ceph: drop src address(es) from message header [new protocol feature] ...
| * ceph: specify max_bytes on readdir repliesSage Weil2010-05-171-0/+8
| | | | | | | | | | | | | | Specify max bytes in request to bound size of reply. Add associated mount option with default value of 512 KB. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: name bdi ceph-%d instead of major:minorSage Weil2010-05-171-1/+4
| | | | | | | | | | | | | | The bdi_setup_and_register() helper doesn't help us since we bdi_init() in create_client() and bdi_register() only when sget() succeeds. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: clean up mount options, ->show_options()Sage Weil2010-05-171-31/+59
| | | | | | | | | | | | Ensure all options are included in /proc/mounts. Some cleanup. Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: remove unused #includesHuang Weiyi2010-05-171-3/+0
| | | | | | | | | | | | | | | | Remove unused #include's in fs/ceph/super.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: wait for both monmap and osdmap when opening sessionSage Weil2010-05-171-5/+6
| | | | | | | | Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
| * ceph: use ceph_sb_to_client instead of ceph_clientCheng Renquan2010-05-171-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ceph_sb_to_client and ceph_client are really identical, we need to dump one; while function ceph_client is confusing with "struct ceph_client", ceph_sb_to_client's definition is more clear; so we'd better switch all call to ceph_sb_to_client. -static inline struct ceph_client *ceph_client(struct super_block *sb) -{ - return sb->s_fs_info; -} Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | ceph: should use deactivate_locked_super() on failure exitsAl Viro2010-05-211-2/+1
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ceph: unregister bdi before kill_anon_super releases device nameSage Weil2010-05-041-7/+16
| | | | | | | | | | | | Unregister and destroy the bdi in put_super, after mount is r/o, but before put_anon_super releases the device name. For symmetry, bdi_destroy in destroy_client (we bdi_init in create_client). Only set s_bdi if bdi_register succeeds, since we use it to decide whether to bdi_unregister. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: print more useful version info on module loadSage Weil2010-05-031-3/+4
| | | | | | | Decouple the client version from the server side. Print relevant protocol and map version info instead. Signed-off-by: Sage Weil <sage@newdream.net>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* ceph: reset osd after relevant messages timed outYehuda Sadeh2010-03-041-1/+7
| | | | | | | | | | | | | | | | This simplifies the process of timing out messages. We keep lru of current messages that are in flight. If a timeout has passed, we reset the osd connection, so that messages will be retransmitted. This is a failsafe in case we hit some sort of problem sending out message to the OSD. Normally, we'll get notification via an updated osdmap if there are problems. If a request is older than the keepalive timeout, send a keepalive to ensure we detect any breaks in the TCP connection. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: clean up readdir caps reservationSage Weil2010-02-171-0/+5
| | | | | | | | | Use a global counter for the minimum number of allocated caps instead of hard coding a check against readdir_max. This takes into account multiple client instances, and avoids examining the superblock mount options when a cap is dropped. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: put unused osd connections on lruYehuda Sadeh2010-02-111-0/+3
| | | | | | | | | | Instead of removing osd connection immediately when the requests list is empty, put the osd connection on an lru. Only if that osd has not been used for more than a specified time, will it be removed. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: allow renewal of auth credentialsSage Weil2010-02-101-6/+6
| | | | | | | | | Add infrastructure to allow the mon_client to periodically renew its auth credentials. Also add a messenger callback that will force such a renewal if a peer rejects our authenticator. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: only unregister registered bdiSage Weil2009-12-231-1/+2
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: writeback congestion controlYehuda Sadeh2009-12-211-0/+36
| | | | | | | | Set bdi congestion bit when amount of write data in flight exceeds adjustable threshold. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
OpenPOWER on IntegriCloud