path: root/fs/ceph
Commit message | Author | Age | Files | Lines
* ceph: correctly set 'follows' in flushsnap messages | Sage Weil | 2010-09-14 | 1 | -1/+1
  The 'follows' should match the seq for the snap context for the given snap cap, which is the context under which we have been dirtying and writing data and metadata. The snapshot that _contains_ those updates thus _follows_ that context's seq #.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix dn offset during readdir_prepopulate | Sage Weil | 2010-09-13 | 1 | -5/+6
  When adding the readdir results to the cache, ceph_set_dentry_offset was clobbering our just-set offset. This can cause the readdir result offsets to get out of sync with the server. Add an argument to the helper so that it does not.
  This bug was introduced by 1cd3935bedccf592d44343890251452a6dd74fc4.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix file offset wrapping at 4GB on 32-bit archs | Sage Weil | 2010-09-11 | 1 | -1/+2
  Cast the value before shifting so that we don't run out of bits with a 32-bit unsigned long. This fixes wrapping of high file offsets into the low 4GB of a file on disk, and the subsequent data corruption for large files.
  Signed-off-by: Sage Weil <sage@newdream.net>
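The cast-before-shift fix in the entry above can be illustrated with a small userspace sketch. It is not the kernel code: `uint32_t` stands in for a 32-bit `unsigned long`, and `SKETCH_PAGE_SHIFT` and both function names are assumptions made up for this example.

```c
#include <stdint.h>

/* Assumed 4 KiB pages; the real kernel constant is not used here. */
#define SKETCH_PAGE_SHIFT 12

/* Buggy pattern: the shift is done in 32 bits, so the high bits are
 * already lost by the time the result is widened to 64 bits. */
static uint64_t sketch_offset_wrapping(uint32_t page_index)
{
    return (uint64_t)(page_index << SKETCH_PAGE_SHIFT);
}

/* Fixed pattern: widen to 64 bits first, then shift. */
static uint64_t sketch_offset_fixed(uint32_t page_index)
{
    return (uint64_t)page_index << SKETCH_PAGE_SHIFT;
}
```

With these definitions, page index 0x100000 (the first page past 4GB) maps to offset 0 in the wrapping version and to 0x100000000 in the fixed one, which is the kind of wrap into the low 4GB that the commit message describes.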
* ceph: fix reconnect encoding for old servers | Sage Weil | 2010-09-11 | 1 | -0/+2
  Fix the reconnect encoding to encode the cap record when the MDS does not have the FLOCK capability (i.e., pre v0.22).
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix pagelist kunmap tail | Yehuda Sadeh | 2010-09-11 | 1 | -2/+10
  A wrong parameter was passed to the kunmap.
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix null pointer deref on anon root dentry release | Sage Weil | 2010-09-11 | 1 | -3/+7
  When we release a root dentry, particularly after a splice, the parent (actually our) inode was evaluating to NULL and was getting dereferenced by ceph_snap(). This is reproduced by something as simple as
      mount -t ceph monhost:/a/b mnt
      mount -t ceph monhost:/a mnt2
      ls mnt2
  A splice_dentry() would kill the old 'b' inode's root dentry, and we'd crash while releasing it.
  Fix by checking for both the ROOT and NULL cases explicitly. We only need to invalidate the parent dir when we have a correct parent to invalidate.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix get_ticket_handler() error handling | Dan Carpenter | 2010-08-26 | 1 | -6/+9
  get_ticket_handler() returns a valid pointer or it returns ERR_PTR(-ENOMEM) if kzalloc() fails.
  Signed-off-by: Dan Carpenter <error27@gmail.com>
  Signed-off-by: Sage Weil <sage@newdream.net>
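The entry above turns on the kernel's error-pointer convention: a function that returns either a valid pointer or ERR_PTR(-ENOMEM) must be checked with IS_ERR(), not against NULL. The sketch below is a userspace approximation under stated assumptions. The `sketch_*` helpers imitate ERR_PTR/IS_ERR/PTR_ERR, and `sketch_get_handler()` is a hypothetical stand-in, not the real get_ticket_handler().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace imitation of the error-pointer helpers (assumption: errno
 * values fit in the top 4095 addresses, as in the kernel convention). */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical handler type and lookup, for illustration only. */
struct sketch_handler { int service; };

/* Returns a valid pointer or an error pointer; never NULL. */
static struct sketch_handler *sketch_get_handler(int service, int simulate_oom)
{
    struct sketch_handler *h;

    if (simulate_oom)
        return sketch_err_ptr(-ENOMEM);
    h = calloc(1, sizeof(*h));
    if (!h)
        return sketch_err_ptr(-ENOMEM);
    h->service = service;
    return h;
}

/* Caller: test with the IS_ERR-style helper and propagate the errno.
 * A NULL check here would miss the failure and dereference garbage. */
static int sketch_use_handler(int service, int simulate_oom)
{
    struct sketch_handler *h = sketch_get_handler(service, simulate_oom);

    if (sketch_is_err(h))
        return (int)sketch_ptr_err(h);
    free(h);
    return 0;
}
```

The same caller-side check applies to the other ERR_PTR fixes in this log (ceph_mdsc_build_path() and ceph_get_inode()), where the calling code had been written for NULL returns.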
* ceph: don't BUG on ENOMEM during mds reconnect | Sage Weil | 2010-08-26 | 1 | -3/+4
  We are in a position to return an error; do that instead.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: ceph_mdsc_build_path() returns an ERR_PTR | Dan Carpenter | 2010-08-26 | 1 | -0/+4
  ceph_mdsc_build_path() returns an ERR_PTR but this code is set up to handle NULL returns.
  Signed-off-by: Dan Carpenter <error27@gmail.com>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: Fix warnings | Alan Cox | 2010-08-25 | 1 | -5/+9
  Just scrubbing some warnings so I can see real problem ones in the build noise. For 32bit we need to coax gcc politely into believing we really honestly intend the casts. Using (u64)(unsigned long) means we cast from a pointer to a type of the right size and then extend it. This stops the warning spew.
  Signed-off-by: Alan Cox <alan@linux.intel.com>
  Signed-off-by: Sage Weil <sage@newdream.net>
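The two-step cast described in the entry above can be sketched in userspace as follows. This is only an approximation: `uintptr_t` stands in for the kernel's `unsigned long` (both are pointer-sized on the targets in question), and the function names are invented for this example.

```c
#include <stdint.h>

/* Pointer -> pointer-sized integer -> 64-bit value. Casting straight
 * from a 32-bit pointer to a 64-bit integer is what makes gcc warn
 * about a cast to an integer of different size. */
static uint64_t sketch_ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Reverse direction: narrow to the pointer-sized type first. */
static void *sketch_u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```

A value produced by `sketch_ptr_to_u64()` converts back to the original pointer with `sketch_u64_to_ptr()`, since the intermediate type is exactly pointer-sized.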
* ceph: ceph_get_inode() returns an ERR_PTR | Dan Carpenter | 2010-08-25 | 1 | -2/+2
  ceph_get_inode() returns an ERR_PTR and it doesn't return a NULL.
  Signed-off-by: Dan Carpenter <error27@gmail.com>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: initialize fields on new dentry_infos | Sage Weil | 2010-08-24 | 1 | -1/+1
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: maintain i_head_snapc when any caps are dirty, not just for data | Sage Weil | 2010-08-24 | 4 | -7/+26
  We used to use i_head_snapc to keep track of which snapc the current epoch of dirty data was dirtied under. It is used by queue_cap_snap to set up the cap_snap. However, since we queue cap snaps for any dirty caps, not just for dirty file data, we need to keep a valid i_head_snapc anytime we have dirty|flushing caps. This fixes a NULL pointer deref in queue_cap_snap when writing back dirty caps without data (e.g., snaptest-authwb.sh).
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix osd request lru adjustment when sending request | Henry C Chang | 2010-08-22 | 1 | -1/+1
  Fix argument order. We want to move the item to the end of the list, not change the position of the head.
  Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
  Signed-off-by: Sage Weil <sage@newdream.net>
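The argument-order point in the entry above is easier to see with a concrete list. The sketch below is a minimal userspace imitation of a circular doubly linked list with an (entry, head) move-to-tail helper; the `sk_*` names are assumptions for this example and it is not the kernel's list.h.

```c
/* Minimal circular doubly linked list, modeled loosely on the
 * kernel's list idiom (illustrative names only). */
struct sk_list {
    struct sk_list *prev, *next;
};

static void sk_list_init(struct sk_list *head)
{
    head->prev = head->next = head;
}

/* Unlink an entry from whatever list it is on. */
static void sk_list_del(struct sk_list *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void sk_list_add_tail(struct sk_list *entry, struct sk_list *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Move entry to the tail of the list anchored at head.
 * Argument order matters: the entry comes first, the head second. */
static void sk_list_move_tail(struct sk_list *entry, struct sk_list *head)
{
    sk_list_del(entry);
    sk_list_add_tail(entry, head);
}
```

For an LRU holding a, b, c in that order, `sk_list_move_tail(&a, &head)` yields b, c, a, so the item just used becomes the most recent. With the arguments swapped, the head node itself is unlinked and reinserted next to the item, which changes where the list starts rather than where the item sits.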
* ceph: don't improperly set dir complete when holding EXCL cap | Sage Weil | 2010-08-22 | 1 | -0/+1
  If we hold the EXCL cap, we cannot trust the dir stats from the MDS (num files, subdirs) and must not incorrectly conclude that the directory is empty. If we do, we can get bad results from lookup (bad ENOENT) and bad readdir results.
  Signed-off-by: Sage Weil <sage@newdream.net>
* mm: exporting account_page_dirty | Michael Rubin | 2010-08-22 | 1 | -7/+1
  This allows code outside of the mm core to safely manipulate page state and not worry about the other accounting. Not using these routines means that some code will lose track of the accounting and we get bugs. This has happened once already.
  Signed-off-by: Michael Rubin <mrubin@google.com>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: direct requests in snapped namespace based on nonsnap parent | Sage Weil | 2010-08-22 | 1 | -2/+24
  When making a request in the virtual snapdir or a snapped portion of the namespace, we should choose the MDS based on the first nonsnap parent (and its caps). If that is not the best place, we will get forward hints to find the right MDS in the cluster. This fixes ESTALE errors when using the .snap directory and namespace with multiple MDSs.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: queue cap snap writeback for realm children on snap update | Sage Weil | 2010-08-22 | 1 | -23/+37
  When a realm is updated, we need to queue writeback on inodes in that realm _and_ its children. Otherwise, if the inode gets cowed on the server, we can get a hang later due to out-of-sync cap/snap state.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: include dirty xattrs state in snapped caps | Sage Weil | 2010-08-22 | 4 | -11/+23
  When we snapshot dirty metadata that needs to be written back to the MDS, include dirty xattr metadata. Make the capsnap reference the encoded xattr blob so that it will be written back in the FLUSHSNAP op.
  Also fix the capsnap creation guard to include dirty auth or file bits, not just tests specific to dirty file data or file writes in progress (this fixes auth metadata writeback).
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix xattr cap writeback | Sage Weil | 2010-08-22 | 1 | -5/+5
  We should include the xattr metadata blob in the cap update message any time we are flushing dirty state, NOT just when we are also dropping the cap. This fixes async xattr writeback.
  Also, clean up the code slightly to avoid duplicating the bit test.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix multiple mds session shutdown | Sage Weil | 2010-08-22 | 2 | -34/+37
  The use of a completion when waiting for session shutdown during umount is inappropriate, given the complexity of the condition. For multiple MDS's, this resulted in the umount thread spinning, often preventing the session close message from being processed.
  Switch to a waitqueue and define a condition helper. This cleans things up nicely.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: generalize mon requests, add pool op support | Yehuda Sadeh | 2010-08-10 | 2 | -17/+158
  Generalize the current statfs synchronous requests, and support pool_ops.
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: only queue async writeback on cap revocation if there is dirty data | Sage Weil | 2010-08-05 | 1 | -1/+1
  Normally, if the Fb cap bit is being revoked, we queue an async writeback. If there is no dirty data but we still hold the cap, this leaves the client sitting around doing nothing until the cap timeouts expire and the cap is released on its own (as it would have been without the revocation).
  Instead, only queue writeback if the bit is actually used (i.e., we have dirty data). If not, we can reply to the revocation immediately.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do not ignore osd_idle_ttl mount option | Sage Weil | 2010-08-03 | 1 | -0/+3
  Actually apply the mount option to the mount_args struct.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: constify dentry_operations | Sage Weil | 2010-08-03 | 2 | -5/+5
  This makes checkpatch happy.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: whitespace cleanup | Sage Weil | 2010-08-03 | 7 | -24/+31
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add flock/fcntl lock support | Greg Farnum | 2010-08-02 | 5 | -2/+284
  Implement flock inode operation to support advisory file locking. All lock/unlock operations are synchronous with the MDS. Lock state is sent when reconnecting to a recovering MDS to restore the shared lock state.
  Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: define on-wire types, constants for file locking support | Greg Farnum | 2010-08-02 | 2 | -2/+36
  Define the MDS operations and data types for doing file advisory locking with the MDS.
  Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add CEPH_FEATURE_FLOCK to the supported feature bits | Greg Farnum | 2010-08-02 | 1 | -1/+1
  This informs the server that we will accept v2 client_caps format and v2 client_reconnect format messages.
  Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: support v2 reconnect encoding | Sage Weil | 2010-08-02 | 2 | -13/+50
  Encode either old or v2 encoding of client_reconnect message, depending on whether the peer has the FLOCK feature bit.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: support v2 client_caps encoding | Sage Weil | 2010-08-02 | 1 | -2/+19
  Add support for v2 encoding of MClientCaps, which includes a flock blob.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: move AES iv definition to shared header | Sage Weil | 2010-08-02 | 2 | -1/+3
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix decoding of pool snap info | Sage Weil | 2010-08-02 | 1 | -4/+26
  The pool info contains a vector for snap_info_t, not snap ids. This fixes the broken decoding, which would declare the update corrupt when a pool snapshot was created.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: make ->sync_fs not wait if wait==0 | Sage Weil | 2010-08-01 | 1 | -4/+13
  The ->sync_fs() super op only needs to wait if wait is true. Otherwise, just get some dirty cap writeback started.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: warn on missing snap realm | Sage Weil | 2010-08-01 | 1 | -0/+1
  Well, this Shouldn't Happen, so it would be helpful to know the caller when it does.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: print useful error message when crush rule not found | Sage Weil | 2010-08-01 | 1 | -2/+3
  Include the crush_ruleset in the error message.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: use %pU to print uuid (fsid) | Sage Weil | 2010-08-01 | 3 | -15/+8
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: sync header defs with server code | Sage Weil | 2010-08-01 | 3 | -0/+11
  Define ROLLBACK op, IFLOCK inode lock (for advisory file locking).
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: clean up header guards | Sage Weil | 2010-08-01 | 8 | -16/+16
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: strip misleading/obsolete version, feature info | Sage Weil | 2010-08-01 | 1 | -26/+4
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: specify supported features in super.h | Sage Weil | 2010-08-01 | 2 | -3/+9
  Specify the supported/required feature bits in super.h client code instead of using the definitions from the shared kernel/userspace headers (which will go away shortly).
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: clean up fsid mount option | Sage Weil | 2010-08-01 | 1 | -13/+39
  Specify the fsid mount option in hex, not via the major/minor u64 hackery we had before.
  Signed-off-by: Sage Weil <sage@newdream.net>
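As a rough illustration of what parsing a hex fsid can involve, here is a userspace sketch that turns a hex string into 16 raw bytes. The accepted syntax (32 hex digits with optional '-' separators, as in a textual UUID) and the names `sketch_parse_fsid` and `sketch_hex_val` are assumptions for this example; the log entry above does not state the exact format the mount option accepts.

```c
#include <ctype.h>
#include <stdint.h>

/* Value of one hex digit; caller guarantees isxdigit(c). */
static int sketch_hex_val(char c)
{
    if (isdigit((unsigned char)c))
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Parse exactly 16 bytes of hex from str into out, skipping '-'.
 * Returns 0 on success, -1 if the string is malformed, too short,
 * or has trailing characters. */
static int sketch_parse_fsid(const char *str, uint8_t out[16])
{
    int i = 0;

    while (*str && i < 16) {
        if (*str == '-') {
            str++;
            continue;
        }
        /* Each byte needs two hex digits; a lone digit is an error. */
        if (!isxdigit((unsigned char)str[0]) ||
            !isxdigit((unsigned char)str[1]))
            return -1;
        out[i++] = (uint8_t)((sketch_hex_val(str[0]) << 4) |
                             sketch_hex_val(str[1]));
        str += 2;
    }
    return (i == 16 && *str == '\0') ? 0 : -1;
}
```

Storing the fsid as a plain 16-byte array keeps it independent of host byte order, which a pair of u64 halves would not be.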
* ceph: remove unused 'monport' mount option | Sage Weil | 2010-08-01 | 1 | -2/+0
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: handle ESTALE properly; on receipt send to authority if it wasn't | Greg Farnum | 2010-08-01 | 2 | -8/+35
  Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add ceph_get_cap_for_mds function. | Greg Farnum | 2010-08-01 | 2 | -0/+12
  Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: connect to export targets on cap export | Sage Weil | 2010-08-01 | 3 | -2/+23
  When we get a cap EXPORT message, make sure we are connected to all export targets to ensure we can handle the matching IMPORT.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: connect to export targets if mds is laggy | Sage Weil | 2010-08-01 | 1 | -0/+15
  If an MDS we are talking to may have failed, we need to open sessions to its potential export targets to ensure that any in-progress migration that may have involved some of our caps is properly handled.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: introduce helper to connect to mds export targets | Sage Weil | 2010-08-01 | 1 | -0/+37
  There are a few cases where we need to open sessions with a given mds's potential export targets.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: only set num_pages in calc_layout | Sage Weil | 2010-08-01 | 1 | -3/+0
  Setting it elsewhere is unnecessary and more fragile.
  Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: do caps accounting per mds_client | Yehuda Sadeh | 2010-08-01 | 5 | -115/+131
  Caps-related accounting is now done per mds client instead of globally. This lays the groundwork for a later revision of the caps preallocated reservation list.
  Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
  Signed-off-by: Sage Weil <sage@newdream.net>