summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* ceph: fix cap removal racesSage Weil2010-05-112-9/+15
| | | | | | | | | | | | | | The iterate_session_caps helper traverses the session caps list and tries to grab an inode reference. However, the __ceph_remove_cap was clearing the inode backpointer _before_ removing itself from the session list, causing a null pointer dereference. Clear cap->ci under protection of s_cap_lock to avoid the race, and to tightly couple the list and backpointer state. Use a local flag to indicate whether we are releasing the cap, as cap->session may be modified by a racing thread in iterate_session_caps. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: zero unused message header, footer fieldsSage Weil2010-05-111-1/+5
| | | | | | | | We shouldn't leak any prior memory contents to other parties. And random data, particularly in the 'version' field, can cause problems down the line. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix locking for waking session requests after reconnectSage Weil2010-05-111-13/+16
| | | | | | | | | | The session->s_waiting list is protected by mdsc->mutex, not s_mutex. This was causing (rare) s_waiting list corruption. Fix errors paths too, while we're here. A more thorough cleanup of this function is coming soon. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: resubmit requests on pg mapping change (not just primary change)Sage Weil2010-05-115-9/+44
| | | | | | | | OSD requests need to be resubmitted on any pg mapping change, not just when the pg primary changes. Resending only when the primary changes results in occasional 'hung' requests during osd cluster recovery or rebalancing. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix open file counting on snapped inodes when mds returns no capsSage Weil2010-05-111-0/+4
| | | | | | | | | It's possible the MDS will not issue caps on a snapped inode, in which case an open request may not __ceph_get_fmode(), botching the open file counting. (This is actually a server bug, but the client shouldn't BUG out in this case.) Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: unregister osd request on failureSage Weil2010-05-111-2/+5
| | | | | | | | | The osd request wasn't being unregistered when the osd returned a failure code, even though the result was returned to the caller. This would cause it to eventually time out, and then crash the kernel when it tried to resend the request using a stale page vector. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: don't use writeback_control in writepages completionSage Weil2010-05-052-7/+0
| | | | | | | | | | | The ->writepages writeback_control is not still valid in the writepages completion. We were touching it solely to adjust pages_skipped when there was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials), causing an oops in the writeback code shortly thereafter. Updating pages_skipped on error isn't correct anyway, so let's just rip out this (clearly broken) code to pass the wbc to the completion. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: unregister bdi before kill_anon_super releases device nameSage Weil2010-05-041-7/+16
| | | | | | | | | | | | Unregister and destroy the bdi in put_super, after mount is r/o, but before put_anon_super releases the device name. For symmetry, bdi_destroy in destroy_client (we bdi_init in create_client). Only set s_bdi if bdi_register succeeds, since we use it to decide whether to bdi_unregister. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: remove bad auth_x kmem_cacheSage Weil2010-05-031-22/+10
| | | | | | | | It's useless, since our allocations are already a power of 2. And it was allocated per-instance (not globally), which caused a name collision when we tried to mount a second file system with auth_x enabled. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix lockless caps checkSage Weil2010-05-031-1/+1
| | | | | | The __ variant requires caller to hold i_lock. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: clear dir complete, invalidate dentry on replayed renameSage Weil2010-05-031-0/+9
| | | | | | | | | | | | If a rename operation is resent to the MDS following an MDS restart, the client does not get a full reply (containing the resulting metadata) back. In that case, a ceph_rename() needs to compensate by doing anything useful that fill_inode() would have, like d_move(). It also needs to invalidate the dentry (to workaround the vfs_rename_dir() bug) and clear the dir complete flag, just like fill_trace(). Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix direct io truncate offsetSage Weil2010-05-031-1/+2
| | | | | | | truncate_inode_pages_range wants the end offset to align with the last byte in a page. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: discard incoming messages with bad seq #Sage Weil2010-05-031-0/+20
| | | | | | | | We can get old message seq #'s after a tcp reconnect for stateful sessions (i.e., the MDS). If we get a higher seq #, that is an error, and we shouldn't see any bad seq #'s for stateless (mon, osd) connections. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix seq counting for skipped messagesSage Weil2010-05-031-0/+2
| | | | | | Increment in_seq even when the message is skipped for some reason. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: add missing #includesSage Weil2010-05-033-0/+4
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix leaked spinlock during mds reconnectSage Weil2010-05-031-1/+1
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: print more useful version info on module loadSage Weil2010-05-031-3/+4
| | | | | | | Decouple the client version from the server side. Print relevant protocol and map version info instead. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix snap realm splitsSage Weil2010-05-031-10/+14
| | | | | | | | | | | | The snap realm split was checking i_snap_realm, not the list_head, to determine if an inode belonged in the new realm. The check always failed, which meant we always moved the inode, corrupting the old realm's list and causing various crashes. Also wait to release old realm reference to avoid possibility of use after free. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: clear dir complete on d_moveSage Weil2010-05-031-0/+4
| | | | | | | | d_move() reorders the d_subdirs list, breaking the readdir result caching. Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on rename. Signed-off-by: Sage Weil <sage@newdream.net>
* Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2010-04-296-9/+120
|\ | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: add a shrinker to background inode reclaim
| * xfs: add a shrinker to background inode reclaimDave Chinner2010-04-296-9/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On low memory boxes or those with highmem, kernel can OOM before the background reclaims inodes via xfssyncd. Add a shrinker to run inode reclaim so that it inode reclaim is expedited when memory is low. This is more complex than it needs to be because the VM folk don't want a context added to the shrinker infrastructure. Hence we need to add a global list of XFS mount structures so the shrinker can traverse them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2010-04-291-0/+1
|\ \ | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: exofs: Fix "add bdi backing to mount session" fall out fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK
| * | exofs: Fix "add bdi backing to mount session" fall outBoaz Harrosh2010-04-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch: add bdi backing to mount session (b3d0ab7e60d1865bb6f6a79a77aaba22f2543236) Has a bug in the placement of the bdi member at struct exofs_sb_info. The layout member must be kept last. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCKJens Axboe2010-04-291-0/+1
| |/ | | | | | | | | | | | | | | | | When CONFIG_BLOCK is set, it ends up getting backing-dev.h included. But for !CONFIG_BLOCK, it isn't so lucky. The proper thing to do is include <linux/backing-dev.h> directly from the file it's used from, so do that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2010-04-295-22/+45
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4 nfs: fix some issues in nfs41_proc_reclaim_complete() NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear NFS: Fix an unstable write data integrity race nfs: testing for null instead of ERR_PTR() NFS: rsize and wsize settings ignored on v4 mounts NFSv4: Don't attempt an atomic open if the file is a mountpoint SUNRPC: Fix a bug in rpcauth_prune_expired
| * | nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4Xiaotian Feng2010-04-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With CONFIG_NFS_V4 and data version 4, nfs_get_sb will allocate memory for export_path in nfs4_validate_text_mount_data, so we need to free it then. This is addressed in following kmemleak report: unreferenced object 0xffff88016bf48a50 (size 16): comm "mount.nfs", pid 22567, jiffies 4651574704 (age 175471.200s) hex dump (first 16 bytes): 2f 6f 70 74 2f 77 6f 72 6b 00 6b 6b 6b 6b 6b a5 /opt/work.kkkkk. backtrace: [<ffffffff814b34f9>] kmemleak_alloc+0x60/0xa7 [<ffffffff81102c76>] kmemleak_alloc_recursive.clone.5+0x1b/0x1d [<ffffffff811046b3>] __kmalloc_track_caller+0x18f/0x1b7 [<ffffffff810e1b08>] kstrndup+0x37/0x54 [<ffffffffa0336971>] nfs_parse_devname+0x152/0x204 [nfs] [<ffffffffa0336af3>] nfs4_validate_text_mount_data+0xd0/0xdc [nfs] [<ffffffffa0338deb>] nfs_get_sb+0x325/0x736 [nfs] [<ffffffff81113671>] vfs_kern_mount+0xbd/0x17c [<ffffffff81113798>] do_kern_mount+0x4d/0xed [<ffffffff81129a87>] do_mount+0x787/0x7fe [<ffffffff81129b86>] sys_mount+0x88/0xc2 [<ffffffff81009b42>] system_call_fastpath+0x16/0x1b Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | nfs: fix some issues in nfs41_proc_reclaim_complete()Dan Carpenter2010-04-281-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | The original code passed an ERR_PTR() to rpc_put_task() and instead of returning zero on success it returned -ENOMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clearTrond Myklebust2010-04-271-15/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Neil Brown reports that he is seeing the BUG_ON(ret == 0) trigger in nfs_page_async_flush. According to the trace in https://bugzilla.novell.com/show_bug.cgi?id=599628 the problem appears to be due to nfs_wb_page() not waiting for the PG_writeback flag to clear. There is a ditto problem in nfs_wb_page_cancel() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: Fix an unstable write data integrity raceTrond Myklebust2010-04-221-4/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2c61be0a9478258f77b66208a0c4b1f5f8161c3c (NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible) exposed a race on file close. In order to ensure correct close-to-open behaviour, we want to wait for all outstanding background commit operations to complete. This patch adds an inode flag that indicates if a commit operation is under way, and provides a mechanism to allow ->write_inode() to wait for its completion if this is a data integrity flush. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | nfs: testing for null instead of ERR_PTR()Dan Carpenter2010-04-221-1/+1
| | | | | | | | | | | | | | | | | | | | | nfs_path() returns an ERR_PTR(), it doesn't return null. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFS: rsize and wsize settings ignored on v4 mountsChuck Lever2010-04-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFSv4 mounts ignore the rsize and wsize mount options, and always use the default transfer size for both. This seems to be because all NFSv4 mounts are now cloned, and the cloning logic doesn't copy the rsize and wsize settings from the parent nfs_server. I tested Fedora's 2.6.32.11-99 and it seems to have this problem as well, so I'm guessing that .33, .32, and perhaps older kernels have this issue as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Stable <stable@kernel.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | NFSv4: Don't attempt an atomic open if the file is a mountpointTrond Myklebust2010-04-221-1/+1
| | | | | | | | | | | | | | | | | | Fix https://bugzilla.kernel.org/show_bug.cgi?id=15789 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | pktcdvd: improve BKL and compat_ioctl.c usageArnd Bergmann2010-04-291-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pktcdvd driver uses proper locking and does not need the BKL in the ioctl and llseek functions of the character device, so kill both. Moving the compat_ioctl handling from common code into the driver itself fixes build problems when CONFIG_BLOCK is disabled. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | exofs: Fix "add bdi backing to mount session" fall outBoaz Harrosh2010-04-291-1/+1
| |/ |/| | | | | | | | | | | | | | | | | Commit b3d0ab7e60d1865bb6f6a79a77aaba22f2543236 ("exofs: add bdi backing to mount session") has a bug in the placement of the bdi member at struct exofs_sb_info. The layout member must be kept last. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | nfs d_revalidate() is too trigger-happy with d_drop()Al Viro2010-04-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If dentry found stale happens to be a root of disconnected tree, we can't d_drop() it; its d_hash is actually part of s_anon and d_drop() would simply hide it from shrink_dcache_for_umount(), leading to all sorts of fun, including busy inodes on umount and oopsen after that. Bug had been there since at least 2006 (commit c636eb already has it), so it's definitely -stable fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2010-04-2819-16/+90
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: coda: move backing-dev.h kernel include inside __KERNEL__ mtd: ensure that bdi entries are properly initialized and registered Move mtd_bdi_*mappable to mtdcore.c btrfs: convert to using bdi_setup_and_register() Catch filesystems lacking s_bdi drbd: Terminate a connection early if sending the protocol fails drbd: fix memory leak Fix JFFS2 sync silent failure smbfs: add bdi backing to mount session ncpfs: add bdi backing to mount session exofs: add bdi backing to mount session ecryptfs: add bdi backing to mount session coda: add bdi backing to mount session cifs: add bdi backing to mount session afs: add bdi backing to mount session. 9p: add bdi backing to mount session bdi: add helper function for doing init and register of a bdi for a file system block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer
| * | btrfs: convert to using bdi_setup_and_register()Jens Axboe2010-04-261-11/+1
| | | | | | | | | | | | | | | | | | | | | It's now a provided helper, so get rid of the internal setup and btrfs atomic_t bdi enumerator. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | Catch filesystems lacking s_bdiJörn Engel2010-04-252-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | noop_backing_dev_info is used only as a flag to mark filesystems that don't have any backing store, like tmpfs, procfs, spufs, etc. Signed-off-by: Joern Engel <joern@logfs.org> Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes to the noop_backing_dev_info is not legal and will not result in them being flushed, but we already catch this condition in __mark_inode_dirty() when checking for a registered bdi. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | smbfs: add bdi backing to mount sessionJens Axboe2010-04-221-0/+8
| | | | | | | | | | | | | | | | | | This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | ncpfs: add bdi backing to mount sessionJens Axboe2010-04-221-0/+8
| | | | | | | | | | | | | | | | | | This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | exofs: add bdi backing to mount sessionJens Axboe2010-04-222-0/+10
| | | | | | | | | | | | | | | | | | This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | ecryptfs: add bdi backing to mount sessionJens Axboe2010-04-223-1/+12
| | | | | | | | | | | | | | | | | | This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | coda: add bdi backing to mount sessionJens Axboe2010-04-221-0/+8
| | | | | | | | | | | | | | | | | | This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | cifs: add bdi backing to mount sessionJens Axboe2010-04-222-0/+13
| | | | | | | | | | | | | | | | | | This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | afs: add bdi backing to mount session.Jens Axboe2010-04-223-0/+10
| | | | | | | | | | | | | | | | | | This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * | 9p: add bdi backing to mount sessionJens Axboe2010-04-223-0/+13
| | | | | | | | | | | | | | | | | | This ensures that dirty data gets flushed properly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | | Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2010-04-271-4/+4
|\ \ \ | | | | | | | | | | | | | | | | * 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: nfsd4: bug in read_buf
| * | | nfsd4: bug in read_bufNeil Brown2010-04-261-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When read_buf is called to move over to the next page in the pagelist of an NFSv4 request, it sets argp->end to essentially a random number, certainly not an address within the page which argp->p now points to. So subsequent calls to READ_BUF will think there is much more than a page of spare space (the cast to u32 ensures an unsigned comparison) so we can expect to fall off the end of the second page. We never encountered thsi in testing because typically the only operations which use more than two pages are write-like operations, which have their own decoding logic. Something like a getattr after a write may cross a page boundary, but it would be very unusual for it to cross another boundary after that. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* | | | procfs: fix tid fdinfoJerome Marchand2010-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Correct the file_operations struct in fdinfo entry of tid_base_stuff[]. Presently /proc/*/task/*/fdinfo contains symlinks to opened files like /proc/*/fd/. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Remove redundant check for CONFIG_MMUChristoph Egger2010-04-271-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The checks for CONFIG_MMU at this location are duplicated as all the code is located inside a #ifndef CONFIG_MMU block. So the first conditional block will always be included while the second never will. Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud