op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	helper for reading ->d_count	Al Viro	2013-07-05	1	-1/+1
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	] nilfs2: use atomic64_t type for inodes_count and blocks_count fields in ↵	Vyacheslav Dubeyko	2013-07-03	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nilfs_root struct The cp_inodes_count and cp_blocks_count are represented as __le64 type in on-disk structure (struct nilfs_checkpoint). But analogous fields in in-core structure (struct nilfs_root) are represented by atomic_t type. This patch replaces atomic_t on atomic64_t type in representation of inodes_count and blocks_count fields in struct nilfs_root. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Joern Engel <joern@logfs.org> Cc: Clemens Eisserer <linuxhippy@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: implement calculation of free inodes count	Vyacheslav Dubeyko	2013-07-03	1	-2/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, NILFS2 returns 0 as free inodes count (f_ffree) and current used inodes count as total file nodes in file system (f_files): df -i Filesystem Inodes IUsed IFree IUse% Mounted on /dev/loop0 2 2 0 100% /mnt/nilfs2 This patch implements real calculation of free inodes count. First of all, it is calculated total file nodes in file system as (desc_blocks_count * groups_per_desc_block * entries_per_group). Then, it is calculated free inodes count as difference the total file nodes and used inodes count. As a result, we have such output for NILFS2: df -i Filesystem Inodes IUsed IFree IUse% Mounted on /dev/loop0 4194304 2114701 2079603 51% /mnt/nilfs2 Reported-by: Clemens Eisserer <linuxhippy@gmail.com> Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Joern Engel <joern@logfs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fs: Limit sys_mount to only request filesystem modules.	Eric W. Biederman	2013-03-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
*	fs: push rcu_barrier() from deactivate_locked_super() to filesystems	Kirill A. Shutemov	2012-10-02	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fast paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	nilfs2: nuke write_super from comments	Artem Bityutskiy	2012-08-04	1	-4/+0
\| \| \| \| \| \| \| \| \|	The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from ntfs. Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	nilfs2: fix deadlock issue between chcp and thaw ioctls	Ryusuke Konishi	2012-07-30	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: remove references to long gone super operations	Fernando Luis Vazquez Cao	2012-07-30	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \|	->delete_inode(), ->write_super_lockfs(), ->unlockfs() are gone so remove references to them in the NTFS code. Noticed while cleaning up the fsfreeze mess. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	VFS: Pass mount flags to sget()	David Howells	2012-07-14	1	-2/+2
\| \| \| \| \| \| \| \| \|	Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	switch open-coded instances of d_make_root() to new helper	Al Viro	2012-03-20	1	-2/+1
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link}	Al Viro	2012-03-20	1	-0/+1
\| \| \| \| \| \| \| \| \|	New field of struct super_block - ->s_max_links. Maximal allowed value of ->i_nlink or 0; in the latter case all checks still need to be done in ->link/->mkdir/->rename instances. Note that this limit applies both to directoris and to non-directories. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	vfs: switch ->show_options() to struct dentry *	Al Viro	2012-01-06	1	-3/+3
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	vfs: fix the stupidity with i_dentry in inode destructors	Al Viro	2012-01-03	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	nilfs2: always set back pointer to host inode in mapping->host	Ryusuke Konishi	2011-05-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	In the current nilfs, page cache for btree nodes and meta data files do not set a valid back pointer to the host inode in mapping->host. This will change it so that every address space in nilfs uses mapping->host to hold its host inode. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: implement resize ioctl	Ryusuke Konishi	2011-05-10	1	-0/+72
\| \| \| \| \| \|	This adds resize ioctl which makes online resize possible. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: add routine to move secondary super block	Ryusuke Konishi	2011-05-10	1	-0/+57
\| \| \| \| \| \| \|	After resizing the filesystem, the secondary super block must be moved to a new location. This adds a helper function for this. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: get rid of nilfs_sb_info structure	Ryusuke Konishi	2011-03-09	1	-40/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This directly uses sb->s_fs_info to keep a nilfs filesystem object and fully removes the intermediate nilfs_sb_info structure. With this change, the hierarchy of on-memory structures of nilfs will be simplified as follows: Before: super_block -> nilfs_sb_info -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) After: super_block -> the_nilfs -> cptree --+-> nilfs_root (current file system) +-> nilfs_root (snapshot A) +-> nilfs_root (snapshot B) : -> nilfs_sc_info (log writer structure) The reason why we didn't design so from the beginning is because the initial shape also differed from the above. The early hierachy was composed of "per-mount-point" super_block -> nilfs_sb_info pairs and a shared nilfs object. On the kernel 2.6.37, it was changed to the current shape in order to unify super block instances into one per device, and this cleanup became applicable as the result. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: use sb instance instead of nilfs_sb_info struct	Ryusuke Konishi	2011-03-09	1	-54/+51
\| \| \| \| \| \|	This replaces sbi uses with direct reference to sb instance. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: move next generation counter into nilfs object	Ryusuke Konishi	2011-03-09	1	-11/+0
\| \| \| \| \| \| \|	Moves s_next_generation counter and a spinlock protecting it to nilfs object from nilfs_sb_info structure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: move s_inode_lock and s_dirty_files into nilfs object	Ryusuke Konishi	2011-03-09	1	-3/+0
\| \| \| \| \| \| \|	Moves s_inode_lock spinlock and s_dirty_files list to nilfs object from nilfs_sb_info structure. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: move parameters on nilfs_sb_info into nilfs object	Ryusuke Konishi	2011-03-09	1	-5/+5
\| \| \| \| \| \| \|	This moves four parameter variables on nilfs_sb_info s_resuid, s_resgid, s_interval and s_watermark to the nilfs object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	nilfs2: move mount options to nilfs object	Ryusuke Konishi	2011-03-09	1	-27/+29
\| \| \| \| \| \| \|	This moves mount_opt local variable to nilfs object from nilfs_sb_info struct. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
*	mm: prevent concurrent unmap_mapping_range() on the same inode	Miklos Szeredi	2011-02-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Michael Leun reported that running parallel opens on a fuse filesystem can trigger a "kernel BUG at mm/truncate.c:475" Gurudas Pai reported the same bug on NFS. The reason is, unmap_mapping_range() is not prepared for more than one concurrent invocation per inode. For example: thread1: going through a big range, stops in the middle of a vma and stores the restart address in vm_truncate_count. thread2: comes in with a small (e.g. single page) unmap request on the same vma, somewhere before restart_address, finds that the vma was already unmapped up to the restart address and happily returns without doing anything. Another scenario would be two big unmap requests, both having to restart the unmapping and each one setting vm_truncate_count to its own value. This could go on forever without any of them being able to finish. Truncate and hole punching already serialize with i_mutex. Other callers of unmap_mapping_range() do not, and it's difficult to get i_mutex protection for all callers. In particular ->d_revalidate(), which calls invalidate_inode_pages2_range() in fuse, may be called with or without i_mutex. This patch adds a new mutex to 'struct address_space' to prevent running multiple concurrent unmap_mapping_range() on the same mapping. [ We'll hopefully get rid of all this with the upcoming mm preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex lockbreak" patch in particular. But that is for 2.6.39 ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Michael Leun <lkml20101129@newton.leun.net> Reported-by: Gurudas Pai <gurudas.pai@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: fix crash after one superblock became unavailable	Ryusuke Konishi	2011-01-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes the following kernel oops in nilfs_setup_super() which could arise if one of two super-blocks is unavailable. > BUG: unable to handle kernel NULL pointer dereference at (null) > Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 / > EIP: 0060:[<c03196bc>] EFLAGS: 00010202 CPU: 3 > EIP is at memcpy+0xc/0x1b > Call Trace: > [<f953720e>] ? nilfs_setup_super+0x6c/0xa5 [nilfs2] > [<f95369e9>] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2] > [<f9537a08>] ? nilfs_mount+0x4f9/0x62c [nilfs2] > [<c02745cf>] ? kstrdup+0x36/0x3f > [<f953750f>] ? nilfs_mount+0x0/0x62c [nilfs2] > [<c0293940>] ? vfs_kern_mount+0x4d/0x12c > [<c02a5100>] ? get_fs_type+0x76/0x8f > [<c0293a68>] ? do_kern_mount+0x33/0xbf > [<c02a784a>] ? do_mount+0x2ed/0x714 > [<c02a6171>] ? copy_mount_options+0x28/0xfc > [<c02a7ce3>] ? sys_mount+0x72/0xaf > [<c0473085>] ? syscall_call+0x7/0xb Reported-by: Wakko Warner <wakko@animx.eu.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Wakko Warner <wakko@animx.eu.org> Cc: stable <stable@kernel.org> [2.6.37, 2.6.36] LKML-Reference: <20110121024918.GA29598@animx.eu.org>
*	Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block	Linus Torvalds	2011-01-13	1	-4/+4
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...
\| *	block: clean up blkdev_get() wrappers and their users	Tejun Heo	2010-11-13	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After recent blkdev_get() modifications, open_by_devnum() and open_bdev_exclusive() are simple wrappers around blkdev_get(). Replace them with blkdev_get_by_dev() and blkdev_get_by_path(). blkdev_get_by_dev() is identical to open_by_devnum(). blkdev_get_by_path() is slightly different in that it doesn't automatically add %FMODE_EXCL to @mode. All users are converted. Most conversions are mechanical and don't introduce any behavior difference. There are several exceptions. * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no reason to OR it explicitly on blkdev_put(). * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in sb->s_mode. * With the above changes, sb->s_mode now always should contain FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect errors. The new blkdev_get_*() functions are with proper docbook comments. While at it, add function description to blkdev_get() too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Neil Brown <neilb@suse.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Joern Engel <joern@lazybastard.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: reiserfs-devel@vger.kernel.org Cc: xfs-masters@oss.sgi.com Cc: Alexander Viro <viro@zeniv.linux.org.uk>
\| *	block: make blkdev_get/put() handle exclusive access	Tejun Heo	2010-11-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Over time, block layer has accumulated a set of APIs dealing with bdev open, close, claim and release. * blkdev_get/put() are the primary open and close functions. * bd_claim/release() deal with exclusive open. * open/close_bdev_exclusive() are combination of open and claim and the other way around, respectively. * bd_link/unlink_disk_holder() to create and remove holder/slave symlinks. * open_by_devnum() wraps bdget() + blkdev_get(). The interface is a bit confusing and the decoupling of open and claim makes it impossible to properly guarantee exclusive access as in-kernel open + claim sequence can disturb the existing exclusive open even before the block layer knows the current open if for another exclusive access. Reorganize the interface such that, * blkdev_get() is extended to include exclusive access management. @holder argument is added and, if is @FMODE_EXCL specified, it will gain exclusive access atomically w.r.t. other exclusive accesses. * blkdev_put() is similarly extended. It now takes @mode argument and if @FMODE_EXCL is set, it releases an exclusive access. Also, when the last exclusive claim is released, the holder/slave symlinks are removed automatically. * bd_claim/release() and close_bdev_exclusive() are no longer necessary and either made static or removed. * bd_link_disk_holder() remains the same but bd_unlink_disk_holder() is no longer necessary and removed. * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev() and blkdev_get(). It also has an unexpected extra bdev_read_only() test which probably should be moved into blkdev_get(). * open_by_devnum() is modified to take @holder argument and pass it to blkdev_get(). Most of bdev open/close operations are unified into blkdev_get/put() and most exclusive accesses are tested atomically at the open time (as it should). This cleans up code and removes some, both valid and invalid, but unnecessary all the same, corner cases. open_bdev_exclusive() and open_by_devnum() can use further cleanup - rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop special features. Well, let's leave them for another day. Most conversions are straight-forward. drbd conversion is a bit more involved as there was some reordering, but the logic should stay the same. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Philipp Reisner <philipp.reisner@linbit.com> Cc: Peter Osterlund <petero2@telia.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: dm-devel@redhat.com Cc: drbd-dev@lists.linbit.com Cc: Leo Chen <leochen@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Joern Engel <joern@logfs.org> Cc: reiserfs-devel@vger.kernel.org Cc: Alexander Viro <viro@zeniv.linux.org.uk>
* \|	headers: kobject.h redux	Alexey Dobriyan	2011-01-10	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove kobject.h from files which don't need it, notably, sched.h and fs.h. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	nilfs2: get rid of nilfs_mount_options structure	Ryusuke Konishi	2011-01-10	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Only mount_opt member is used in the nilfs_mount_options structure, and we can simplify it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* \|	fs/nilfs2/super.c: Use printf extension %pV	Joe Perches	2011-01-10	1	-7/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* \|	fs: icache RCU free inodes	Nick Piggin	2011-01-07	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* \|	fs: dcache scale dentry refcount	Nick Piggin	2011-01-07	1	-1/+1
\|/ \| \| \| \| \| \| \|	Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
*	convert nilfs	Al Viro	2010-10-29	1	-9/+7
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-10-23	1	-282/+327
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (36 commits) nilfs2: eliminate sparse warning - "context imbalance" nilfs2: eliminate sparse warnings - "symbol not declared" nilfs2: get rid of bdi from nilfs object nilfs2: change license of exported header file nilfs2: add bdev freeze/thaw support nilfs2: accept 64-bit checkpoint numbers in cp mount option nilfs2: remove own inode allocator and destructor for metadata files nilfs2: get rid of back pointer to writable sb instance nilfs2: get rid of mi_nilfs back pointer to nilfs object nilfs2: see state of root dentry for mount check of snapshots nilfs2: use iget for all metadata files nilfs2: get rid of GCDAT inode nilfs2: add routines to redirect access to buffers of DAT file nilfs2: add routines to roll back state of DAT file nilfs2: add routines to save and restore bmap state nilfs2: do not allocate nilfs_mdt_info structure to gc-inodes nilfs2: allow nilfs_clear_inode to clear metadata file inodes nilfs2: get rid of snapshot mount flag nilfs2: simplify life cycle management of nilfs object nilfs2: do not allocate multiple super block instances for a device ...
\| *	nilfs2: eliminate sparse warnings - "symbol not declared"	Jiro SEKIBA	2010-10-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	change nilfs_dat_commit_free and nilfs_inode_cachep static to fix following warnings fs/nilfs2/super.c:72:19: warning: symbol 'nilfs_inode_cachep' was not declared. Should it be static? fs/nilfs2/dat.c:106:6: warning: symbol 'nilfs_dat_commit_free' was not declared. Should it be static? Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: get rid of bdi from nilfs object	Ryusuke Konishi	2010-10-23	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Nilfs now can use sb->s_bdi to get backing_dev_info, so we use it instead of ns_bdi on the nilfs object and remove ns_bdi. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: add bdev freeze/thaw support	Ryusuke Konishi	2010-10-23	1	-4/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Nilfs hasn't supported the freeze/thaw feature because it didn't work due to the peculiar design that multiple super block instances could be allocated for a device. This limitation was removed by the patch "nilfs2: do not allocate multiple super block instances for a device". So now this adds the freeze/thaw support to nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: accept 64-bit checkpoint numbers in cp mount option	Ryusuke Konishi	2010-10-23	1	-13/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation doesn't mount snapshots with checkpoint numbers larger than INT_MAX since it uses match_int() for parsing "cp=" mount option. This uses simple_strtoull() for the conversion to resolve the issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: remove own inode allocator and destructor for metadata files	Ryusuke Konishi	2010-10-23	1	-7/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This finally removes own inode allocator and destructor functions for metadata files. Several routines, nilfs_mdt_new(), nilfs_mdt_new_common(), nilfs_mdt_clear(), nilfs_mdt_destroy(), and nilfs_alloc_inode_common() will be gone. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: see state of root dentry for mount check of snapshots	Ryusuke Konishi	2010-10-23	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After applied the patch that unified sb instances, root dentry of snapshots can be left in dcache even after their trees are unmounted. The orphan root dentry/inode keeps a root object, and this causes false positive of nilfs_checkpoint_is_mounted function. This resolves the issue by having nilfs_checkpoint_is_mounted test whether the root dentry is busy or not. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: use iget for all metadata files	Ryusuke Konishi	2010-10-23	1	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This makes use of iget5_locked to allocate or get inode for metadata files to stop using own inode allocator. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: simplify life cycle management of nilfs object	Ryusuke Konishi	2010-10-23	1	-45/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This stops pre-allocating nilfs object in nilfs_get_sb routine, and stops managing its life cycle by reference counting. nilfs_find_or_create_nilfs() function, nilfs->ns_mount_mutex, nilfs_objects list, and the reference counter will be removed through the simplification. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: do not allocate multiple super block instances for a device	Ryusuke Konishi	2010-10-23	1	-116/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This stops allocating multiple super block instances for a device. All snapshots and a current mode mount (i.e. latest tree) will be controlled with nilfs_root objects that are kept within an sb instance. nilfs_get_sb() is rewritten so that it always has a root object for the latest tree and snapshots make additional root objects. The root dentry of the latest tree is binded to sb->s_root even if it isn't attached on a directory. Root dentries of snapshots or the latest tree are binded to mnt->mnt_root on which they are mounted. With this patch, nilfs_find_sbinfo() function, nilfs->ns_supers list, and nilfs->ns_current back pointer, are deleted. In addition, init_nilfs() and load_nilfs() are simplified since they will be called once for a device, not repeatedly called for mount points. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: split out nilfs_attach_snapshot	Ryusuke Konishi	2010-10-23	1	-28/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This splits the code to attach snapshots into a separate routine for convenience sake. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: split out nilfs_get_root_dentry	Ryusuke Konishi	2010-10-23	1	-19/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This splits the code to allocate root dentry into a separate routine for convenience in successive changes. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: move inode count and block count into root object	Ryusuke Konishi	2010-10-23	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root object. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: use root object to get ifile	Ryusuke Konishi	2010-10-23	1	-29/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This rewrites functions using ifile so that they get ifile from nilfs_root object, and will remove sbi->s_ifile. Some functions that don't know the root object are extended to receive it from caller. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: make snapshots in checkpoint tree exportable	Ryusuke Konishi	2010-10-23	1	-51/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous export operations cannot handle multiple versions of a filesystem if they belong to the same sb instance. This adds a new type of file handle and extends export operations so that they can get the inode specified by a checkpoint number as well as an inode number and a generation number. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: set pointer to root object in inodes	Ryusuke Konishi	2010-10-23	1	-7/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This puts a pointer to nilfs_root object in the private part of on-memory inode, and makes nilfs_iget function pick up the inode with the same root object. Non-root inodes inherit its nilfs_root object from parent inode. That of the root inode is allocated through nilfs_attach_checkpoint() function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
\| *	nilfs2: use iget5_locked to get inode	Ryusuke Konishi	2010-10-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This uses iget5_locked instead of iget_locked so that gc cache can look up inodes with an inode number and an optional checkpoint number. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>