op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	fs/nls: add Apple NLS	Vladimir Serbinenko	2012-05-31	13	-1/+6451
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	HFS has support for NLS. However the relevant NLS tables are missing. Here they are automatically transformed from the tables at unicode.org. Codepages requiring special handling like CJK, RTL or Brahmic ones are not included in this patch. [akpm@linux-foundation.org: add unicode.org copyright and permission notices] Signed-off-by: Vladimir Serbinenko <phcoder@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proc/smaps: show amount of nonlinear ptes in vma	Konstantin Khlebnikov	2012-05-31	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, nonlinear mappings can not be distinguished from ordinary mappings. This patch adds into /proc/pid/smaps line "Nonlinear: <size> kB", where size is amount of nonlinear ptes in vma, this line appears only if VM_NONLINEAR is set. This information may be useful not only for checkpoint/restore project. Requested by Pavel Emelyanov. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proc/smaps: carefully handle migration entries	Konstantin Khlebnikov	2012-05-31	1	-8/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently smaps reports migration entries as "swap", as result "swap" can appears in shared mapping. This patch converts migration entries into pages and handles them as usual. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proc: report file/anon bit in /proc/pid/pagemap	Konstantin Khlebnikov	2012-05-31	1	-18/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is an implementation of Andrew's proposal to extend the pagemap file bits to report what is missing about tasks' working set. The problem with the working set detection is multilateral. In the criu (checkpoint/restore) project we dump the tasks' memory into image files and to do it properly we need to detect which pages inside mappings are really in use. The mincore syscall I though could help with this did not. First, it doesn't report swapped pages, thus we cannot find out which parts of anonymous mappings to dump. Next, it does report pages from page cache as present even if they are not mapped, and it doesn't make that has not been cow-ed. Note, that issue with swap pages is critical -- we must dump swap pages to image file. But the issues with file pages are optimization -- we can take all file pages to image, this would be correct, but if we know that a page is not mapped or not cow-ed, we can remove them from dump file. The dump would still be self-consistent, though significantly smaller in size (up to 10 times smaller on real apps). Andrew noticed, that the proc pagemap file solved 2 of 3 above issues -- it reports whether a page is present or swapped and it doesn't report not mapped page cache pages. But, it doesn't distinguish cow-ed file pages from not cow-ed. I would like to make the last unused bit in this file to report whether the page mapped into respective pte is PageAnon or not. [comment stolen from Pavel Emelyanov's v1 patch] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	procfs: use more apprioriate types when dumping /proc/N/stat	Jan Engelhardt	2012-05-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	- use int fpr priority and nice, since task_nice()/task_prio() return that - field 24: get_mm_rss() returns unsigned long Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proc: pass "fd" by value in /proc/*/{fd,fdinfo} code	Alexey Dobriyan	2012-05-31	1	-4/+4
\| \| \| \| \| \| \| \|	Pass "fd" directly, not via pointer -- one less memory read. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proc: don't do dummy rcu_read_lock/rcu_read_unlock on error path	Alexey Dobriyan	2012-05-31	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	rcu_read_lock()/rcu_read_unlock() is nop for TINY_RCU, but is not a nop for, say, PREEMPT_RCU. proc_fill_cache() is called without RCU lock, there is no need to lock/unlock on error path, simply jump out of the loop. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proc: use mm_access() instead of ptrace_may_access()	Cong Wang	2012-05-31	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \|	mm_access() handles this much better, and avoids some race conditions. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proc: remove mm_for_maps()	Cong Wang	2012-05-31	4	-11/+4
\| \| \| \| \| \| \| \| \| \| \| \|	mm_for_maps() is a simple wrapper for mm_access(), and the name is misleading, so just remove it and use mm_access() directly. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proc: clean up /proc/<pid>/environ handling	Cong Wang	2012-05-31	1	-21/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similar to e268337dfe26 ("proc: clean up and fix /proc/<pid>/mem handling"), move the check of permission to open(), this will simplify read() code. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fat: use fat_msg_ratelimit() in fat__get_entry()	Namjae Jeon	2012-05-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If an application tries to lookup (opendir/readdir/stat) 5000 files on a fatfs USB device and the device is unplugged, many message occur, shown below. This makes the application slow. So use the new fat_msg_ratelimit() decrease the messaging rate. #> ./file_lookup_testcase ./files_directory/ usb 2-1.4: USB disconnect, device number 4 FAT-fs (sda1): FAT read failed (blocknr 2631) FAT-fs (sda1): Directory bread(block 396816) failed FAT-fs (sda1): Directory bread(block 396817) failed FAT-fs (sda1): Directory bread(block 396818) failed FAT-fs (sda1): Directory bread(block 396819) failed FAT-fs (sda1): Directory bread(block 396820) failed FAT-fs (sda1): Directory bread(block 396821) failed FAT-fs (sda1): Directory bread(block 396822) failed FAT-fs (sda1): Directory bread(block 396823) failed FAT-fs (sda1): Directory bread(block 406824) failed FAT-fs (sda1): Directory bread(block 406825) failed FAT-fs (sda1): Directory bread(block 406826) failed FAT-fs (sda1): Directory bread(block 406827) failed FAT-fs (sda1): Directory bread(block 406828) failed FAT-fs (sda1): Directory bread(block 406829) failed FAT-fs (sda1): Directory bread(block 406830) failed FAT-fs (sda1): Directory bread(block 406831) failed FAT-fs (sda1): Directory bread(block 417696) failed FAT-fs (sda1): Directory bread(block 417697) failed FAT-fs (sda1): Directory bread(block 417698) failed FAT-fs (sda1): Directory bread(block 417699) failed FAT-fs (sda1): Directory bread(block 417700) failed FAT-fs (sda1): Directory bread(block 417701) failed FAT-fs (sda1): Directory bread(block 417702) failed FAT-fs (sda1): Directory bread(block 417703) failed FAT-fs (sda1): FAT read failed (blocknr 2631) FAT-fs (sda1): Directory bread(block 396816) failed FAT-fs (sda1): Directory bread(block 396817) failed FAT-fs (sda1): Directory bread(block 396818) failed FAT-fs (sda1): Directory bread(block 396819) failed FAT-fs (sda1): Directory bread(block 396820) failed FAT-fs (sda1): Directory bread(block 396821) failed Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fat: add fat_msg_ratelimit()	Namjae Jeon	2012-05-31	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	Add a fat_msg_ratelimit() to limit the message generation rate. Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fat: switch to fsinfo_inode	Artem Bityutskiy	2012-05-31	2	-31/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently FAT file-system maps the VFS "superblock" abstraction to the FSINFO block. The FSINFO block contains non-essential data about the amount of free clusters and the next free cluster. FAT file-system can always find out this information by scanning the FAT table, but having it in the FSINFO block may speed things up sometimes. So FAT file-system relies on the VFS superblock write-out services to make sure the FSINFO block is written out to the media from time to time. The whole "superblock write-out" VFS infrastructure is served by the 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and writes out all dirty superblock using the '->write_super()' call-back. But the problem with this thread is that it wastes power by waking up the system every 5 seconds no matter what. So we want to kill it completely and thus, we need to make file-systems to stop using the '->write_super' VFS service, and then remove it together with the kernel thread. This patch switches the FAT FSINFO block management from '->write_super()'/'->s_dirt' to 'fsinfo_inode'/'->write_inode'. Now, instead of setting the 's_dirt' flag, we just mark the special 'fsinfo_inode' inode as dirty and let VFS invoke the '->write_inode' call-back when needed, where we write-out the FSINFO block. This patch also makes sure we do not mark the 'fsinfo_inode' inode as dirty if we are not FAT32 (FAT16 and FAT12 do not have the FSINFO block) or if we are in R/O mode. As a bonus, we can also remove the '->sync_fs()' and '->write_super()' FAT call-back function because they become unneeded. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fat: mark superblock as dirty less often	Artem Bityutskiy	2012-05-31	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Preparation for further changes. It touches few functions in fatent.c and prevents them from marking the superblock as dirty unnecessarily often. Namely, instead of marking it as dirty in the internal tight loops - do it only once at the end of the functions. And instead of marking it as dirty while holding the FAT table lock, do it outside the lock. The reason for this patch is that marking the superblock as dirty will soon become a little bit heavier operation, so it is cleaner to do this only when it is necessary. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fat: introduce mark_fsinfo_dirty helper	Artem Bityutskiy	2012-05-31	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \|	A preparation patch which introduces a 'mark_fsinfo_dirty()' helper function which just sets the 's_dirt' flag to 1 so far. I'll add more code to this helper later, so I do not mark it as 'inline'. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fat: introduce special inode for managing the FSINFO block	Artem Bityutskiy	2012-05-31	2	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is patchset makes fatfs stop using the VFS '->write_super()' method for writing out the FSINFO block. The final goal is to get rid of the 'sync_supers()' kernel thread. This kernel thread wakes up every 5 seconds (by default) and calls '->write_super()' for all mounted file-systems. And the bad thing is that this is done even if all the superblocks are clean. Moreover, some file-systems do not even need this end they do not register the '->write_super()' method at all (e.g., btrfs). So 'sync_supers()' most often just generates useless wake-ups and wastes power. I am trying to make all file-systems independent of '->write_super()' and plan to remove 'sync_supers()' and '->write_super' completely once there are no more users. The '->write_supers()' method is mostly used by baroque file-systems like hfs, udf, etc. Modern file-systems like btrfs and xfs do not use it. This justifies removing this stuff from VFS completely and make every FS self-manage own superblock. Tested with xfstests. This patch: Preparation for further changes. It introduces a special inode ('fsinfo_inode') in FAT file-system which we'll later use for managing the FSINFO block. Note, this there is already one special inode ('fat_inode') which is used for managing the FAT tables. Introduce new 'MSDOS_FSINFO_INO' constant for this special inode. It is safe to do because FAT file-system does not store inode numbers on the media but generates them run-time. I've also cleaned up the comment to existing 'MSDOS_ROOT_INO' constant, while on it. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	HPFS: remove PRINTK() macro	Dan Carpenter	2012-05-31	2	-8/+0
\| \| \| \| \| \| \| \| \| \|	The PRINTK() macro isn't really used. Let's just remove it because it is ugly and out of date. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	nilfs2: flush disk caches in syncing	Ryusuke Konishi	2012-05-31	2	-11/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two cases that the cache flush is needed to avoid data loss against unexpected hang or power failure. One is sync file function (i.e. nilfs_sync_file) and another is checkpointing ioctl. This issues a cache flush request to device for such cases if barrier mount option is enabled, and makes sure data really is on persistent storage on their completion. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	pipe: return -ENOIOCTLCMD instead of -EINVAL on unknown ioctl command	Will Deacon	2012-05-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As described in commit 07d106d0a33d ("vfs: fix up ENOIOCTLCMD error handling"), drivers should return -ENOIOCTLCMD if they receive an ioctl command which they don't understand. Doing so will result in -ENOTTY being returned to userspace, which matches the behaviour of the compat layer if it fails to translate an ioctl command. This patch fixes the pipe ioctl to return -ENOIOCTLCMD instead of -EINVAL when passed an unknown ioctl command. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Alan Cox <alan@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	introduce SIZE_MAX	Xi Wang	2012-05-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ULONG_MAX is often used to check for integer overflow when calculating allocation size. While ULONG_MAX happens to work on most systems, there is no guarantee that `size_t' must be the same size as `long'. This patch introduces SIZE_MAX, the maximum value of `size_t', to improve portability and readability for allocation size validation. Signed-off-by: Xi Wang <xi.wang@gmail.com> Acked-by: Alex Elder <elder@dreamhost.com> Cc: David Airlie <airlied@linux.ie> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm, oom: normalize oom scores to oom_score_adj scale only for userspace	David Rientjes	2012-05-29	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The oom_score_adj scale ranges from -1000 to 1000 and represents the proportion of memory available to the process at allocation time. This means an oom_score_adj value of 300, for example, will bias a process as though it was using an extra 30.0% of available memory and a value of -350 will discount 35.0% of available memory from its usage. The oom killer badness heuristic also uses this scale to report the oom score for each eligible process in determining the "best" process to kill. Thus, it can only differentiate each process's memory usage by 0.1% of system RAM. On large systems, this can end up being a large amount of memory: 256MB on 256GB systems, for example. This can be fixed by having the badness heuristic to use the actual memory usage in scoring threads and then normalizing it to the oom_score_adj scale for userspace. This results in better comparison between eligible threads for kill and no change from the userspace perspective. Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm/fs: remove truncate_range	Hugh Dickins	2012-05-29	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove vmtruncate_range(), and remove the truncate_range method from struct inode_operations: only tmpfs ever supported it, and tmpfs has now converted over to using the fallocate method of file_operations. Update Documentation accordingly, adding (setlease and) fallocate lines. And while we're in mm.h, remove duplicate declarations of shmem_lock() and shmem_file_setup(): everyone is now using the ones in shmem_fs.h. Based-on-patch-by: Cong Wang <amwang@redhat.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Cong Wang <amwang@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: fix NULL ptr deref when walking hugepages	Sasha Levin	2012-05-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A missing validation of the value returned by find_vma() could cause a NULL ptr dereference when walking the pagetable. This is triggerable from usermode by a simple user by trying to read a page info out of /proc/pid/pagemap which doesn't exist. Introduced by commit 025c5b2451e4 ("thp: optimize away unnecessary page table locking"). Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.4.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs	Linus Torvalds	2012-05-29	39	-2786/+3743
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull NFS client updates from Trond Myklebust: "New features include: - Rewrite the O_DIRECT code so that it can share the same coalescing and pNFS functionality as the page cache code. - Allow the server to provide hints as to when we should use pNFS, and when it is more efficient to read and write through the metadata server. - NFS cache consistency updates: * Use the ctime to emulate a change attribute for NFSv2/v3 so that all NFS versions can share the same cache management code. * New cache management code will only look at the change attribute and size attribute when deciding whether or not our cached data is still valid or not. * Don't request NFSv4 post-op attributes on writes in cases such as O_DIRECT, where we don't care about data cache consistency, or when we have a write delegation, and know that our cache is still consistent. * Don't request NFSv4 post-op attributes on operations such as COMMIT, where there are no expected metadata updates. * Don't request NFSv4 directory post-op attributes in cases where the operations themselves already return change attribute updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and RENAME. - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS if we detect no attempts to lookup filenames. - Improve the code sharing between NFSv2/v3 and v4 mounts - NFSv4.1 state management efficiency improvements - More patches in preparation for NFSv4/v4.1 migration functionality." Fix trivial conflict in fs/nfs/nfs4proc.c that was due to the dcache qstr name initialization changes (that made the length/hash a 64-bit union) * tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (146 commits) NFSv4: Add debugging printks to state manager NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager NFSv4.1: Handle errors in nfs4_bind_conn_to_session NFSv4.1: nfs4_bind_conn_to_session should drain the session NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid NFSv4.1: Add DESTROY_CLIENTID NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session NFSv4.1: Ensure we use the correct credentials for session create/destroy NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM NFSv4: Clean up the error handling for nfs4_reclaim_lease NFSv4.1: Exchange ID must use GFP_NOFS allocation mode nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN* nfs4.1: add BIND_CONN_TO_SESSION operation NFSv4.1 test the mdsthreshold hint parameters ...
\| *	NFSv4: Add debugging printks to state manager	Trond Myklebust	2012-05-28	1	-0/+33
\| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO	Trond Myklebust	2012-05-28	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a file OPEN is denied due to a share lock, the resulting NFS4ERR_SHARE_DENIED is currently mapped to the default EIO. This patch adds a more appropriate mapping, and brings Linux into line with what Solaris 10 does. See https://bugzilla.kernel.org/show_bug.cgi?id=43286 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
\| *	NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE	Trond Myklebust	2012-05-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're already invalidating the data cache, and setting the new change attribute. Since directories don't care about the i_size field, there is no need to be forcing any extra revalidation of the page cache. We do keep the NFS_INO_INVALID_ATTR flag, in order to force an attribute cache revalidation on stat() calls since we do not update the mtime and ctime fields. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error	Trond Myklebust	2012-05-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The results from a call to nfs4_proc_create_session() should always be fed into nfs4_handle_reclaim_lease_error, so that we can handle errors such as NFS4ERR_SEQ_MISORDERED correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION	Trond Myklebust	2012-05-27	4	-9/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Let nfs4_schedule_session_recovery() handle the details of choosing between resetting the session, and other session related recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager	Trond Myklebust	2012-05-27	1	-1/+3
\| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: Handle errors in nfs4_bind_conn_to_session	Trond Myklebust	2012-05-27	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that we handle NFS4ERR_DELAY errors separately, and then let nfs4_recovery_handle_error() handle all other cases. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: nfs4_bind_conn_to_session should drain the session	Trond Myklebust	2012-05-27	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to avoid races with other RPC calls that end up setting the NFS4CLNT_BIND_CONN_TO_SESSION flag. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid	Trond Myklebust	2012-05-26	2	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the EXCHGID4_FLAG_CONFIRMED_R flag is set, the client is in theory supposed to already know the correct value of the seqid, in which case RFC5661 states that it should ignore the value returned. Also ensure that if the sanity check in nfs4_check_cl_exchange_flags fails, then we must not change the nfs_client fields. Finally, clean up the code: we don't need to retest the value of 'status' unless it can change. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: Add DESTROY_CLIENTID	Trond Myklebust	2012-05-26	4	-0/+113
\| \| \| \| \| \| \| \| \| \| \| \|	Ensure that we destroy our lease on last unmount Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session	Trond Myklebust	2012-05-25	3	-3/+11
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com>
\| *	NFSv4.1: Ensure we use the correct credentials for session create/destroy	Trond Myklebust	2012-05-25	3	-15/+26
\| \| \| \| \| \| \| \|	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations	Trond Myklebust	2012-05-25	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	For backward compatibility with nfs-utils. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com>
\| *	NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease	Trond Myklebust	2012-05-25	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Apparently the patch "NFS: Always use the same SETCLIENTID boot verifier" is tickling a Linux nfs server bug, and causing a regression: the server can get into a situation where it keeps replying NFS4ERR_SEQ_MISORDERED to our CREATE_SESSION request even when we are sending the correct sequence ID. Fix this by purging the lease and then retrying. Reported-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM	Trond Myklebust	2012-05-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Otherwise we can end up not sending a new exchange-id/setclientid Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4: Clean up the error handling for nfs4_reclaim_lease	Trond Myklebust	2012-05-25	1	-49/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Try to consolidate the error handling for nfs4_reclaim_lease into a single function instead of doing a bit here, and a bit there... Also ensure that NFS4CLNT_PURGE_STATE handles errors correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1: Exchange ID must use GFP_NOFS allocation mode	Trond Myklebust	2012-05-24	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Exchange ID can be called in a lease reclaim situation, so it will deadlock if it then tries to write out dirty NFS pages. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN*	Weston Andros Adamson	2012-05-24	2	-4/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The state manager can handle SEQ4_STATUS_CB_PATH_DOWN* flags with a BIND_CONN_TO_SESSION instead of destroying the session and creating a new one. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	nfs4.1: add BIND_CONN_TO_SESSION operation	Weston Andros Adamson	2012-05-24	3	-0/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the BIND_CONN_TO_SESSION operation which is needed for upcoming SP4_MACH_CRED work and useful for recovering from broken connections without destroying the session. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1 test the mdsthreshold hint parameters	Andy Adamson	2012-05-24	1	-0/+79
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1 add nfs_inode book keeping for mdsthreshold	Andy Adamson	2012-05-24	5	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Keep track of the number of bytes read or written via buffered, direct, and mem-mapped i/o for use by mdsthreshold size_io hints. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1 cache mdsthreshold values on OPEN	Andy Adamson	2012-05-24	4	-5/+68
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4.1 mdsthreshold attribute xdr	Andy Adamson	2012-05-24	1	-2/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We only support one layout type per file system, so one threshold_item4 per mdsthreshold4. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFS: Add memory barriers to the nfs_client->cl_cons_state initialisation	Trond Myklebust	2012-05-23	3	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure that a process that uses the nfs_client->cl_cons_state test for whether the initialisation process is finished does not read stale data. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
\| *	NFSv4: Fix a race in the net namespace mount notification	Trond Myklebust	2012-05-23	3	-3/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the struct nfs_client gets added to the global nfs_client_list before it is initialised, it is possible that rpc_pipefs_event can end up trying to create idmapper entries on such a thing. The solution is to have the mount notification wait for the initialisation of each nfs_client to complete, and then to skip any entries for which the it failed. Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
\| *	NFSv4.1: Fix session initialisation races	Trond Myklebust	2012-05-23	4	-67/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Session initialisation is not complete until the lease manager has run. We need to ensure that both nfs4_init_session and nfs4_init_ds_session do so, and that they check for any resulting errors in clp->cl_cons_state. Only after this is done, can nfs4_ds_connect check the contents of clp->cl_exchange_flags. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Andy Adamson <andros@netapp.com>