op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	[NET]: make netlink user -> kernel interface synchronious	Denis V. Lunev	2007-10-10	1	-13/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch make processing netlink user -> kernel messages synchronious. This change was inspired by the talk with Alexey Kuznetsov about current netlink messages processing. He says that he was badly wrong when introduced asynchronious user -> kernel communication. The call netlink_unicast is the only path to send message to the kernel netlink socket. But, unfortunately, it is also used to send data to the user. Before this change the user message has been attached to the socket queue and sk->sk_data_ready was called. The process has been blocked until all pending messages were processed. The bad thing is that this processing may occur in the arbitrary process context. This patch changes nlk->data_ready callback to get 1 skb and force packet processing right in the netlink_unicast. Kernel -> user path in netlink_unicast remains untouched. EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock drop, but the process remains in the cycle until the message will be fully processed. So, there is no need to use this kludges now. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET]: Support multiple network namespaces with netlink	Eric W. Biederman	2007-10-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	eCryptfs: fix possible fault in ecryptfs_sync_page	Ryusuke Konishi	2007-08-31	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will avoid a possible fault in ecryptfs_sync_page(). In the function, eCryptfs calls sync_page() method of a lower filesystem without checking its existence. However, there are many filesystems that don't have this method including network filesystems such as NFS, AFS, and so forth. They may fail when an eCryptfs page is waiting for lock. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	revert "eCryptfs: fix lookup error for special files"	Andrew Morton	2007-08-31	1	-4/+0
\| \| \| \| \| \| \| \| \|	This patch got appied twice. Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	eCryptfs: fix lookup error for special files	Ryusuke Konishi	2007-08-22	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ecryptfs_lookup() is called against special files, eCryptfs generates the following errors because it tries to treat them like regular eCryptfs files. Error opening lower file for lower_dentry [0xffff810233a6f150], lower_mnt [0xffff810235bb4c80], and flags [0x8000] Error opening lower_file to read header region Error attempting to read the [user.ecryptfs] xattr from the lower file; return value = [-95] Valid metadata not found in header region or xattr region; treating file as unencrypted For instance, the problem can be reproduced by the steps below. # mkdir /root/crypt /mnt/crypt # mount -t ecryptfs /root/crypt /mnt/crypt # mknod /mnt/crypt/c0 c 0 0 # umount /mnt/crypt # mount -t ecryptfs /root/crypt /mnt/crypt # ls -l /mnt/crypt This patch fixes it by adding a check similar to directories and symlinks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	eCryptfs: fix error handling in ecryptfs_init	Ryusuke Konishi	2007-08-11	1	-5/+13
\| \| \| \| \| \| \| \| \| \| \| \|	ecryptfs_init() exits without doing any cleanup jobs if ecryptfs_init_messaging() fails. In that case, eCryptfs leaves sysfs entries, leaks memory, and causes an invalid page fault. This patch fixes the problem. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	eCryptfs: fix lookup error for special files	Ryusuke Konishi	2007-08-11	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ecryptfs_lookup() is called against special files, eCryptfs generates the following errors because it tries to treat them like regular eCryptfs files. Error opening lower file for lower_dentry [0xffff810233a6f150], lower_mnt [0xffff810235bb4c80], and flags [0x8000] Error opening lower_file to read header region Error attempting to read the [user.ecryptfs] xattr from the lower file; return value = [-95] Valid metadata not found in header region or xattr region; treating file as unencrypted For instance, the problem can be reproduced by the steps below. # mkdir /root/crypt /mnt/crypt # mount -t ecryptfs /root/crypt /mnt/crypt # mknod /mnt/crypt/c0 c 0 0 # umount /mnt/crypt # mount -t ecryptfs /root/crypt /mnt/crypt # ls -l /mnt/crypt This patch fixes it by adding a check similar to directories and symlinks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fix some conversion overflows	Nick Piggin	2007-07-20	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix page index to offset conversion overflows in buffer layer, ecryptfs, and ocfs2. It would be nice to convert the whole tree to page_offset, but for now just fix the bugs. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: Remove slab destructors from kmem_cache_create().	Paul Mundt	2007-07-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
*	eCryptfs: ecryptfs_setattr() bugfix	Michael Halcrow	2007-07-19	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is another bug recently introduced into the ecryptfs_setattr() function in 2.6.22. eCryptfs will attempt to treat special files like regular eCryptfs files on chmod, chown, and so forth. This leads to a NULL pointer dereference. This patch validates that the file is a regular file before proceeding with operations related to the inode's crypt_stat. Thanks to Ryusuke Konishi for finding this bug and suggesting the fix. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Couple fixes to fs/ecryptfs/inode.c	Mika Kukkonen	2007-07-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Following was uncovered by compiling the kernel with '-W' flag: CC [M] fs/ecryptfs/inode.o fs/ecryptfs/inode.c: In function ‘ecryptfs_lookup’: fs/ecryptfs/inode.c:304: warning: comparison of unsigned expression < 0 is always false fs/ecryptfs/inode.c: In function ‘ecryptfs_symlink’: fs/ecryptfs/inode.c:486: warning: comparison of unsigned expression < 0 is always false Function ecryptfs_encode_filename() can return -ENOMEM, so change the variables to plain int, as in the first case the only real use actually expects int, and in latter case there is no use beoynd the error check. Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	sysfs: kill unnecessary attribute->owner	Tejun Heo	2007-07-11	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sysfs is now completely out of driver/module lifetime game. After deletion, a sysfs node doesn't access anything outside sysfs proper, so there's no reason to hold onto the attribute owners. Note that often the wrong modules were accounted for as owners leading to accessing removed modules. This patch kills now unnecessary attribute->owner. Note that with this change, userland holding a sysfs node does not prevent the backing module from being unloaded. For more info regarding lifetime rule cleanup, please read the following message. http://article.gmane.org/gmane.linux.kernel/510293 (tweaked by Greg to not delete the field just yet, to make it easier to merge things properly.) Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	sendfile: remove .sendfile from filesystems that use generic_file_sendfile()	Jens Axboe	2007-07-10	1	-7/+8
\| \| \| \| \| \| \|	They can use generic_file_splice_read() instead. Since sys_sendfile() now prefers that, there should be no change in behaviour. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
*	zero out last page for llseek/write	Michael Halcrow	2007-06-28	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	When one llseek's past the end of the file and then writes, every page past the previous end of the file should be cleared. Trevor found that the code, as is, does not assure that the very last page is always cleared. This patch takes care of that. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	eCryptfs: initialize crypt_stat in setattr	Michael Halcrow	2007-06-28	1	-1/+46
\| \| \| \| \| \| \| \| \| \| \|	Recent changes in eCryptfs have made it possible to get to ecryptfs_setattr() with an uninitialized crypt_stat struct. This results in a wide and colorful variety of unpleasantries. This patch properly initializes the crypt_stat structure in ecryptfs_setattr() when it is necessary to do so. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	eCryptfs: fix write zeros behavior	Michael Halcrow	2007-06-28	3	-24/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the processes involved in wiping regions of the data during truncate and write events, fixing a kernel hang in 2.6.22-rc4 while assuring that zero values are written out to the appropriate locations during events in which the i_size will change. The range passed to ecryptfs_truncate() from ecryptfs_prepare_write() includes the page that is the object of ecryptfs_prepare_write(). This leads to a kernel hang as read_cache_page() is executed on the same page in the ecryptfs_truncate() execution path. This patch remedies this by limiting the range passed to ecryptfs_truncate() so as to exclude the page that is the object of ecryptfs_prepare_write(); it also adds code to ecryptfs_prepare_write() to zero out the region of its own page when writing past the i_size position. This patch also modifies ecryptfs_truncate() so that when a file is truncated to a smaller size, eCryptfs will zero out the contents of the new last page from the new size through to the end of the last page. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	eCryptfs: delay writing 0's after llseek until write	Michael Halcrow	2007-05-23	2	-61/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Delay writing 0's out in eCryptfs after a seek past the end of the file until data is actually written. http://www.opengroup.org/onlinepubs/009695399/functions/lseek.html ``The lseek() function shall not, by itself, extend the size of a file.'' Without this fix, applications that lseek() past the end of the file without writing will experience unexpected behavior. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Detach sched.h from mm.h	Alexey Dobriyan	2007-05-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ecryptfs: use zero_user_page	Nate Diller	2007-05-17	1	-11/+3
\| \| \| \| \| \| \| \| \|	Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Remove SLAB_CTOR_CONSTRUCTOR	Christoph Lameter	2007-05-17	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	header cleaning: don't include smp_lock.h when not used	Randy Dunlap	2007-05-08	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	slab allocators: Remove SLAB_DEBUG_INITIAL flag	Christoph Lameter	2007-05-07	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: make read_cache_page synchronous	Nick Piggin	2007-05-07	1	-10/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ensure pages are uptodate after returning from read_cache_page, which allows us to cut out most of the filesystem-internal PageUptodate calls. I didn't have a great look down the call chains, but this appears to fixes 7 possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd. All depending on whether the filler is async and/or can return with a !uptodate page. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	remove "struct subsystem" as it is no longer needed	Greg Kroah-Hartman	2007-05-02	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \|	We need to work on cleaning up the relationship between kobjects, ksets and ktypes. The removal of 'struct subsystem' is the first step of this, especially as it is not really needed at all. Thanks to Kay for fixing the bugs in this patch. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it	Patrick McHardy	2007-04-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETLINK]: Introduce nlmsg_hdr() helper	Arnaldo Carvalho de Melo	2007-04-25	1	-2/+2
\| \| \| \| \| \| \| \| \|	For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[PATCH] eCryptfs: fix possible NULL ptr deref in ecryptfs_d_release()	Michael Halcrow	2007-03-16	1	-10/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	ecryptfs_d_release() first dereferences a pointer (via ecryptfs_dentry_to_lower()) and then afterwards checks to see if the pointer it just dereferenced is NULL (via ecryptfs_dentry_to_private()). This patch moves all of the work done on the dereferenced pointer inside a block governed by the condition that the pointer is non-NULL. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] ecryptfs: nested locking annotation	Peter Zijlstra	2007-03-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	ecryptfs uses a lock_parent() function, which I hope really locks the parents and is not abused Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] ecryptfs: handle AOP_TRUNCATED_PAGE better	Dmitriy Monakhov	2007-03-05	1	-12/+16
\| \| \| \| \| \| \| \| \| \| \| \| \|	- In fact we don't have to fail if AOP_TRUNCATED_PAGE was returned from prepare_write or commit_write. It is beter to retry attempt where it is possible. - Rearange ecryptfs_get_lower_page() error handling logic, make it more clean. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] ecryptfs: lower root result must be adirectory	Dmitriy Monakhov	2007-03-05	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \|	- Currently after path_lookup succeed we dot't have any guarantie what it is DIR. This must be explicitly demanded. - path_lookup can't return negative dentry, So inode check is useless. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] ecryptfs: check xattr operation support fix	Dmitriy Monakhov	2007-03-05	2	-4/+5
\| \| \| \| \| \| \| \| \| \|	- ecryptfs_write_inode_size_to_metadata() error code was ignored. - i_op->setxattr() must be supported by lower fs because used below. Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Acked-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: no path_release() after path_lookup() error	Michael Halcrow	2007-03-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dmitriy Monakhov wrote: > if path_lookup() return non zero code we don't have to worry about > 'nd' parameter, but ecryptfs_read_super does path_release(&nd) after > path_lookup has failed, and dentry counter becomes negative Do not do a path_release after a path_lookup error. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Dmitriy Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: remove unnecessary flush_dcache_page()	Michael Halcrow	2007-03-01	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	Remove unnecessary flush_dcache_page() call. Thanks to Dmitriy Monakhov for pointing this out. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Dmitriy Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: set O_LARGEFILE when opening lower file	Michael Halcrow	2007-03-01	2	-3/+1
\| \| \| \| \| \| \| \| \|	O_LARGEFILE should be set here when opening the lower file. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Dmitriy Monakhov <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: resolve lower page unlocking problem	Michael Halcrow	2007-03-01	1	-5/+23
\| \| \| \| \| \| \| \| \| \| \|	eCryptfs lower file handling code has several issues: - Retval from prepare_write()/commit_write() wasn't checked to equality to AOP_TRUNCATED_PAGE. - In some places page wasn't unmapped and unlocked after error. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] ecryptfs: fix forgotten format specifier	Thomas Hisch	2007-02-16	1	-1/+2
\| \| \| \| \| \| \| \| \|	Add format specifier %d for uid in ecryptfs_printk Signed-off-by: Thomas Hisch <t.hisch@gmail.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: Reduce stack usage in ecryptfs_generate_key_packet_set()	Michael Halcrow	2007-02-16	3	-8/+24
\| \| \| \| \| \| \| \| \| \| \|	eCryptfs is gobbling a lot of stack in ecryptfs_generate_key_packet_set() because it allocates a temporary memory-hungry ecryptfs_key_record struct. This patch introduces a new kmem_cache for that struct and converts ecryptfs_generate_key_packet_set() to use it. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] remove many unneeded #includes of sched.h	Tim Schmielau	2007-02-14	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] Mark struct super_operations const	Josef 'Jeff' Sipek	2007-02-12	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	This patch is inspired by Arjan's "Patch series to mark struct file_operations and struct inode_operations const". Compile tested with gcc & sparse. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] mark struct inode_operations const 1	Arjan van de Ven	2007-02-12	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \|	Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: add flush_dcache_page() calls	Michael Halcrow	2007-02-12	1	-0/+6
\| \| \| \| \| \| \| \|	Call flush_dcache_page() after modifying a pagecache by hand. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: open-code flag checking and manipulation	Michael Halcrow	2007-02-12	7	-68/+49
\| \| \| \| \| \| \| \| \|	Open-code flag checking and manipulation. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Trevor Highland <tshighla@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: convert kmap() to kmap_atomic()	Michael Halcrow	2007-02-12	3	-94/+39
\| \| \| \| \| \| \| \| \| \|	Replace kmap() with kmap_atomic(). Reduce the amount of time that mappings are held. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Trevor Highland <tshighla@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: convert f_op->write() to vfs_write()	Michael Halcrow	2007-02-12	2	-6/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sys_write() takes a local copy of f_pos and writes that back into the struct file. It does this so that two concurrent write() callers don't make a mess of f_pos, and of the file contents. ecryptfs should be calling vfs_write(). That way we also get the fsnotify notifications, which ecryptfs presently appears to have subverted. Convert direct calls to f_op->write() into calls to vfs_write(). Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: Encrypted passthrough	Michael Halcrow	2007-02-12	5	-11/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provide an option to provide a view of the encrypted files such that the metadata is always in the header of the files, regardless of whether the metadata is actually in the header or in the extended attribute. This mode of operation is useful for applications like incremental backup utilities that do not preserve the extended attributes when directly accessing the lower files. With this option enabled, the files under the eCryptfs mount point will be read-only. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: Generalize metadata read/write	Michael Halcrow	2007-02-12	7	-146/+348
\| \| \| \| \| \| \| \| \| \| \| \|	Generalize the metadata reading and writing mechanisms, with two targets for now: metadata in file header and metadata in the user.ecryptfs xattr of the lower file. [akpm@osdl.org: printk warning fix] [bunk@stusta.de: make some needlessly global code static] Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: xattr flags and mount options	Michael Halcrow	2007-02-12	3	-7/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch set introduces the ability to store cryptographic metadata into an lower file extended attribute rather than the lower file header region. This patch set implements two new mount options: ecryptfs_xattr_metadata - When set, newly created files will have their cryptographic metadata stored in the extended attribute region of the file rather than the header. When storing the data in the file header, there is a minimum of 8KB reserved for the header information for each file, making each file at least 12KB in size. This can take up a lot of extra disk space if the user creates a lot of small files. By storing the data in the extended attribute, each file will only occupy at least of 4KB of space. As the eCryptfs metadata set becomes larger with new features such as multi-key associations, most popular filesystems will not be able to store all of the information in the xattr region in some cases due to space constraints. However, the majority of users will only ever associate one key per file, so most users will be okay with storing their data in the xattr region. This option should be used with caution. I want to emphasize that the xattr must be maintained under all circumstances, or the file will be rendered permanently unrecoverable. The last thing I want is for a user to forget to set an xattr flag in a backup utility, only to later discover that their backups are worthless. ecryptfs_encrypted_view - When set, this option causes eCryptfs to present applications a view of encrypted files as if the cryptographic metadata were stored in the file header, whether the metadata is actually stored in the header or in the extended attributes. No matter what eCryptfs winds up doing in the lower filesystem, I want to preserve a baseline format compatibility for the encrypted files. As of right now, the metadata may be in the file header or in an xattr. There is no reason why the metadata could not be put in a separate file in future versions. Without the compatibility mode, backup utilities would have to know to back up the metadata file along with the files. The semantics of eCryptfs have always been that the lower files are self-contained units of encrypted data, and the only additional information required to decrypt any given eCryptfs file is the key. That is what has always been emphasized about eCryptfs lower files, and that is what users expect. Providing the encrypted view option will provide a way to userspace applications wherein they can always get to the same old familiar eCryptfs encrypted files, regardless of what eCryptfs winds up doing with the metadata behind the scenes. This patch: Add extended attribute support to version bit vector, flags to indicate when xattr or encrypted view modes are enabled, and support for the new mount options. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: Public key; packet management	Michael Halcrow	2007-02-12	7	-170/+777
\| \| \| \| \| \| \| \| \| \| \| \| \|	Public key support code. This reads and writes packets in the header that contain public key encrypted file keys. It calls the messaging code in the previous patch to send and receive encryption and decryption request packets from the userspace daemon. [akpm@osdl.org: cleab fix] Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] eCryptfs: Public key transport mechanism	Michael Halcrow	2007-02-12	3	-3/+858
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the transport code for public key functionality in eCryptfs. It manages encryption/decryption request queues with a transport mechanism. Currently, netlink is the only implemented transport. Each inode has a unique File Encryption Key (FEK). Under passphrase, a File Encryption Key Encryption Key (FEKEK) is generated from a salt/passphrase combo on mount. This FEKEK encrypts each FEK and writes it into the header of each file using the packet format specified in RFC 2440. This is all symmetric key encryption, so it can all be done via the kernel crypto API. These new patches introduce public key encryption of the FEK. There is no asymmetric key encryption support in the kernel crypto API, so eCryptfs pushes the FEK encryption and decryption out to a userspace daemon. After considering our requirements and determining the complexity of using various transport mechanisms, we settled on netlink for this communication. eCryptfs stores authentication tokens into the kernel keyring. These tokens correlate with individual keys. For passphrase mode of operation, the authentication token contains the symmetric FEKEK. For public key, the authentication token contains a PKI type and an opaque data blob managed by individual PKI modules in userspace. Each user who opens a file under an eCryptfs partition mounted in public key mode must be running a daemon. That daemon has the user's credentials and has access to all of the keys to which the user should have access. The daemon, when started, initializes the pluggable PKI modules available on the system and registers itself with the eCryptfs kernel module. Userspace utilities register public key authentication tokens into the user session keyring. These authentication tokens correlate key signatures with PKI modules and PKI blobs. The PKI blobs contain PKI-specific information necessary for the PKI module to carry out asymmetric key encryption and decryption. When the eCryptfs module parses the header of an existing file and finds a Tag 1 (Public Key) packet (see RFC 2440), it reads in the public key identifier (signature). The asymmetrically encrypted FEK is in the Tag 1 packet; eCryptfs puts together a decrypt request packet containing the signature and the encrypted FEK, then it passes it to the daemon registered for the current->euid via a netlink unicast to the PID of the daemon, which was registered at the time the daemon was started by the user. The daemon actually just makes calls to libecryptfs, which implements request packet parsing and manages PKI modules. libecryptfs grabs the public key authentication token for the given signature from the user session keyring. This auth tok tells libecryptfs which PKI module should receive the request. libecryptfs then makes a decrypt() call to the PKI module, and it passes along the PKI block from the auth tok. The PKI uses the blob to figure out how it should decrypt the data passed to it; it performs the decryption and passes the decrypted data back to libecryptfs. libecryptfs then puts together a reply packet with the decrypted FEK and passes that back to the eCryptfs module. The eCryptfs module manages these request callouts to userspace code via message context structs. The module maintains an array of message context structs and places the elements of the array on two lists: a free and an allocated list. When eCryptfs wants to make a request, it moves a msg ctx from the free list to the allocated list, sets its state to pending, and fires off the message to the user's registered daemon. When eCryptfs receives a netlink message (via the callback), it correlates the msg ctx struct in the alloc list with the data in the message itself. The msg->index contains the offset of the array of msg ctx structs. It verifies that the registered daemon PID is the same as the PID of the process that sent the message. It also validates a sequence number between the received packet and the msg ctx. Then, it copies the contents of the message (the reply packet) into the msg ctx struct, sets the state in the msg ctx to done, and wakes up the process that was sleeping while waiting for the reply. The sleeping process was whatever was performing the sys_open(). This process originally called ecryptfs_send_message(); it is now in ecryptfs_wait_for_response(). When it wakes up and sees that the msg ctx state was set to done, it returns a pointer to the message contents (the reply packet) and returns. If all went well, this packet contains the decrypted FEK, which is then copied into the crypt_stat struct, and life continues as normal. The case for creation of a new file is very similar, only instead of a decrypt request, eCryptfs sends out an encrypt request. > - We have a great clod of key mangement code in-kernel. Why is that > not suitable (or growable) for public key management? eCryptfs uses Howells' keyring to store persistent key data and PKI state information. It defers public key cryptographic transformations to userspace code. The userspace data manipulation request really is orthogonal to key management in and of itself. What eCryptfs basically needs is a secure way to communicate with a particular daemon for a particular task doing a syscall, based on the UID. Nothing running under another UID should be able to access that channel of communication. > - Is it appropriate that new infrastructure for public key > management be private to a particular fs? The messaging.c file contains a lot of code that, perhaps, could be extracted into a separate kernel service. In essence, this would be a sort of request/reply mechanism that would involve a userspace daemon. I am not aware of anything that does quite what eCryptfs does, so I was not aware of any existing tools to do just what we wanted. > What happens if one of these daemons exits without sending a quit > message? There is a stale uid<->pid association in the hash table for that user. When the user registers a new daemon, eCryptfs cleans up the old association and generates a new one. See ecryptfs_process_helo(). > - _why_ does it use netlink? Netlink provides the transport mechanism that would minimize the complexity of the implementation, given that we can have multiple daemons (one per user). I explored the possibility of using relayfs, but that would involve having to introduce control channels and a protocol for creating and tearing down channels for the daemons. We do not have to worry about any of that with netlink. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().	Robert P. J. Day	2007-02-11	5	-16/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>