op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	vfs: add helpers to get root and pwd	Miklos Szeredi	2010-08-11	1	-7/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Add three helpers that retrieve a refcounted copy of the root and cwd from the supplied fs_struct. get_fs_root() get_fs_pwd() get_fs_root_and_pwd() Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fsnotify: rename fsnotify_mark_entry to just fsnotify_mark	Eric Paris	2010-07-28	1	-2/+2
\| \| \| \| \| \| \|	The name is long and it serves no real purpose. So rename fsnotify_mark_entry to just fsnotify_mark. Signed-off-by: Eric Paris <eparis@redhat.com>
*	inotify: remove inotify in kernel interface	Eric Paris	2010-07-28	1	-1/+0
\| \| \| \| \| \|	nothing uses inotify in the kernel, drop it! Signed-off-by: Eric Paris <eparis@redhat.com>
*	audit: reimplement audit_trees using fsnotify rather than inotify	Eric Paris	2010-07-28	1	-2/+2
\| \| \| \| \| \| \|	Simply switch audit_trees from using inotify to using fsnotify for it's inode pinning and disappearing act information. Signed-off-by: Eric Paris <eparis@redhat.com>
*	Audit: clean up the audit_watch split	Eric Paris	2010-07-28	1	-3/+2
\| \| \| \| \| \| \| \|	No real changes, just cleanup to the audit_watch split patch which we done with minimal code changes for easy review. Now fix interfaces to make things work better. Signed-off-by: Eric Paris <eparis@redhat.com>
*	audit: preface audit printk with audit	Eric Paris	2010-04-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There have been a number of reports of people seeing the message: "name_count maxed, losing inode data: dev=00:05, inode=3185" in dmesg. These usually lead to people reporting problems to the filesystem group who are in turn clueless what they mean. Eventually someone finds me and I explain what is going on and that these come from the audit system. The basics of the problem is that the audit subsystem never expects a single syscall to 'interact' (for some wish washy meaning of interact) with more than 20 inodes. But in fact some operations like loading kernel modules can cause changes to lots of inodes in debugfs. There are a couple real fixes being bandied about including removing the fixed compile time limit of 20 or not auditing changes in debugfs (or both) but neither are small and obvious so I am not sending them for immediate inclusion (I hope Al forwards a real solution next devel window). In the meantime this patch simply adds 'audit' to the beginning of the crap message so if a user sees it, they come blame me first and we can talk about what it means and make sure we understand all of the reasons it can happen and make sure this gets solved correctly in the long run. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵	Tejun Heo	2010-03-30	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
*	Lose the first argument of audit_inode_child()	Al Viro	2010-02-08	1	-5/+2
\| \| \| \| \| \|	it's always equal to ->d_name.name of the second argument Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Sanitize f_flags helpers	Al Viro	2009-12-22	1	-1/+0
\| \| \| \| \| \| \| \|	* pull ACC_MODE to fs.h; we have several copies all over the place * nightmarish expression calculating f_mode by f_flags deserves a helper too (OPEN_FMODE(flags)) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Audit: rearrange audit_context to save 16 bytes per struct	Eric Paris	2009-09-24	1	-3/+3
\| \| \| \| \| \| \| \| \|	pahole pointed out that on x86_64 struct audit_context can be rearrainged to save 16 bytes per struct. Since we have an audit_context per task this can acually be a pretty significant gain. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Fix rule eviction order for AUDIT_DIR	Al Viro	2009-06-24	1	-0/+15
\| \| \| \| \| \| \| \| \|	If syscall removes the root of subtree being watched, we definitely do not want the rules refering that subtree to be destroyed without the syscall in question having a chance to match them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Audit: clean up all op= output to include string quoting	Eric Paris	2009-06-24	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A number of places in the audit system we send an op= followed by a string that includes spaces. Somehow this works but it's just wrong. This patch moves all of those that I could find to be quoted. Example: Change From: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op=remove rule key="number2" list=4 res=0 Change To: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op="remove rule" key="number2" list=4 res=0 Signed-off-by: Eric Paris <eparis@redhat.com>
*	audit: seperate audit inode watches into a subfile	Eric Paris	2009-06-23	1	-3/+3
\| \| \| \| \| \| \| \|	In preparation for converting audit to use fsnotify instead of inotify we seperate the inode watching code into it's own file. This is similar to how the audit tree watching code is already seperated into audit_tree.c Signed-off-by: Eric Paris <eparis@redhat.com>
*	Audit: better estimation of execve record length	Eric Paris	2009-06-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	The audit execve record splitting code estimates the length of the message generated. But it forgot to include the "" that wrap each string in its estimation. This means that execve messages with lots of tiny (1-2 byte) arguments could still cause records greater than 8k to be emitted. Simply fix the estimate. Signed-off-by: Eric Paris <eparis@redhat.com>
*	Audit: remove spaces from audit_log_d_path	Eric Paris	2009-04-05	1	-1/+1
\| \| \| \| \| \| \| \| \|	audit_log_d_path had spaces in the strings which would be emitted on the error paths. This patch simply replaces those spaces with an _ or removes the needless spaces entirely. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	audit: audit_set_auditable defined but not used	Eric Paris	2009-04-05	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	after 0590b9335a1c72a3f0defcc6231287f7817e07c8 audit_set_auditable() is now only used by the audit tree code. If CONFIG_AUDIT_TREE is unset it will be defined but unused. This patch simply moves the function inside a CONFIG_AUDIT_TREE block. cc1: warnings being treated as errors /home/acme_unencrypted/git/linux-2.6-tip/kernel/auditsc.c:745: error: ‘audit_set_auditable’ defined but not used make[2]: * [kernel/auditsc.o] Error 1 make[1]: * [kernel] Error 2 make[1]: *** Waiting for unfinished jobs.... Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	audit: Fix possible return value truncation in audit_get_context()	Paul Moore	2009-04-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The audit subsystem treats syscall return codes as type long, unfortunately the audit_get_context() function mistakenly converts the return code to an int type in the parameters which could cause problems on systems where the sizeof(int) != sizeof(long). Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	auditsc: fix kernel-doc notation	Randy Dunlap	2009-04-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix auditsc kernel-doc notation: Warning(linux-2.6.28-git7//kernel/auditsc.c:2156): No description found for parameter 'attr' Warning(linux-2.6.28-git7//kernel/auditsc.c:2156): Excess function parameter 'u_attr' description in '__audit_mq_open' Warning(linux-2.6.28-git7//kernel/auditsc.c:2204): No description found for parameter 'notification' Warning(linux-2.6.28-git7//kernel/auditsc.c:2204): Excess function parameter 'u_notification' description in '__audit_mq_notify' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	audit: EXECVE record - removed bogus newline	Jiri Pirko	2009-04-05	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(updated) Added hunk that changes the comment, the rest is the same. EXECVE records contain a newline after every argument. auditd converts "\n" to " " so you cannot see newlines even in raw logs, but they're there nevertheless. If you're not using auditd, you need to work round them. These '\n' chars are can be easily replaced by spaces when creating record in kernel. Note there is no need for trailing '\n' in an audit record. record before this patch: "type=EXECVE msg=audit(1231421801.566:31): argc=4 a0=\"./test\"\na1=\"a\"\na2=\"b\"\na3=\"c\"\n" record after this patch: "type=EXECVE msg=audit(1231421801.566:31): argc=4 a0=\"./test\" a1=\"a\" a2=\"b\" a3=\"c\"" Signed-off-by: Jiri Pirko <jpirko@redhat.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Get rid of indirect include of fs_struct.h	Al Viro	2009-03-31	1	-0/+1
\| \| \| \| \| \| \| \|	Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	make sure that filterkey of task,always rules is reported	Al Viro	2009-01-04	1	-4/+11
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	fixing audit rule ordering mess, part 1	Al Viro	2009-01-04	1	-36/+43
\| \| \| \| \| \| \| \| \| \| \| \|	Problem: ordering between the rules on exit chain is currently lost; all watch and inode rules are listed after everything else _and_ exit,never on one kind doesn't stop exit,always on another from being matched. Solution: assign priorities to rules, keep track of the current highest-priority matching rule and its result (always/never). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize audit_log_capset()	Al Viro	2009-01-04	1	-28/+16
\| \| \| \| \| \| \| \|	* no allocations * return void * don't duplicate checked for dummy context Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize audit_fd_pair()	Al Viro	2009-01-04	1	-30/+14
\| \| \| \| \| \| \|	* no allocations * return void Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize audit_mq_open()	Al Viro	2009-01-04	1	-42/+23
\| \| \| \| \| \| \| \|	* don't bother with allocations * don't do double copy_from_user() * don't duplicate parts of check for audit_dummy_context() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize AUDIT_MQ_SENDRECV	Al Viro	2009-01-04	1	-98/+29
\| \| \| \| \| \| \| \| \| \| \| \| \|	* logging the original value of msg_prio in mq_timedreceive(2) is insane - the argument is write-only (i.e. syscall always ignores the original value and only overwrites it). merge __audit_mq_timed{send,receive} * don't do copy_from_user() twice * don't mess with allocations in auditsc part * ... and don't bother checking !audit_enabled and !context in there - we'd already checked for audit_dummy_context(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize audit_mq_notify()	Al Viro	2009-01-04	1	-40/+16
\| \| \| \| \| \| \| \| \|	* don't copy_from_user() twice * don't bother with allocations * don't duplicate parts of audit_dummy_context() * make it return void Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize audit_mq_getsetattr()	Al Viro	2009-01-04	1	-37/+17
\| \| \| \| \| \| \| \|	* get rid of allocations * make it return void * don't duplicate parts of audit_dummy_context() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize audit_ipc_set_perm()	Al Viro	2009-01-04	1	-33/+26
\| \| \| \| \| \| \| \|	* get rid of allocations * make it return void * simplify callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize audit_ipc_obj()	Al Viro	2009-01-04	1	-51/+37
\| \| \| \| \| \| \| \|	* get rid of allocations * make it return void * simplify callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	sanitize audit_socketcall	Al Viro	2009-01-04	1	-28/+38
\| \| \| \| \| \| \|	* don't bother with allocations * now that it can't fail, make it return void Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	don't reallocate buffer in every audit_sockaddr()	Al Viro	2009-01-04	1	-24/+22
\| \| \| \| \| \| \|	No need to do that more than once per process lifetime; allocating/freeing on each sendto/accept/etc. is bloody pointless. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Merge branch 'next' into for-linus	James Morris	2008-12-25	1	-29/+226
\|\
\| *	CRED: Inaugurate COW credentials	David Howells	2008-11-14	1	-21/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: () security_capset_check(), ->capset_check() () security_capset_set(), ->capset_set() Removed in favour of security_capset(). () security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. () security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. () security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). () security_cred_free(), ->cred_free() New. Free security data attached to cred->security. () security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. () security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). () security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). () security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). () security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. () security_key_alloc(), ->key_alloc() () security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>
\| *	CRED: Use RCU to access another task's creds and to release a task's own creds	David Howells	2008-11-14	1	-14/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use RCU to access another task's creds and to release a task's own creds. This means that it will be possible for the credentials of a task to be replaced without another task (a) requiring a full lock to read them, and (b) seeing deallocated memory. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
\| *	CRED: Separate task security context from task_struct	David Howells	2008-11-14	1	-25/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
\| *	CRED: Wrap task credential accesses in the core kernel	David Howells	2008-11-14	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-audit@redhat.com Cc: containers@lists.linux-foundation.org Cc: linux-mm@kvack.org Signed-off-by: James Morris <jmorris@namei.org>
\| *	When the capset syscall is used it is not possible for audit to record the	Eric Paris	2008-11-11	1	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	actual capbilities being added/removed. This patch adds a new record type which emits the target pid and the eff, inh, and perm cap sets. example output if you audit capset syscalls would be: type=SYSCALL msg=audit(1225743140.465:76): arch=c000003e syscall=126 success=yes exit=0 a0=17f2014 a1=17f201c a2=80000000 a3=7fff2ab7f060 items=0 ppid=2160 pid=2223 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="setcap" exe="/usr/sbin/setcap" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1322] msg=audit(1225743140.465:76): pid=0 cap_pi=ffffffffffffffff cap_pp=ffffffffffffffff cap_pe=ffffffffffffffff Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
\| *	Any time fcaps or a setuid app under SECURE_NOROOT is used to result in a	Eric Paris	2008-11-11	1	-0/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	non-zero pE we will crate a new audit record which contains the entire set of known information about the executable in question, fP, fI, fE, fversion and includes the process's pE, pI, pP. Before and after the bprm capability are applied. This record type will only be emitted from execve syscalls. an example of making ping use fcaps instead of setuid: setcap "cat_net_raw+pe" /bin/ping type=SYSCALL msg=audit(1225742021.015:236): arch=c000003e syscall=59 success=yes exit=0 a0=1457f30 a1=14606b0 a2=1463940 a3=321b770a70 items=2 ppid=2929 pid=2963 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1321] msg=audit(1225742021.015:236): fver=2 fp=0000000000002000 fi=0000000000000000 fe=1 old_pp=0000000000000000 old_pi=0000000000000000 old_pe=0000000000000000 new_pp=0000000000002000 new_pi=0000000000000000 new_pe=0000000000002000 type=EXECVE msg=audit(1225742021.015:236): argc=2 a0="ping" a1="127.0.0.1" type=CWD msg=audit(1225742021.015:236): cwd="/home/test" type=PATH msg=audit(1225742021.015:236): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fe=1 cap_fver=2 type=PATH msg=audit(1225742021.015:236): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0 Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
\| *	This patch will print cap_permitted and cap_inheritable data in the PATH	Eric Paris	2008-11-11	1	-5/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	records of any file that has file capabilities set. Files which do not have fcaps set will not have different PATH records. An example audit record if you run: setcap "cap_net_admin+pie" /bin/bash /bin/bash type=SYSCALL msg=audit(1225741937.363:230): arch=c000003e syscall=59 success=yes exit=0 a0=2119230 a1=210da30 a2=20ee290 a3=8 items=2 ppid=2149 pid=2923 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=EXECVE msg=audit(1225741937.363:230): argc=2 a0="ping" a1="www.google.com" type=CWD msg=audit(1225741937.363:230): cwd="/root" type=PATH msg=audit(1225741937.363:230): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0104755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fi=0000000000002000 cap_fe=1 cap_fver=2 type=PATH msg=audit(1225741937.363:230): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0 Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
* \|	[PATCH] fix broken timestamps in AVC generated by kernel threads	Al Viro	2008-12-09	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Timestamp in audit_context is valid only if ->in_syscall is set. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	[patch 1/1] audit: remove excess kernel-doc	Randy Dunlap	2008-12-09	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Delete excess kernel-doc notation in kernel/auditsc.c: Warning(linux-2.6.27-git10//kernel/auditsc.c:1481): Excess function parameter or struct member 'tsk' description in 'audit_syscall_entry' Warning(linux-2.6.27-git10//kernel/auditsc.c:1564): Excess function parameter or struct member 'tsk' description in 'audit_syscall_exit' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	[PATCH] return records for fork() both to child and parent	Al Viro	2008-12-09	1	-0/+17
\|/ \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	tty: Fix abusers of current->sighand->tty	Alan Cox	2008-10-13	1	-5/+4
\| \| \| \| \| \| \| \|	Various people outside the tty layer still stick their noses in behind the scenes. We need to make sure they also obey the locking and referencing rules. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] audit: Moved variable declaration to beginning of function	Cordelia	2008-09-01	1	-1/+2
\| \| \| \| \| \| \| \|	got rid of compilation warning: ISO C90 forbids mixed declarations and code Signed-off-by: Cordelia Sam <cordesam@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	Re: [PATCH] Fix the kernel panic of audit_filter_task when key field is set	zhangxiliang	2008-08-04	1	-0/+7
\| \| \| \| \| \| \| \| \|	Sorry, I miss a blank between if and "(". And I add "unlikely" to check "ctx" in audit_match_perm() and audit_match_filetype(). This is a new patch for it. Signed-off-by: Zhang Xiliang <zhangxiliang@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] Fix the kernel panic of audit_filter_task when key field is set	zhangxiliang	2008-08-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	When calling audit_filter_task(), it calls audit_filter_rules() with audit_context is NULL. If the key field is set, the result in audit_filter_rules() will be set to 1 and ctx->filterkey will be set to key. But the ctx is NULL in this condition, so kernel will panic. Signed-off-by: Zhang Xiliang <zhangxiliang@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	[PATCH] Audit: Collect signal info when SIGUSR2 is sent to auditd	Eric Paris	2008-08-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Makes the kernel audit subsystem collect information about the sending process when that process sends SIGUSR2 to the userspace audit daemon. SIGUSR2 is a new interesting signal to auditd telling auditd that it should try to start logging to disk again and the error condition which caused it to stop logging to disk (usually out of space) has been rectified. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	x86_64 syscall audit fast-path	Roland McGrath	2008-07-23	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	This adds a fast path for 64-bit syscall entry and exit when TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing. This path does not need to save and restore all registers as the general case of tracing does. Avoiding the iret return path when syscall audit is enabled helps performance a lot. Signed-off-by: Roland McGrath <roland@redhat.com>
*	[PATCH] new predicate - AUDIT_FILETYPE	Al Viro	2008-04-28	1	-0/+16
\| \| \| \| \| \| \| \| \| \|	Argument is S_IF... \| <index>, where index is normally 0 or 1. Triggers if chosen element of ctx->names[] is present and the mode of object in question matches the upper bits of argument. I.e. for things like "is the argument of that chmod a directory", etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>