op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	file capabilities: don't prevent signaling setuid root programs	Serge E. Hallyn	2007-11-29	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An unprivileged process must be able to kill a setuid root program started by the same user. This is legacy behavior needed for instance for xinit to kill X when the window manager exits. When an unprivileged user runs a setuid root program in !SECURE_NOROOT mode, fP, fI, and fE are set full on, so pP' and pE' are full on. Then cap_task_kill() prevents the user from signaling the setuid root task. This is a change in behavior compared to when !CONFIG_SECURITY_FILE_CAPABILITIES. This patch introduces a special check into cap_task_kill() just to check whether a non-root user is signaling a setuid root program started by the same user. If so, then signal is allowed. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Stephen Smalley <sds@epoch.ncsc.mil> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	file capabilities: allow sigcont within session	Serge E. Hallyn	2007-11-14	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix http://bugzilla.kernel.org/show_bug.cgi?id=9247 Allow sigcont to be sent to a process with greater capabilities if it is in the same session. Otherwise, a shell from which I've started a root shell and done 'suspend' can't be restarted by the parent shell. Also don't do file-capabilities signaling checks when uids for the processes don't match, since the standard check_kill_permission will have done those checks. [akpm@linux-foundation.org: coding-style cleanups] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Andrew Morgan <morgan@kernel.org> Cc: Chris Wright <chrisw@sous-sol.org> Tested-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Stephen Smalley <sds@epoch.ncsc.mil> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	SELinux: add more validity checks on policy load	Stephen Smalley	2007-11-08	7	-38/+118
\| \| \| \| \| \| \| \| \| \|	Add more validity checks at policy load time to reject malformed policies and prevent subsequent out-of-range indexing when in permissive mode. Resolves the NULL pointer dereference reported in https://bugzilla.redhat.com/show_bug.cgi?id=357541. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: fix bug in new ebitmap code.	KaiGai Kohei	2007-11-08	1	-1/+1
\| \| \| \| \| \| \| \| \|	The "e_iter = e_iter->next;" statement in the inner for loop is primally bug. It should be moved to outside of the for loop. Signed-off-by: KaiGai Kohei <kaigai@kaigai.gr.jp> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: suppress a warning for 64k pages.	Stephen Rothwell	2007-11-08	1	-6/+7
\| \| \| \| \| \| \| \| \|	On PowerPC allmodconfig build we get this: security/selinux/xfrm.c:214: warning: comparison is always false due to limited range of data type Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: James Morris <jmorris@namei.org>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2007-10-23	1	-5/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: SELinux: always check SIGCHLD in selinux_task_wait
\| *	SELinux: always check SIGCHLD in selinux_task_wait	Eric Paris	2007-10-23	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When checking if we can wait on a child we were looking at p->exit_signal and trying to make the decision based on if the signal would eventually be allowed. One big flaw is that p->exit_signal is -1 for NPTL threads and so aignal_to_av was not actually checking SIGCHLD which is what would have been sent. Even is exit_signal was set to something strange it wouldn't change the fact that the child was there and needed to be waited on. This patch just assumes wait is based on SIGCHLD. Specific permission checks are made when the child actually attempts to send a signal. This resolves the problem of things like using GDB on confined domains such as in RH BZ 232371. The confined domain did not have permission to send a generic signal (exit_signal == -1) back to the unconfined GDB. With this patch the GDB wait works and since the actual signal sent is allowed everything functions as it should. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* \|	capabilities: clean up file capability reading	Serge E. Hallyn	2007-10-22	1	-8/+15
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify the vfs_cap_data structure. Also fix get_file_caps which was declaring __le32 v1caps[XATTR_CAPS_SZ] on the stack, but XATTR_CAPS_SZ is already * sizeof(__le32). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	pid namespaces: define is_global_init() and is_container_init()	Serge E. Hallyn	2007-10-19	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	is_init() is an ambiguous name for the pid==1 check. Split it into is_global_init() and is_container_init(). A cgroup init has it's tsk->pid == 1. A global init also has it's tsk->pid == 1 and it's active pid namespace is the init_pid_ns. But rather than check the active pid namespace, compare the task structure with 'init_pid_ns.child_reaper', which is initialized during boot to the /sbin/init process and never changes. Changelog: 2.6.22-rc4-mm2-pidns1: - Use 'init_pid_ns.child_reaper' to determine if a given task is the global init (/sbin/init) process. This would improve performance and remove dependence on the task_pid(). 2.6.21-mm2-pidns2: - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc, ppc,avr32}/traps.c for the _exception() call to is_global_init(). This way, we kill only the cgroup if the cgroup's init has a bug rather than force a kernel panic. [akpm@linux-foundation.org: fix comment] [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c] [bunk@stusta.de: kernel/pid.c: remove unused exports] [sukadev@us.ibm.com: Fix capability.c to work with threaded init] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	sparse pointer use of zero as null	Stephen Hemminger	2007-10-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Get rid of sparse related warnings from places that use integer as NULL pointer. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Jeff Garzik <jeff@garzik.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Ian Kent <raven@themaw.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	V3 file capabilities: alter behavior of cap_setpcap	Andrew Morgan	2007-10-18	2	-14/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1, can change the capabilities of another process, p2. This is not the meaning that was intended for this capability at all, and this implementation came about purely because, without filesystem capabilities, there was no way to use capabilities without one process bestowing them on another. Since we now have a filesystem support for capabilities we can fix the implementation of CAP_SETPCAP. The most significant thing about this change is that, with it in effect, no process can set the capabilities of another process. The capabilities of a program are set via the capability convolution rules: pI(post-exec) = pI(pre-exec) pP(post-exec) = (X(aka cap_bset) & fP) \| (pI(post-exec) & fI) pE(post-exec) = fE ? pP(post-exec) : 0 at exec() time. As such, the only influence the pre-exec() program can have on the post-exec() program's capabilities are through the pI capability set. The correct implementation for CAP_SETPCAP (and that enabled by this patch) is that it can be used to add extra pI capabilities to the current process - to be picked up by subsequent exec()s when the above convolution rules are applied. Here is how it works: Let's say we have a process, p. It has capability sets, pE, pP and pI. Generally, p, can change the value of its own pI to pI' where (pI' & ~pI) & ~pP = 0. That is, the only new things in pI' that were not present in pI need to be present in pP. The role of CAP_SETPCAP is basically to permit changes to pI beyond the above: if (pE & CAP_SETPCAP) { pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0 */ } This capability is useful for things like login, which (say, via pam_cap) might want to raise certain inheritable capabilities for use by the children of the logged-in user's shell, but those capabilities are not useful to or needed by the login program itself. One such use might be to limit who can run ping. You set the capabilities of the 'ping' program to be "= cap_net_raw+i", and then only shells that have (pI & CAP_NET_RAW) will be able to run it. Without CAP_SETPCAP implemented as described above, login(pam_cap) would have to also have (pP & CAP_NET_RAW) in order to raise this capability and pass it on through the inheritable set. Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	security/ cleanups	Adrian Bunk	2007-10-17	5	-118/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch contains the following cleanups that are now possible: - remove the unused security_operations->inode_xattr_getsuffix - remove the no longer used security_operations->unregister_security - remove some no longer required exit code - remove a bunch of no longer used exports Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Implement file posix capabilities	Serge E. Hallyn	2007-10-17	6	-43/+313
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement file posix capabilities. This allows programs to be given a subset of root's powers regardless of who runs them, without having to use setuid and giving the binary all of root's powers. This version works with Kaigai Kohei's userspace tools, found at http://www.kaigai.gr.jp/index.php. For more information on how to use this patch, Chris Friedhoff has posted a nice page at http://www.friedhoff.org/fscaps.html. Changelog: Nov 27: Incorporate fixes from Andrew Morton (security-introduce-file-caps-tweaks and security-introduce-file-caps-warning-fix) Fix Kconfig dependency. Fix change signaling behavior when file caps are not compiled in. Nov 13: Integrate comments from Alexey: Remove CONFIG_ ifdef from capability.h, and use %zd for printing a size_t. Nov 13: Fix endianness warnings by sparse as suggested by Alexey Dobriyan. Nov 09: Address warnings of unused variables at cap_bprm_set_security when file capabilities are disabled, and simultaneously clean up the code a little, by pulling the new code into a helper function. Nov 08: For pointers to required userspace tools and how to use them, see http://www.friedhoff.org/fscaps.html. Nov 07: Fix the calculation of the highest bit checked in check_cap_sanity(). Nov 07: Allow file caps to be enabled without CONFIG_SECURITY, since capabilities are the default. Hook cap_task_setscheduler when !CONFIG_SECURITY. Move capable(TASK_KILL) to end of cap_task_kill to reduce audit messages. Nov 05: Add secondary calls in selinux/hooks.c to task_setioprio and task_setscheduler so that selinux and capabilities with file cap support can be stacked. Sep 05: As Seth Arnold points out, uid checks are out of place for capability code. Sep 01: Define task_setscheduler, task_setioprio, cap_task_kill, and task_setnice to make sure a user cannot affect a process in which they called a program with some fscaps. One remaining question is the note under task_setscheduler: are we ok with CAP_SYS_NICE being sufficient to confine a process to a cpuset? It is a semantic change, as without fsccaps, attach_task doesn't allow CAP_SYS_NICE to override the uid equivalence check. But since it uses security_task_setscheduler, which elsewhere is used where CAP_SYS_NICE can be used to override the uid equivalence check, fixing it might be tough. task_setscheduler note: this also controls cpuset:attach_task. Are we ok with CAP_SYS_NICE being used to confine to a cpuset? task_setioprio task_setnice sys_setpriority uses this (through set_one_prio) for another process. Need same checks as setrlimit Aug 21: Updated secureexec implementation to reflect the fact that euid and uid might be the same and nonzero, but the process might still have elevated caps. Aug 15: Handle endianness of xattrs. Enforce capability version match between kernel and disk. Enforce that no bits beyond the known max capability are set, else return -EPERM. With this extra processing, it may be worth reconsidering doing all the work at bprm_set_security rather than d_instantiate. Aug 10: Always call getxattr at bprm_set_security, rather than caching it at d_instantiate. [morgan@kernel.org: file-caps clean up for linux/capability.h] [bunk@kernel.org: unexport cap_inode_killpriv] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morgan <morgan@kernel.org> Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	security: Convert LSM into a static interface	James Morris	2007-10-17	8	-71/+961
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Convert LSM into a static interface, as the ability to unload a security module is not required by in-tree users and potentially complicates the overall security architecture. Needlessly exported LSM symbols have been unexported, to help reduce API abuse. Parameters for the capability and root_plug modules are now specified at boot. The SECURITY_FRAMEWORK_VERSION macro has also been removed. In a nutshell, there is no safe way to unload an LSM. The modular interface is thus unecessary and broken infrastructure. It is used only by out-of-tree modules, which are often binary-only, illegal, abusive of the API and dangerous, e.g. silently re-vectoring SELinux. [akpm@linux-foundation.org: cleanups] [akpm@linux-foundation.org: USB Kconfig fix] [randy.dunlap@oracle.com: fix LSM kernel-doc] Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Acked-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	KEYS: Make request_key() and co fundamentally asynchronous	David Howells	2007-10-17	5	-317/+335
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make request_key() and co fundamentally asynchronous to make it easier for NFS to make use of them. There are now accessor functions that do asynchronous constructions, a wait function to wait for construction to complete, and a completion function for the key type to indicate completion of construction. Note that the construction queue is now gone. Instead, keys under construction are linked in to the appropriate keyring in advance, and that anyone encountering one must wait for it to be complete before they can use it. This is done automatically for userspace. The following auxiliary changes are also made: (1) Key type implementation stuff is split from linux/key.h into linux/key-type.h. (2) AF_RXRPC provides a way to allocate null rxrpc-type keys so that AFS does not need to call key_instantiate_and_link() directly. (3) Adjust the debugging macros so that they're -Wformat checked even if they are disabled, and make it so they can be enabled simply by defining __KDEBUG to be consistent with other code of mine. (3) Documentation. [alan@lxorguk.ukuu.org.uk: keys: missing word in documentation] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	SELinux: kills warnings in Improve SELinux performance when AVC misses	KaiGai Kohei	2007-10-17	2	-6/+7
\| \| \| \| \| \| \| \|	This patch kills ugly warnings when the "Improve SELinux performance when ACV misses" patch. Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: improve performance when AVC misses.	KaiGai Kohei	2007-10-17	4	-237/+303
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* We add ebitmap_for_each_positive_bit() which enables to walk on any positive bit on the given ebitmap, to improve its performance using common bit-operations defined in linux/bitops.h. In the previous version, this logic was implemented using a combination of ebitmap_for_each_bit() and ebitmap_node_get_bit(), but is was worse in performance aspect. This logic is most frequestly used to compute a new AVC entry, so this patch can improve SELinux performance when AVC misses are happen. * struct ebitmap_node is redefined as an array of "unsigned long", to get suitable for using find_next_bit() which is fasted than iteration of shift and logical operation, and to maximize memory usage allocated from general purpose slab. * Any ebitmap_for_each_bit() are repleced by the new implementation in ss/service.c and ss/mls.c. Some of related implementation are changed, however, there is no incompatibility with the previous version. * The width of any new line are less or equal than 80-chars. The following benchmark shows the effect of this patch, when we access many files which have different security context one after another. The number is more than /selinux/avc/cache_threshold, so any access always causes AVC misses. selinux-2.6 selinux-2.6-ebitmap AVG: 22.763 [s] 8.750 [s] STD: 0.265 0.019 ------------------------------------------ 1st: 22.558 [s] 8.786 [s] 2nd: 22.458 [s] 8.750 [s] 3rd: 22.478 [s] 8.754 [s] 4th: 22.724 [s] 8.745 [s] 5th: 22.918 [s] 8.748 [s] 6th: 22.905 [s] 8.764 [s] 7th: 23.238 [s] 8.726 [s] 8th: 22.822 [s] 8.729 [s] Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: policy selectable handling of unknown classes and perms	Eric Paris	2007-10-17	5	-9/+106
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow policy to select, in much the same way as it selects MLS support, how the kernel should handle access decisions which contain either unknown classes or unknown permissions in known classes. The three choices for the policy flags are 0 - Deny unknown security access. (default) 2 - reject loading policy if it does not contain all definitions 4 - allow unknown security access The policy's choice is exported through 2 booleans in selinuxfs. /selinux/deny_unknown and /selinux/reject_unknown. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: Improve read/write performance	Yuichi Nakamura	2007-10-17	5	-1/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It reduces the selinux overhead on read/write by only revalidating permissions in selinux_file_permission if the task or inode labels have changed or the policy has changed since the open-time check. A new LSM hook, security_dentry_open, is added to capture the necessary state at open time to allow this optimization. (see http://marc.info/?l=selinux&m=118972995207740&w=2) Signed-off-by: Yuichi Nakamura<ynakam@hitachisoft.jp> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: tune avtab to reduce memory usage	Yuichi Nakamura	2007-10-17	4	-36/+82
\| \| \| \| \| \| \| \| \| \| \| \|	This patch reduces memory usage of SELinux by tuning avtab. Number of hash slots in avtab was 32768. Unused slots used memory when number of rules is fewer. This patch decides number of hash slots dynamically based on number of rules. (chain length)^2 is also printed out in avtab_hash_eval to see standard deviation of avtab hash table. Signed-off-by: Yuichi Nakamura<ynakam@hitachisoft.jp> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	[SELINUX]: Update for netfilter ->hook() arg changes.	David S. Miller	2007-10-15	1	-6/+5
\| \| \| \| \| \|	They take a "struct sk_buff " instead of a "struct sk_buff *" now. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[INET]: local port range robustness	Stephen Hemminger	2007-10-10	1	-17/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Expansion of original idea from Denis V. Lunev <den@openvz.org> Add robustness and locking to the local_port_range sysctl. 1. Enforce that low < high when setting. 2. Use seqlock to ensure atomic update. The locking might seem like overkill, but there are cases where sysadmin might want to change value in the middle of a DoS attack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET]: Support multiple network namespaces with netlink	Eric W. Biederman	2007-10-10	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET]: Make device event notification network namespace safe	Eric W. Biederman	2007-10-10	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	SELinux: fix array out of bounds when mounting with selinux options	Eric Paris	2007-09-20	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Given an illegal selinux option it was possible for match_token to work in random memory at the end of the match_table_t array. Note that privilege is required to perform a context mount, so this issue is effectively limited to root only. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: clear parent death signal on SID transitions	Stephen Smalley	2007-08-30	1	-0/+3
\| \| \| \| \| \| \| \| \|	Clear parent death signal on SID transitions to prevent unauthorized signaling between SIDs. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Eric Paris <eparis@parisplace.org> Signed-off-by: James Morris <jmorris@localhost.localdomain>
*	fix NULL pointer dereference in __vm_enough_memory()	Alan Cox	2007-08-22	3	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new exec code inserts an accounted vma into an mm struct which is not current->mm. The existing memory check code has a hard coded assumption that this does not happen as does the security code. As the correct mm is known we pass the mm to the security method and the helper function. A new security test is added for the case where we need to pass the mm and the existing one is modified to pass current->mm to avoid the need to change large amounts of code. (Thanks to Tobias for fixing rejects and testing) Signed-off-by: Alan Cox <alan@redhat.com> Cc: WU Fengguang <wfg@mail.ustc.edu.cn> Cc: James Morris <jmorris@redhat.com> Cc: Tobias Diedrich <ranma+kernel@tdiedrich.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	SELinux: correct error code in selinux_audit_rule_init	Steve G	2007-08-16	1	-1/+1
\| \| \| \| \| \| \|	Corrects an error code so that it is valid to pass to userspace. Signed-off-by: Steve Grubb <linux_4ever@yahoo.com> Signed-off-by: James Morris <jmorris@halo.namei>
*	SELinux: remove redundant pointer checks before calling kfree()	Paul Moore	2007-08-02	1	-2/+1
\| \| \| \| \| \| \| \|	We don't need to check for NULL pointers before calling kfree(). Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: restore proper NetLabel caching behavior	Paul Moore	2007-08-02	1	-4/+12
\| \| \| \| \| \| \| \| \| \|	A small fix to the SELinux/NetLabel glue code to ensure that the NetLabel cache is utilized when possible. This was broken when the SELinux/NetLabel glue code was reorganized in the last kernel release. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	Typo fixes errror -> error	Gabriel Craciunescu	2007-07-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Typo fixes errror -> error Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	SELinux: null-terminate context string in selinux_xfrm_sec_ctx_alloc	Venkat Yekkirala	2007-07-25	1	-1/+2
\| \| \| \| \| \| \| \| \|	xfrm_audit_log() expects the context string to be null-terminated which currently doesn't happen with user-supplied contexts. Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: fix memory leak in security_netlbl_cache_add()	Jesper Juhl	2007-07-23	1	-1/+3
\| \| \| \| \| \| \| \| \|	Fix memory leak in security_netlbl_cache_add() Note: The Coverity checker gets credit for spotting this one. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
*	[PATCH] get rid of AVC_PATH postponed treatment	Al Viro	2007-07-22	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Selinux folks had been complaining about the lack of AVC_PATH records when audit is disabled. I must admit my stupidity - I assumed that avc_audit() really couldn't use audit_log_d_path() because of deadlocks (== could be called with dcache_lock or vfsmount_lock held). Shouldn't have made that assumption - it never gets called that way. It _is_ called under spinlocks, but not those. Since audit_log_d_path() uses ab->gfp_mask for allocations, kmalloc() in there is not a problem. IOW, the simple fix is sufficient: let's rip AUDIT_AVC_PATH out and simply generate pathname as part of main record. It's trivial to do. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: James Morris <jmorris@namei.org>
*	mm: Remove slab destructors from kmem_cache_create().	Paul Mundt	2007-07-20	4	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2007-07-19	2	-31/+39
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6: SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel SELinux: enable dynamic activation/deactivation of NetLabel/SELinux enforcement
\| *	SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel	Paul Moore	2007-07-19	2	-31/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These changes will make NetLabel behave like labeled IPsec where there is an access check for both labeled and unlabeled packets as well as providing the ability to restrict domains to receiving only labeled packets when NetLabel is in use. The changes to the policy are straight forward with the following necessary to receive labeled traffic (with SECINITSID_NETMSG defined as "netlabel_peer_t"): allow mydom_t netlabel_peer_t:{ tcp_socket udp_socket rawip_socket } recvfrom; The policy for unlabeled traffic would be: allow mydom_t unlabeled_t:{ tcp_socket udp_socket rawip_socket } recvfrom; These policy changes, as well as more general NetLabel support, are included in the latest SELinux Reference Policy release 20070629 or later. Users who make use of NetLabel are strongly encouraged to upgrade their policy to avoid network problems. Users who do not make use of NetLabel will not notice any difference. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
\| *	SELinux: enable dynamic activation/deactivation of NetLabel/SELinux enforcement	Paul Moore	2007-07-19	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Create a new NetLabel KAPI interface, netlbl_enabled(), which reports on the current runtime status of NetLabel based on the existing configuration. LSMs that make use of NetLabel, i.e. SELinux, can use this new function to determine if they should perform NetLabel access checks. This patch changes the NetLabel/SELinux glue code such that SELinux only enforces NetLabel related access checks when netlbl_enabled() returns true. At present NetLabel is considered to be enabled when there is at least one labeled protocol configuration present. The result is that by default NetLabel is considered to be disabled, however, as soon as an administrator configured a CIPSO DOI definition NetLabel is enabled and SELinux starts enforcing NetLabel related access controls - including unlabeled packet controls. This patch also tries to consolidate the multiple "#ifdef CONFIG_NETLABEL" blocks into a single block to ease future review as recommended by Linus. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
* \|	coredump masking: reimplementation of dumpable using two flags	Kawai, Hidehiro	2007-07-19	2	-2/+2
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes mm_struct.dumpable to a pair of bit flags. set_dumpable() converts three-value dumpable to two flags and stores it into lower two bits of mm_struct.flags instead of mm_struct.dumpable. get_dumpable() behaves in the opposite way. [akpm@linux-foundation.org: export set_dumpable] Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	usermodehelper: Tidy up waiting	Jeremy Fitzhardinge	2007-07-18	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than using a tri-state integer for the wait flag in call_usermodehelper_exec, define a proper enum, and use that. I've preserved the integer values so that any callers I've missed should still work OK. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Howells <dhowells@redhat.com>
*	Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check	Satyam Sharma	2007-07-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce is_owner_or_cap() macro in fs.h, and convert over relevant users to it. This is done because we want to avoid bugs in the future where we check for only effective fsuid of the current task against a file's owning uid, without simultaneously checking for CAP_FOWNER as well, thus violating its semantics. [ XFS uses special macros and structures, and in general looked ... untouchable, so we leave it alone -- but it has been looked over. ] The (current->fsuid != inode->i_uid) check in generic_permission() and exec_permission_lite() is left alone, because those operations are covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: Al Viro <viro@ftp.linux.org.uk> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Audit: add TTY input auditing	Miloslav Trmac	2007-07-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add TTY input auditing, used to audit system administrator's actions. This is required by various security standards such as DCID 6/3 and PCI to provide non-repudiation of administrator's actions and to allow a review of past actions if the administrator seems to overstep their duties or if the system becomes misconfigured for unknown reasons. These requirements do not make it necessary to audit TTY output as well. Compared to an user-space keylogger, this approach records TTY input using the audit subsystem, correlated with other audit events, and it is completely transparent to the user-space application (e.g. the console ioctls still work). TTY input auditing works on a higher level than auditing all system calls within the session, which would produce an overwhelming amount of mostly useless audit events. Add an "audit_tty" attribute, inherited across fork (). Data read from TTYs by process with the attribute is sent to the audit subsystem by the kernel. The audit netlink interface is extended to allow modifying the audit_tty attribute, and to allow sending explanatory audit events from user-space (for example, a shell might send an event containing the final command, after the interactive command-line editing and history expansion is performed, which might be difficult to decipher from the TTY input alone). Because the "audit_tty" attribute is inherited across fork (), it would be set e.g. for sshd restarted within an audited session. To prevent this, the audit_tty attribute is cleared when a process with no open TTY file descriptors (e.g. after daemon startup) opens a TTY. See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a more detailed rationale document for an older version of this patch. [akpm@linux-foundation.org: build fix] Signed-off-by: Miloslav Trmac <mitr@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Paul Fulghum <paulkf@microgate.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Revert "SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for ↵	Linus Torvalds	2007-07-13	2	-24/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	NetLabel" This reverts commit 9faf65fb6ee2b4e08325ba2d69e5ccf0c46453d0. It bit people like Michal Piotrowski: "My system is too secure, I can not login :)" because it changed how CONFIG_NETLABEL worked, and broke older SElinux policies. As a result, quoth James Morris: "Can you please revert this patch? We thought it only affected people running MLS, but it will affect others. Sorry for the hassle." Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Cc: Paul Moore <paul.moore@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	security: unexport mmap_min_addr	Adrian Bunk	2007-07-11	1	-1/+0
\| \| \| \| \| \| \|	Remove unneeded export. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel	Paul Moore	2007-07-11	2	-31/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These changes will make NetLabel behave like labeled IPsec where there is an access check for both labeled and unlabeled packets as well as providing the ability to restrict domains to receiving only labeled packets when NetLabel is in use. The changes to the policy are straight forward with the following necessary to receive labeled traffic (with SECINITSID_NETMSG defined as "netlabel_peer_t"): allow mydom_t netlabel_peer_t:{ tcp_socket udp_socket rawip_socket } recvfrom; The policy for unlabeled traffic would be: allow mydom_t unlabeled_t:{ tcp_socket udp_socket rawip_socket } recvfrom; These policy changes, as well as more general NetLabel support, are included in the SELinux Reference Policy SVN tree, r2352 or later. Users who enable NetLabel support in the kernel are strongly encouraged to upgrade their policy to avoid network problems. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
*	security: Protection for exploiting null dereference using mmap	Eric Paris	2007-07-11	7	-5/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new security check on mmap operations to see if the user is attempting to mmap to low area of the address space. The amount of space protected is indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to 0, preserving existing behavior. This patch uses a new SELinux security class "memprotect." Policy already contains a number of allow rules like a_t self:process * (unconfined_t being one of them) which mean that putting this check in the process class (its best current fit) would make it useless as all user processes, which we also want to protect against, would be allowed. By taking the memprotect name of the new class it will also make it possible for us to move some of the other memory protect permissions out of 'process' and into the new class next time we bump the policy version number (which I also think is a good future idea) Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: Use %lu for inode->i_no when printing avc	Tobias Oed	2007-07-11	1	-1/+1
\| \| \| \| \| \| \|	Inode numbers are unsigned long and so need to %lu as format string of printf. Signed-off-by: Tobias Oed <tobias.oed@octant-fr.com> Signed-off-by: James Morris <jmorris@namei.org>
*	SELinux: allow preemption between transition permission checks	Stephen Smalley	2007-07-11	4	-29/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In security_get_user_sids, move the transition permission checks outside of the section holding the policy rdlock, and use the AVC to perform the checks, calling cond_resched after each one. These changes should allow preemption between the individual checks and enable caching of the results. It may however increase the overall time spent in the function in some cases, particularly in the cache miss case. The long term fix will be to take much of this logic to userspace by exporting additional state via selinuxfs, and ultimately deprecating and eliminating this interface from the kernel. Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
*	selinux: introduce schedule points in policydb_destroy()	Eric Paris	2007-07-11	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During the LSPP testing we found that it was possible for policydb_destroy() to take 10+ seconds of kernel time to complete. Basically all policydb_destroy() does is walk some (possibly long) lists and free the memory it finds. Turning off slab debugging config options made the problem go away since the actual functions which took most of the time were (as seen by oprofile) > 121202 23.9879 .check_poison_obj > 78247 15.4864 .check_slabp were caused by that. So I decided to also add some voluntary schedule points in that code so config voluntary preempt would be enough to solve the problem. Something similar was done in places like shmem_free_pages() when we have to walk a list of memory and free it. This was tested by the LSPP group on the hardware which could reproduce the problem just loading a new policy and was found to not trigger the softlock detector. It takes just as much processing time, but the kernel doesn't spend all that time stuck doing one thing and never scheduling. Someday a better way to handle memory might make the time needed in this function a lot less, but this fixes the current issue as it stands today. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
*	selinux: add selinuxfs structure for object class discovery	Christopher J. PeBenito	2007-07-11	2	-0/+250
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The structure is as follows (relative to selinuxfs root): /class/file/index /class/file/perms/read /class/file/perms/write ... Each class is allocated 33 inodes, 1 for the class index and 32 for permissions. Relative to SEL_CLASS_INO_OFFSET, the inode of the index file DIV 33 is the class number. The inode of the permission file % 33 is the index of the permission for that class. Signed-off-by: Christopher J. PeBenito <cpebenito@tresys.com> Signed-off-by: James Morris <jmorris@namei.org>