path: root/sys/kern/kern_jail.c
Commit message, Author, Age, Files, Lines
* (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages.
  Note: this does not affect generated binaries as this argument is not used.
  PR: 137213
  Submitted by: Eygene Ryabinkin (initial version)
  MFC after: 1 month
  [Author: antoine, Age: 2009-12-28, Files: 1, Lines: -1/+1]
* Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error.
  Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack.
  We cannot change the classic jailed() call to do that, as it is used outside the network stack as well.
  Discussed with: julian, zec, jamie, rwatson (back in Sept)
  MFC after: 5 days
  [Author: bz, Age: 2009-12-13, Files: 1, Lines: -1/+24]
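  The distinction drawn in the entry above can be sketched in a few lines of C. This is only an illustration of the described behaviour, not the kernel code: struct sketch_cred and both sketch_* functions are hypothetical stand-ins, and the real functions operate on struct ucred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a credential; this is NOT struct ucred. */
struct sketch_cred {
    bool in_jail;   /* the credential belongs to a jail */
    bool own_vnet;  /* that jail has its own virtual network stack */
};

/* Classic check: true for every jailed credential, vnet or not. */
bool sketch_jailed(const struct sketch_cred *cred)
{
    assert(cred != NULL);
    return cred->in_jail;
}

/* Network-stack check: a jail with its own vnet is not restricted. */
bool sketch_jailed_without_vnet(const struct sketch_cred *cred)
{
    assert(cred != NULL);
    return cred->in_jail && !cred->own_vnet;
}
```

  Under these assumptions, a network-stack caller would switch to the second predicate, while callers outside the network stack keep the first.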
* Revert revision 199201 for now as it has introduced a kernel vulnerability and requires more polishing.
  [Author: delphij, Age: 2009-11-12, Files: 1, Lines: -1/+0]
* Add interface description capability as inspired by OpenBSD.
  MFC after: 3 months
  [Author: delphij, Age: 2009-11-11, Files: 1, Lines: -0/+1]
* Revert previous commit and add myself to the list of people who should know better than to commit with a cat in the area.
  [Author: phk, Age: 2009-09-08, Files: 1, Lines: -1/+0]
* Add necessary include.
  [Author: phk, Age: 2009-09-08, Files: 1, Lines: -0/+1]
* Allow a jail's name to be the same as its jid (which is the default if no name is specified), but still disallow other numeric names.
  Reviewed by: zec
  Approved by: bz (mentor)
  MFC after: 3 days
  [Author: jamie, Age: 2009-09-04, Files: 1, Lines: -9/+20]
* Fix a LOR between allprison_lock and vnode locks by releasing allprison_lock before releasing a prison's root vnode.
  PR: kern/138004
  Reviewed by: kib
  Approved by: bz (mentor)
  MFC after: 3 days
  [Author: jamie, Age: 2009-08-27, Files: 1, Lines: -2/+2]
* When a "jail -c vnet" request fails, the current code actually creates and leaves behind an orphaned vnet. This change ensures that such vnets get released.
  This change affects only options VIMAGE builds.
  Submitted by: jamie
  Discussed with: bz
  Approved by: re (rwatson), julian (mentor)
  MFC after: 3 days
  [Author: zec, Age: 2009-08-24, Files: 1, Lines: -1/+1]
* Make it possible to change the vnet sysctl variables on jails with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison.
  Reviewed by: rwatson, zec
  Approved by: re (kib)
  [Author: bz, Age: 2009-08-13, Files: 1, Lines: -0/+23]
* Make the kernel compile without IP networking by moving a variable under a proper #ifdef.
  Approved by: re (rwatson)
  [Author: bz, Age: 2009-08-12, Files: 1, Lines: -1/+2]
* Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h; we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes.
  Reviewed by: bz
  Approved by: re (vimage blanket)
  [Author: rwatson, Age: 2009-08-01, Files: 1, Lines: -1/+4]
* Make the "enforce_statfs" default 2 (most restrictive) in jail_set(2), instead of whatever the parent/system has (which is generally 0). This mirrors the old-style default used for jail(2) in conjunction with the security.jail.enforce_statfs sysctl.
  Approved by: re (kib), bz (mentor)
  [Author: jamie, Age: 2009-07-31, Files: 1, Lines: -3/+4]
* Remove a LOR, where the sleepable allprison_lock was being obtained in prison_equal_ip4/6 while an inp mutex was held. Locking allprison_lock can be avoided by making a restriction on the IP addresses associated with jails:
  Don't allow the "ip4" and "ip6" parameters to be changed after a jail is created. Setting the "ip4.addr" and "ip6.addr" parameters is allowed, but only if the jail was already created with either ip4/6=new or ip4/6=disable. With this restriction, the prison flags in question (PR_IP4_USER and PR_IP6_USER) become read-only and can be checked without locking.
  This also allows the simplification of a messy code path that was needed to handle an existing prison gaining an IP address list.
  PR: kern/136899
  Reported by: Dirk Meyer
  Approved by: re (kib), bz (mentor)
  [Author: jamie, Age: 2009-07-30, Files: 1, Lines: -309/+187]
* Don't allow mixing the "vnet" and "ip4/6" jail parameters, since vnet jails have their own IP stack and don't have access to the parent IP addresses anyway. Note that a virtual network stack forms a break between prisons with regard to the list of allowed IP addresses.
  Approved by: re (kib), bz (mentor)
  [Author: jamie, Age: 2009-07-29, Files: 1, Lines: -11/+98]
* Change the default value of the "ip4" and "ip6" jail parameters to "disable", which only allows access to the parent/physical system's IP addresses when specifically directed. Change the default value of "host" to "new", and don't copy the parent host values, to insulate jails from the parent hostname et al.
  Approved by: re (kib), bz (mentor)
  [Author: jamie, Age: 2009-07-29, Files: 1, Lines: -27/+7]
* Some jail parameters (in particular, "ip4" and "ip6" for IP address restrictions) were found to be inadequately described by a boolean. Define a new parameter type with three values (disable, new, inherit) to handle these and future cases.
  Approved by: re (kib), bz (mentor)
  Discussed with: rwatson
  [Author: jamie, Age: 2009-07-25, Files: 1, Lines: -35/+85]
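  A three-valued parameter of the kind described above might be modelled as an enum plus a small parser. The sketch below is illustrative only: the enum, its constant names, and sketch_parse_jailsys() are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the three legal values come from the commit message. */
enum sketch_jailsys {
    SKETCH_SYS_DISABLE,  /* the jail gets no access to this subsystem */
    SKETCH_SYS_NEW,      /* the jail gets its own settings */
    SKETCH_SYS_INHERIT,  /* the jail shares the parent's settings */
    SKETCH_SYS_INVALID   /* anything else, e.g. an old boolean spelling */
};

/* Map the textual parameter value to the enum. */
enum sketch_jailsys sketch_parse_jailsys(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "disable") == 0)
        return SKETCH_SYS_DISABLE;
    if (strcmp(value, "new") == 0)
        return SKETCH_SYS_NEW;
    if (strcmp(value, "inherit") == 0)
        return SKETCH_SYS_INHERIT;
    return SKETCH_SYS_INVALID;
}
```

  The point of the third state is that a boolean cannot distinguish "share the parent's addresses" from "have no addresses at all".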
* Remove the interim vimage containers, struct vimage and struct procg, and the ioctl-based interface that supported them.
  Approved by: re (kib), bz (mentor)
  [Author: jamie, Age: 2009-07-17, Files: 1, Lines: -4/+0]
* Wrap a PR_VNET inside "#ifdef VIMAGE" since that is the only place it applies. bz wants the blame for this.
  Noticed by: rwatson
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-06-24, Files: 1, Lines: -0/+2]
* In case of prisons with their own network stack, permit additional privileges as well as not restricting the type of sockets a user can open.
  Note: the VIMAGE/vnet feature of jails is still considered experimental and cannot guarantee that privileged users can be kept imprisoned if enabled.
  Reviewed by: rwatson
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-06-24, Files: 1, Lines: -0/+128]
* Add a limit for child jails via the "children.cur" and "children.max" parameters. This replaces the simple "allow.jails" permission.
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-06-23, Files: 1, Lines: -9/+50]
* Manage vnets via the jail system. If a jail is given the boolean parameter "vnet" when it is created, a new vnet instance will be created along with the jail. Network interfaces can be moved between prisons with an ioctl similar to the one that moves them between vimages. For now vnets will co-exist under both jails and vimages, but soon struct vimage will be going away.
  Reviewed by: zec, julian
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-06-15, Files: 1, Lines: -0/+29]
* Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them.
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-06-13, Files: 1, Lines: -29/+33]
* Add counterparts to getcredhostname: getcreddomainname, getcredhostuuid, getcredhostid
  Suggested by: rmacklem
  Approved by: bz
  [Author: jamie, Age: 2009-06-13, Files: 1, Lines: -1/+32]
* Fix some overflow errors: a signed allocation and an insufficient array size.
  Reported by: pho
  Tested by: pho
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-06-09, Files: 1, Lines: -4/+12]
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
  Discussed with: pjd
  [Author: rwatson, Age: 2009-06-05, Files: 1, Lines: -1/+0]
* Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system.
  The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible.
  The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed.
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-05-29, Files: 1, Lines: -16/+155]
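  The lookup rule described above (copy the hostname of the cred's prison, or the system hostname when NULL is passed) can be sketched in userland C. Everything here is a hypothetical model: the structs, sketch_prison0, its "host.example" value, and sketch_getcredhostname() only mimic the described behaviour and omit the locking the kernel needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; not struct prison or struct ucred. */
struct sketch_prison {
    char host[64];
};

struct sketch_hostcred {
    const struct sketch_prison *prison;
};

/* Models prison0, which holds the system-wide hostname. */
struct sketch_prison sketch_prison0 = { "host.example" };

/* Copy the hostname for cred into buf; a NULL cred means the system. */
void sketch_getcredhostname(const struct sketch_hostcred *cred,
    char *buf, size_t size)
{
    const struct sketch_prison *pr;

    assert(buf != NULL);
    pr = (cred != NULL) ? cred->prison : &sketch_prison0;
    /* snprintf truncates to size - 1 bytes and always NUL-terminates. */
    snprintf(buf, size, "%s", pr->host);
}
```

  Copying into a caller-supplied buffer, rather than returning a pointer, means the caller never holds a reference into data that another thread may change.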
* Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings.
  Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge().
  Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call.
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-05-27, Files: 1, Lines: -559/+1658]
* Delay an error message until the variable it uses gets initialized.
  Found with: Coverity Prevent(tm)
  CID: 4316
  Reported by: trasz
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-05-23, Files: 1, Lines: -8/+6]
* Introduce a new virtualization container, provisionally named vprocg, to hold virtualized instances of hostname and domainname, as well as a new top-level virtualization struct vimage, which holds pointers to struct vnet and struct vprocg. Struct vprocg is likely to be replaced in the near future with a new jail management API import.
  As a consequence of this change, change struct ucred to point to a struct vimage, instead of directly pointing to a vnet.
  Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage branch.
  Permit kldload / kldunload operations to be executed only from the default vimage context.
  This change should have no functional impact on nooptions VIMAGE kernel builds.
  Reviewed by: bz
  Approved by: julian (mentor)
  [Author: zec, Age: 2009-05-08, Files: 1, Lines: -0/+4]
* Move the per-prison Linux MIB from a private one-off pointer to the new OSD-based jail extensions. This allows the Linux MIB to be accessed via jail_set and jail_get, and serves as a demonstration of adding jail support to a module.
  Reviewed by: dchagin, kib
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-05-07, Files: 1, Lines: -1/+0]
* Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added:
  * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2).
  * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl.
  * jail_remove, to kill off a jail's processes and remove the jail.
  Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2).
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-04-29, Files: 1, Lines: -463/+1532]
* Some non-functional changes: whitespace, KASSERT strings, declaration order.
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-04-29, Files: 1, Lines: -5/+5]
* Whitespace/spelling fixes in advance of upcoming functional changes.
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-03-27, Files: 1, Lines: -12/+12]
* Don't allow creating a socket with a protocol family that the current jail doesn't support. This involves a new function prison_check_af, like prison_check_ip[46] but that checks only the family.
  With this change, most of the errors generated by jailed sockets shouldn't ever occur, at least until jails are changeable.
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-02-05, Files: 1, Lines: -0/+42]
* Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL.
  Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls.
  Approved by: bz (mentor)
  [Author: jamie, Age: 2009-02-05, Files: 1, Lines: -70/+74]
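  The return convention described above can be illustrated with a small userland model. The struct and sketch_check_ip4() are hypothetical and far simpler than the kernel's prison_check_ip4(); only the three outcomes (0, EAFNOSUPPORT, EADDRNOTAVAIL) and the built-in "not jailed" shortcut follow the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-jail IPv4 address list. */
struct sketch_prison_ip {
    int jailed;            /* 0 for a non-jailed credential */
    size_t nip4;           /* number of IPv4 addresses in the jail */
    const uint32_t *ip4;   /* the addresses themselves */
};

/* Return 0 on success, or an errno value describing the failure. */
int sketch_check_ip4(const struct sketch_prison_ip *pr, uint32_t addr)
{
    size_t i;

    assert(pr != NULL);
    if (!pr->jailed)
        return 0;                /* non-jailed creds always succeed */
    if (pr->nip4 == 0)
        return EAFNOSUPPORT;     /* no addresses in this family */
    for (i = 0; i < pr->nip4; i++) {
        if (pr->ip4[i] == addr)
            return 0;            /* address belongs to the jail */
    }
    return EADDRNOTAVAIL;        /* family allowed, address is not */
}
```

  With the jailed check inside the function, a caller can simply propagate the returned value instead of guarding the call and hard-coding an error.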
* Mark most often used sysctl's as MPSAFE.
  After running a `make buildkernel', I noticed most of the Giant locks in sysctl are only caused by a very small amount of sysctl's:
  - sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like sysctl.oidfmt.
  - kern.ident, kern.osrelease, kern.version, etc. These are just constant strings.
  - kern.arandom, used by the stack protector. It is already protected by arc4_mtx.
  I also saw the following sysctl's show up. Not as often as the ones above, but still quite often:
  - security.jail.jailed. Also mark security.jail.list as MPSAFE. They don't need locking or already use allprison_lock.
  - kern.devname, used by devname(3), ttyname(3), etc.
  This seems to reduce Giant locking inside sysctl by ~75% in my primitive test setup.
  [Author: ed, Age: 2009-01-28, Files: 1, Lines: -4/+6]
* For consistency with prison_{local,remote,check}_ipN rename prison_getipN to prison_get_ipN.
  Submitted by: jamie (as part of a larger patch)
  MFC after: 1 week
  [Author: bz, Age: 2009-01-25, Files: 1, Lines: -2/+2]
* Back out r186615; the sanitizing of the pointers in the error case is not needed, and it seems that it will not be needed either.
  Pointy hat: mine, mine, mine and not pho's
  [Author: bz, Age: 2009-01-04, Files: 1, Lines: -2/+0]
* Added missing second part of cleaning j->ip[46] as requested by bz
  Approved by: kib (mentor)
  Pointy hat: pho
  [Author: pho, Age: 2008-12-30, Files: 1, Lines: -0/+2]
* Make sure that unused j->ip[46] are cleared
  Reviewed by: bz
  Approved by: kib (mentor)
  [Author: pho, Age: 2008-12-30, Files: 1, Lines: -2/+4]
* Correctly check the number of prison states to not access anything outside the prison_states array.
  When checking if there is a name configured for the prison, check the first character to not be '\0' instead of checking if the char array is present, which it always is. Note, that this is different for the *jailname in the syscall.
  Found with: Coverity Prevent(tm)
  CID: 4156, 4155
  MFC after: 4 weeks (just that I get the mail)
  [Author: bz, Age: 2008-12-11, Files: 1, Lines: -2/+2]
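  Both fixes described above follow a common C pattern, sketched here with hypothetical names (struct sketch_named_prison, sketch_states, and the two helper functions are illustrative and are not the kernel's definitions). An embedded char array always has a non-NULL address, so emptiness has to be tested on its first byte, and a table index has to be compared against the table's element count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical prison with an embedded (never NULL) name array. */
struct sketch_named_prison {
    char name[32];
    int state;
};

/* Hypothetical state table; the real prison_states array may differ. */
const char *const sketch_states[] = { "invalid", "alive", "dying" };

/* A name is configured only if its first character is not NUL. */
bool sketch_has_name(const struct sketch_named_prison *pr)
{
    assert(pr != NULL);
    return pr->name[0] != '\0';
}

/* Look up a state name without reading outside the table. */
const char *sketch_state_name(int state)
{
    const size_t count = sizeof(sketch_states) / sizeof(sketch_states[0]);

    if (state < 0 || (size_t)state >= count)
        return "unknown";
    return sketch_states[state];
}
```

  A test such as `pr->name != NULL` would compile but always be true, which is the kind of defect a static analyzer reports.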
* Unbreak the no-networks (no INET/6) build that I broke with the commit in r185435.
  Pointyhat: no, but I could need a ski cap for the winter
  [Author: bz, Age: 2008-11-29, Files: 1, Lines: -0/+2]
* MFp4: Bring in updated jail support from bz_jail branch.
  This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking, etc.
  SCTP support was updated and supports IPv6 in jails as well.
  Cpuset support permits jails to be bound to specific processor sets after creation.
  Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future.
  DDB 'show jails' command was added to aid debugging.
  Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities.
  Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years.
  Bump __FreeBSD_version for the aforementioned and in-kernel changes.
  Special thanks to:
  - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches.
  - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support.
  - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages.
  - John Baldwin (jhb) for his help.
  - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels.
  - My employer, CK Software GmbH, for the support so I could work on this.
  Reviewed by: (see above)
  MFC after: 3 months (this is just so that I get the mail)
  X-MFC Before: 7.2-RELEASE if possible
  [Author: bz, Age: 2008-11-29, Files: 1, Lines: -60/+848]
* With the permission of phk@ change the license on kern_jail.c to a 2 clause BSD license.
  [Author: bz, Age: 2008-11-28, Files: 1, Lines: -6/+22]
* Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
  This brings a huge number of changes; I'll enumerate only user-visible changes:
  - Delegated Administration: Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc.
  - L2ARC: Level 2 cache for ZFS; allows additional disks to be used for cache. Huge performance improvements mostly for random read of mostly static content.
  - slog: Allows additional disks to be used for the ZFS Intent Log to speed up operations like fsync(2).
  - vfs.zfs.super_owner: Allows regular users to perform privileged operations on files stored on ZFS file systems that they own. Be very careful with this one.
  - chflags(2): Not all the flags are supported. This still needs work.
  - ZFSBoot: Support to boot off of a ZFS pool. Not finished, AFAIK. Submitted by: dfr
  - Snapshot properties
  - New failure modes: Previously, if a write request failed, the system panicked. Now one can select one of three failure modes:
    - panic: panic on write error
    - wait: wait for disk to reappear
    - continue: serve read requests if possible, block write requests
  - Refquota, refreservation properties: Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots.
  - Sparse volumes: ZVOLs that don't reserve space in the pool.
  - External attributes: Compatible with extattr(2).
  - NFSv4-ACLs: Not sure about the status, might not be complete yet. Submitted by: trasz
  - Creation-time properties
  - Regression tests for zpool(8) command.
  Obtained from: OpenSolaris
  [Author: pjd, Age: 2008-11-17, Files: 1, Lines: -234/+28]
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).
  MFC after: 3 months
  [Author: des, Age: 2008-10-23, Files: 1, Lines: -6/+6]
* Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit
  Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs.
  Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
  Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.).
  All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*).
  (*) netipsec/keysock.c did not validate depending on compile time options.
  Implemented by: julian, bz, brooks, zec
  Reviewed by: julian, bz, brooks, kris, rwatson, ...
  Approved by: julian (mentor)
  Obtained from: //depot/projects/vimage-commit2/...
  X-MFC after: never
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  [Author: zec, Age: 2008-10-02, Files: 1, Lines: -0/+1]
* Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@).
  This is the first in a series of commits over the course of the next few weeks.
  Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only.
  We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again.
  Obtained from: //depot/projects/vimage-commit2/...
  Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help)
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  X-MFC after: never
  V_Commit_Message_Reviewed_By: more people than the patch
  [Author: bz, Age: 2008-08-17, Files: 1, Lines: -1/+2]
* MFp4 144659: Plug a memory leak with jail services.
  PR: 125257
  Submitted by: Mateusz Guzik <mjguzik gmail.com>
  MFC after: 6 days
  [Author: bz, Age: 2008-07-07, Files: 1, Lines: -0/+4]