path: root/sys/i386/linux/linux_machdep.c
Commit message | Author | Age | Files | Lines
* Change the cap_rights_t type from uint64_t to a structure that we can extend | pjd | 2013-09-05 | 1 | -1/+4
  in the future in a backward compatible (API and ABI) way.

  The cap_rights_t represents capability rights. We used to use one bit to
  represent one right, but we are running out of spare bits. Currently the new
  structure provides place for 114 rights (so 50 more than the previous
  cap_rights_t), but it is possible to grow the structure to hold at least 285
  rights, although we can make it even larger if 285 rights won't be enough.

  The structure definition looks like this:

      struct cap_rights {
              uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
      };

  The initial CAP_RIGHTS_VERSION is 0.

  The top two bits in the first element of the cr_rights[] array contain the
  total number of elements in the array - 2. This means if those two bits are
  equal to 0, we have 2 array elements.

  The top two bits in all remaining array elements should be 0.
  The next five bits in all array elements contain the array index. Only one
  bit is used and the bit position in this five-bit range defines the array
  index. This means there can be at most five array elements in the future.

  To define a new right the CAPRIGHT() macro must be used. The macro takes two
  arguments - an array index and a bit to set, e.g.

      #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

  We still support aliases that combine a few rights, but the rights have to
  belong to the same array element, e.g.:

      #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
      #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

      #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

  There is a new API to manage the new cap_rights_t structure:

      cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
      void cap_rights_set(cap_rights_t *rights, ...);
      void cap_rights_clear(cap_rights_t *rights, ...);
      bool cap_rights_is_set(const cap_rights_t *rights, ...);

      bool cap_rights_is_valid(const cap_rights_t *rights);
      void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
      void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
      bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

  Capability rights to the cap_rights_init(), cap_rights_set(),
  cap_rights_clear() and cap_rights_is_set() functions are provided by
  separating them with commas, e.g.:

      cap_rights_t rights;

      cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

  There is no need to terminate the list of rights, as those functions are
  actually macros that take care of the termination, e.g.:

      #define cap_rights_set(rights, ...) \
              __cap_rights_set((rights), __VA_ARGS__, 0ULL)
      void __cap_rights_set(cap_rights_t *rights, ...);

  Thanks to using one bit as an array index we can assert in those functions
  that there are no two rights belonging to different array elements provided
  together. For example this is illegal and will be detected, because
  CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:

      cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

  Providing several rights that belong to the same array element this way is
  correct, but is not advised. It should only be used for alias definitions.

  This commit also breaks compatibility with some existing Capsicum system
  calls, but I see no other way to do that. This should be fine as Capsicum is
  still experimental and this change is not going to 9.x.

  Sponsored by: The FreeBSD Foundation
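The bit layout described in the commit above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the FreeBSD implementation: every `sk_`/`SK_` name is hypothetical, the index field is assumed to start at bit 57 (64 bits minus the two size bits and five index bits the message describes), the array is fixed at two elements, and a mixed-element argument is reported through a return value where the message says the real functions assert.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical sketch; every sk_/SK_ name is for illustration only. */
#define SK_INDEX_SHIFT	57	/* assumed: 64 - 2 size bits - 5 index bits */
#define SK_NELEMS	2	/* version 0 gives 0 + 2 array elements */
#define SK_CAPRIGHT(idx, bit)	((1ULL << (SK_INDEX_SHIFT + (idx))) | (bit))

#define SK_CAP_LOOKUP	SK_CAPRIGHT(0, 0x0000000000000400ULL)
#define SK_CAP_FCHMOD	SK_CAPRIGHT(0, 0x0000000000002000ULL)
#define SK_CAP_PDKILL	SK_CAPRIGHT(1, 0x0000000000000800ULL)

struct sk_rights {
	unsigned long long cr_rights[SK_NELEMS];
};

/*
 * Decode the one-hot, five-bit index field. Returns -1 when no index bit
 * or more than one index bit is set, which is what happens when rights
 * from different array elements are OR-ed together.
 */
int
sk_right_to_index(unsigned long long right)
{
	unsigned long long field = (right >> SK_INDEX_SHIFT) & 0x1f;
	int idx;

	if (field == 0 || (field & (field - 1)) != 0)
		return (-1);
	for (idx = 0; (field & 1) == 0; idx++)
		field >>= 1;
	return (idx);
}

/* The macro appends the 0ULL terminator so callers never have to. */
#define sk_rights_set(rights, ...) \
	sk_rights_set_impl((rights), __VA_ARGS__, 0ULL)

/* Returns 1 on success, 0 if an argument mixes or exceeds array elements. */
int
sk_rights_set_impl(struct sk_rights *rights, ...)
{
	va_list ap;
	unsigned long long right;
	int idx, ok = 1;

	assert(rights != NULL);
	va_start(ap, rights);
	while ((right = va_arg(ap, unsigned long long)) != 0) {
		idx = sk_right_to_index(right);
		if (idx < 0 || idx >= SK_NELEMS) {
			ok = 0;
			break;
		}
		rights->cr_rights[idx] |= right;
	}
	va_end(ap);
	return (ok);
}
```

With these definitions, `sk_rights_set(&r, SK_CAP_LOOKUP, SK_CAP_PDKILL)` succeeds because each argument carries exactly one index bit, while `sk_rights_set(&r, SK_CAP_LOOKUP | SK_CAP_PDKILL)` is rejected because the combined value has two index bits set.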
* - Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27 | jkim | 2012-04-16 | 1 | -19/+0
    but GNU libc used it without checking its kernel version, e.g., Fedora 10.
  - Move pipe(2) implementation for Linuxulator from MD files to MI file,
    sys/compat/linux/linux_file.c. There is no MD code for this syscall at all.
  - Correct an argument type for pipe() from l_ulong * to l_int *. Probably
    this was the source of MI/MD confusion.

  Reviewed by: emulation
* In order to maximize the re-usability of kernel code in user space this | kmacy | 2011-09-16 | 1 | -14/+14
  patch modifies makesyscalls.sh to prefix all of the non-compatibility calls
  (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points
  and all places in the code that use them. It also fixes an additional name
  space collision between the kernel function psignal and the libc function of
  the same name by renaming the kernel psignal to kern_psignal(). By
  introducing this change now we will ease future MFCs that change syscalls.

  Reviewed by: rwatson
  Approved by: re (bz)
* Second-to-last commit implementing Capsicum capabilities in the FreeBSD | rwatson | 2011-08-11 | 1 | -1/+5
  kernel for FreeBSD 9.0:

  Add a new capability mask argument to fget(9) and friends, allowing system
  call code to declare what capabilities are required when an integer file
  descriptor is converted into an in-kernel struct file *. With options
  CAPABILITIES compiled into the kernel, this enforces capability protection;
  without, this change is effectively a no-op.

  Some cases require special handling, such as mmap(2), which must preserve
  information about the maximum rights at the time of mapping in the memory
  map so that they can later be enforced in mprotect(2) -- this is done by
  narrowing the rights in the existing max_protection field used for similar
  purposes with file permissions.

  In namei(9), we assert that the code is not reached from within capability
  mode, as we're not yet ready to enforce namespace capabilities there. This
  will follow in a later commit.

  Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT
  and CAP_POLL_KEVENT to more accurately indicate what they represent.

  Approved by: re (bz)
  Submitted by: jonathan
  Sponsored by: Google Inc
* Move linux_clone(), linux_fork(), linux_vfork() to a MI path. | dchagin | 2011-02-12 | 1 | -242/+0
* In preparation for moving linux_clone() to a MI path | dchagin | 2011-02-12 | 1 | -1/+10
  introduce linux_set_upcall_kse().
* In preparation for moving linux_clone() to a MI path | dchagin | 2011-02-12 | 1 | -54/+58
  move the TLS code into a separate function. Use a function parameter
  instead of directly using the register.
* The kern_wait() code already removes the SIGCHLD signal for the waited | dchagin | 2011-01-30 | 1 | -7/+0
  process. Removing other SIGCHLD signals is not needed and may cause
  problems.

  Pointed out by: jilles
  MFC after: 1 Month.
* Implement a variation of the linux_common_wait() which should | dchagin | 2011-01-28 | 1 | -0/+38
  be used by linuxolator itself.

  Move linux_wait4() to MD path as it requires native struct rusage
  translation to struct l_rusage on linux32/amd64.

  MFC after: 1 Month.
* Add macro to test the sv_flags of any process. Change some places to test | dchagin | 2011-01-26 | 1 | -2/+2
  the flags instead of explicitly comparing with the address of known
  sysentvec structures.

  MFC after: 1 month
* Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requests | kan | 2010-06-10 | 1 | -2/+6
  in Linux emulation layer. Linux seems to only require that pos is
  page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter
  checking is too strict to allow some Linux binaries to run. tsMuxeR is
  one example of such a binary.

  Discussed with: jhb
  MFC after: 1 week
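The relaxed check described above could look roughly like the following sketch. It is an assumption-laden illustration, not the committed code: the function name, `SK_MAP_ANON`, and `SK_PAGE_SIZE` are hypothetical, and returning `EINVAL` for a misaligned offset is a guess based on the message's remark that Linux only requires page alignment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096u	/* assumed page size */
#define SK_MAP_ANON	0x20	/* hypothetical flag value for this sketch */

/*
 * For anonymous mappings, accept any page-aligned offset and then ignore
 * it by zeroing it before it reaches the stricter native check.
 * Non-anonymous requests pass through untouched.
 */
int
sk_fixup_anon_pos(int flags, uint64_t *pos)
{
	assert(pos != NULL);
	if ((flags & SK_MAP_ANON) == 0)
		return (0);
	if ((*pos & (SK_PAGE_SIZE - 1)) != 0)
		return (EINVAL);	/* assumed error for misaligned pos */
	*pos = 0;
	return (0);
}
```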
* Fix some problems with effective mmap() offsets > 32 bits. This was | jhb | 2009-10-28 | 1 | -34/+31
  partially fixed on amd64 earlier. Rather than forcing linux_mmap_common()
  to use a 32-bit offset, have it accept a 64-bit file offset. This offset
  is then passed to the real mmap() call. Rather than inventing a structure
  to hold the normal linux_mmap args that has a 64-bit offset, just pass
  each of the arguments individually to linux_mmap_common() since that more
  closely matches the existing style of various kern_foo() functions.

  Submitted by: Christian Zander @ Nvidia
  MFC after: 1 week
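Why a 32-bit offset is too narrow can be shown with a small arithmetic sketch. The commit does not say where the large offsets come from; the example assumes a page-based offset argument of the kind Linux's mmap2(2) takes, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096u	/* assumed page size */

/* Widen to 64 bits first, then multiply: the product cannot wrap. */
uint64_t
sk_pgoff_to_offset(uint32_t pgoff)
{
	return ((uint64_t)pgoff * SK_PAGE_SIZE);
}

/* For contrast: the same product computed in 32 bits wraps around. */
uint32_t
sk_pgoff_to_offset32(uint32_t pgoff)
{
	return (pgoff * SK_PAGE_SIZE);
}
```

A page offset of 0x100000 corresponds to a byte offset of 4 GiB, which the 64-bit version preserves and the 32-bit version wraps to zero.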
* Return ENOSYS instead of EINVAL for invalid function codes to match the | jhb | 2009-06-26 | 1 | -4/+1
  behavior of Linux.

  Reported by: Alexander Best alexbestms of math.uni-muenster.de
  Approved by: re (kib)
* Adapt linux emulation to use cv for vfork wait. | kib | 2009-02-18 | 1 | -2/+2

  Submitted by: Takahiro Kurosawa <takahiro.kurosawa gmail com>
  PR: kern/131506
* Several cleanups related to pipe(2). | ed | 2008-11-11 | 1 | -15/+5

  - Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2)
    fills an array with two descriptors.
  - Remove EFAULT from the manual page. Because of the current calling
    convention, pipe(2) raises a segmentation fault when an invalid address
    is passed.
  - Introduce kern_pipe() to make it easier for binary emulations to
    implement pipe(2).
  - Make Linux binary emulation use kern_pipe(), which means we don't have
    to recover td_retval after calling the FreeBSD system call.

  Approved by: rdivacky
  Discussed on: arch
* Fix Linux mmap with MAP_GROWSDOWN flag. | jkim | 2008-02-11 | 1 | -14/+15

  Reported by: Andriy Gapon (avg at icyb dot net dot ua)
  Tested by: Andriy Gapon (avg at icyb dot net dot ua)
  Pointyhat: me
  MFC after: 3 days
* Implement read_default_ldt in linux_modify_ldt(). It copies out a zeroed | kib | 2007-11-26 | 1 | -0/+9
  descriptor, like real Linux does.

  Tested by: Yuriy Tsibizov <yuriy.tsibizov at gmail com>
  Submitted by: rdivacky
  MFC after: 1 week
* i386_set_ioperm, i386_get_ldt and i386_set_ldt are now MPSAFE | attilio | 2007-07-20 | 1 | -6/+0
  (Giant/sched_lock free) so remove unuseful Giant cruft.

  Approved by: jeff
  Approved by: re
  Sponsorized by: NGX Italy (http://www.ngx.it)
* Don't add the 'pad' argument to the mmap/truncate/etc syscalls. | peter | 2007-07-04 | 1 | -2/+0

  Submitted by: kensmith
  Approved by: re (kensmith)
* Commit 14/14 of sched_lock decomposition. | jeff | 2007-06-05 | 1 | -6/+6
  - Use thread_lock() rather than sched_lock for per-thread scheduling
    synchronization.
  - Use the per-process spinlock rather than the sched_lock for per-process
    scheduling synchronization.

  Tested by: kris, current@
  Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
  Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
* Do not dereference linux_to_bsd_signal[-1] if userland has | kan | 2007-05-11 | 1 | -4/+5
  passed zero as exit signal.

  GCC 4.2 changes the kernel data segment layout not to have 0 in that
  memory location. This code ran by luck before and now the luck has run
  out.
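The out-of-bounds read described above follows from a translation table indexed by signal number minus one, so a signal number of zero lands on element -1. A minimal sketch of the guarded lookup, with a made-up table and hypothetical names (this is not the kernel's table or code):

```c
#include <assert.h>

#define SK_SIGTBLSZ	4

/* Made-up translation table for illustration; entry i maps signal i + 1. */
static const int sk_linux_to_bsd_signal[SK_SIGTBLSZ] = { 1, 2, 3, 30 };

/*
 * Translate an exit signal. Zero means "no exit signal" and is returned
 * unchanged without touching the table; anything outside 1..SK_SIGTBLSZ
 * is rejected with -1 in this sketch.
 */
int
sk_translate_exit_signal(int sig)
{
	if (sig == 0)
		return (0);
	if (sig < 1 || sig > SK_SIGTBLSZ)
		return (-1);
	return (sk_linux_to_bsd_signal[sig - 1]);
}
```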
* MFP4: 115220, 115222 | jkim | 2007-03-02 | 1 | -26/+26

  - Fix style(9) and reduce diff between amd64 and i386.
  - Prefix Linuxulator macros with LINUX_ to prevent future collision.
* MFP4: 115094 | jkim | 2007-02-27 | 1 | -3/+4

  Linux does not check file descriptor when MAP_ANONYMOUS is set.
  This should fix recent LTP test regressions.

  Reported by: Scot Hetzel (swhetzel at gmail dot com)
               netchild
* Partial MFp4 of 114977: | netchild | 2007-02-24 | 1 | -17/+21
  Whitespace commit: Fix grammar, spelling and punctuation.

  Submitted by: "Scot Hetzel" <swhetzel@gmail.com>
* MFp4 (114193 (i386 part), 114194, 114195, 114200): | netchild | 2007-02-23 | 1 | -39/+42
  - Don't "return" in linux_clone() after we forked the new process in
    case of problems.
  - Move the copyout of p2->p_pid outside the emul_lock coverage in
    linux_clone().
  - Cache the em->pdeath_signal in a local variable and move the copyout
    out of the emul_lock coverage.
  - Move the free() out of the emul_shared_lock coverage in preparation
    to switch emul_lock to a non-sleepable lock (mutex).

  Submitted by: rdivacky
* MFP4: 113025, 113146, 113177, 113203, 113500, 113546, 113570 | jkim | 2007-02-15 | 1 | -54/+58

  - PROT_READ, PROT_WRITE, or PROT_EXEC implies PROT_READ and PROT_EXEC.
    Linux/ia64's i386 emulation layer does this and it complies with Linux
    header files. This fixes mmap05 LTP test case on amd64.
  - Do not adjust stack size when failure has occurred.
  - Synchronize i386 mmap/mprotect with amd64.
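The first rule in the list above is compact enough to sketch directly. The helper name is hypothetical and the snippet only illustrates the stated rule using the standard `PROT_*` constants from `<sys/mman.h>`; it is not the code from the commit.

```c
#include <assert.h>
#include <sys/mman.h>

/*
 * If any of read, write or exec is requested, also grant read and exec.
 * A request with no protection bits (PROT_NONE) is left unchanged.
 */
int
sk_linux_prot(int prot)
{
	if ((prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) != 0)
		prot |= PROT_READ | PROT_EXEC;
	return (prot);
}
```

Under this rule a write-only request becomes read, write and exec, while a mapping requested with no access stays inaccessible.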
* Fix LOR that occurs because proctree_lock was acquired while holding | kib | 2007-02-01 | 1 | -8/+8
  emuldata lock by moving the code upwards outside the emul_lock coverage.

  Submitted by: rdivacky
* - Remove setrunqueue and replace it with direct calls to sched_add(). | jeff | 2007-01-23 | 1 | -3/+4
    setrunqueue() was mostly empty. The few asserts and thread state
    setting were moved to the individual schedulers. sched_add() was
    chosen to displace it for naming consistency reasons.
  - Remove adjustrunqueue, it was 4 lines of code that was ifdef'd to be
    different on all three schedulers where it was only called in one
    place each.
  - Remove the long ifdef'd out remrunqueue code.
  - Remove the now redundant ts_state. Inspect the thread state directly.
  - Don't set TSF_* flags from kern_switch.c, we were only doing this to
    support a feature in one scheduler.
  - Change sched_choose() to return a thread rather than a td_sched. Also,
    rely on the schedulers to return the idlethread. This simplifies the
    logic in choosethread(). Aside from the run queue links kern_switch.c
    mostly does not care about the contents of td_sched.

  Discussed with: julian

  - Move the idle thread loop into the per scheduler area. ULE wants to
    do something different from the other schedulers.

  Suggested by: jhb

  Tested on: x86/amd64 sched_{4BSD, ULE, CORE}.
* MFp4 (113077, 113083, 113103, 113124, 113097): | netchild | 2007-01-20 | 1 | -5/+41
  Don't expose em->shared to the outside world before it's properly
  initialized. Might not affect anything but it's at least a better
  coding style.

  Don't expose em via p->p_emuldata until it's properly initialized.
  This also enables us to get rid of some locking and simplify the
  code because we are working on a local copy.

  In linux_fork and linux_vfork create the process in stopped state
  to be sure that the new process runs with fully initialized emuldata
  structure [1]. Also fix the vfork (both in linux_clone and linux_vfork)
  race that could result in never woken up process [2].

  Reported by: Scot Hetzel [1]
  Suggested by: jhb [2]
  Reviewed by: jhb (at least some important parts)
  Submitted by: rdivacky
  Tested by: Scot Hetzel (on amd64)

  Change 2 comments (in the new code) to comply to style(9).

  Suggested by: jhb
* MFp4 (112893): | netchild | 2007-01-14 | 1 | -0/+1
  Make linux_vfork() actually work. This enables make to work again with
  2.6. It also fixes the LTP vfork tests.

  Submitted by: rdivacky
* MFp4 (112498): | netchild | 2007-01-07 | 1 | -1/+1
  Rename the locking flags to EMUL_DOLOCK and EMUL_DONTLOCK to prevent
  confusion.

  Submitted by: rdivacky
* Sweep kernel replacing suser(9) calls with priv(9) calls, assigning | rwatson | 2006-11-06 | 1 | -1/+2
  specific privilege names to a broad range of privileges. These may
  require some future tweaking.

  Sponsored by: nCircle Network Security, Inc.
  Obtained from: TrustedBSD Project
  Discussed on: arch@
  Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
      Alex Lyashkov <umka at sevcity dot net>,
      Skip Ford <skip dot ford at verizon dot net>,
      Antoine Brodin <antoine dot brodin at laposte dot net>
* Fix a recent regression regarding valid signals. | netchild | 2006-10-20 | 1 | -1/+1

  Submitted by: rdivacky
* MFP4 (106538 + 106541): | netchild | 2006-10-15 | 1 | -0/+10
  Implement CLONE_VFORK. This fixes the clone05 LTP test.

  Submitted by: rdivacky
* Revert my previous commit, I mismerged this to the wrong place. | netchild | 2006-10-15 | 1 | -1/+0

  Pointy hat to: netchild
* MFP4 (106541): Fix the clone05 test in the LTP. | netchild | 2006-10-15 | 1 | -0/+1

  Submitted by: rdivacky
* MFP4 (107144[1]): Implement CLONE_FS on i386[1] and amd64. | netchild | 2006-10-15 | 1 | -1/+7

  Submitted by: rdivacky [1]
* MFP4 (107868 - 107870): | netchild | 2006-10-15 | 1 | -1/+1
  Use a macro to test for a valid signal instead of doing it by hand
  everywhere.

  Submitted by: rdivacky
* style(9) | netchild | 2006-09-20 | 1 | -4/+4

  While I'm here add a MFC reminder, I forgot it in the previous commit.

  Noticed by: ssouhlal
  MFC after: 1 week
* Bring the i386 linux mmap code more into line with how linux (2.4.x) | netchild | 2006-09-20 | 1 | -2/+42
  behaves. This fixes a lot of tests which failed before. For amd64 there
  are still some problems, but without any testers who apply patches and
  run some predefined tests we can't do more ATM.

  Submitted by: Marcin Cieslak <saper@SYSTEM.PL> (minor fixups by myself)
  Tested with: LTP
* Fix video playing and network connections in realplayer (and most likely | netchild | 2006-08-27 | 1 | -14/+11
  other stuff) in the osrelease=2.6.16 case:
  - implement CLONE_PARENT semantic
  - fix TLS loading in clone CLONE_SETTLS
  - lock proc in the currently disabled part of CLONE_THREAD

  I suggest to not unload the linux module after testing this, there are
  some "<defunct>" processes hanging around after exiting (they aren't
  with osrelease=2.4.2) and they may panic your kernel when unloading the
  linux module. They are in state Z and some of them consume CPU according
  to ps. But I don't trust the CPU part, the idle threads get too much CPU
  for this to be possible (accumulating idle, X and 2 defunct processes
  results in 104.7%, this looks too much to be a rounding error).

  Noticed by: Intron <mag@intron.ac>
  Submitted by: rdivacky (in collaboration with Intron)
  Tested by: Intron, netchild
  Reviewed by: jhb (previous version)
* Emulate what vfork does instead of using it in linux_vfork. This way | netchild | 2006-08-25 | 1 | -1/+13
  we can do the stuff we need to do with linux processes at fork and don't
  panic the kernel at exit of the child.

  Submitted by: rdivacky
  Tested with: tst-vfork* (glibc regression tests)
  Tested by: netchild
* Move some stuff into headers where they belong. | netchild | 2006-08-17 | 1 | -3/+0

  Sponsored by: Google SoC 2006
  Submitted by: rdivacky
  Noticed by: jhb, ssouhlal
* Style fixes to comments. | netchild | 2006-08-16 | 1 | -8/+16

  Sponsored by: Google SoC 2006
  Submitted by: rdivacky
  Noticed by: jhb, ssouhlal
* Add the linux 2.6.x stuff (not used by default!): | netchild | 2006-08-15 | 1 | -25/+356
  - TLS - complete
  - pid/tid mangling - complete
  - thread area - complete
  - futexes - complete with issues
  - clone() extension - complete with some possible minor issues
  - mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
    disabled when not built as part of the kernel with native FreeBSD mq*
    support (module support for this will come later)

  Tested with:
  - linux-firefox - works, tested
  - linux-opera - works, tested
  - linux-realplay - doesn't work, issue with futexes
  - linux-skype - doesn't work, issue with futexes
  - linux-rt2-demo - works, tested
  - linux-acroread - doesn't work, unknown reason (coredump) and sometimes
    issue with futexes
  - various unix utilities in linux-base-gentoo3 and linux-base-fc4:
    everything tried worked

  On amd64 not everything is supported like on i386, the catchup is planned
  for later when the remaining bugs in the new functions are fixed.

  To test this new stuff, you have to run
      sysctl compat.linux.osrelease=2.6.16
  to switch back use
      sysctl compat.linux.osrelease=2.4.2

  Don't switch while running a linux program, strange things may or may not
  happen.

  Sponsored by: Google SoC 2006
  Submitted by: rdivacky
  Some suggestions/help by: jhb, kib, manu@NetBSD.org, netchild
* - Always call exec_free_args() in kern_execve() instead of doing it in all | jhb | 2006-02-06 | 1 | -1/+0
    the callers if the exec either succeeds or fails early.
  - Move the code to call exit1() if the exec fails after the vmspace is
    gone to the bottom of kern_execve() to cut down on some code
    duplication.
* Propagate error code of kern_execve() to the caller properly. | sobomax | 2005-08-01 | 1 | -1/+1

  PR: 81670
  Submitted by: Andrew Bliznak <andriko.b@gmail.com>
  Pointy hat to: sobomax
* In linux emulation layer try to detect attempt to use linux_clone() to | sobomax | 2005-03-03 | 1 | -0/+19
  create kernel threads and call rfork(2) with RFTHREAD flag set in this
  case, which puts parent and child into the same threading group. As a
  result all threads that belong to the same program end up in the same
  threading group.

  This is similar to what linuxthreads port does, though in this case we
  don't have a luxury of having access to the source code and there is no
  definite way to differentiate linux_clone() called for threading purposes
  from other uses, so that we have to resort to heuristics.

  Allow SIGTHR to be delivered between all processes in the same threading
  group; previously it has been blocked for s[ug]id processes.

  This also should improve locking of the same file descriptor from
  different threads in programs running under linux compat layer.

  PR: kern/72922
  Reported by: Andriy Gapon <avg@icyb.net.ua>
  Idea suggested by: rwatson
* Use the LCONVPATHEXIST() macro rather than its exact expansion to be | jhb | 2005-02-07 | 1 | -4/+1
  consistent.
* o Split out kernel part of execve(2) syscall into two parts: one that | sobomax | 2005-01-29 | 1 | -9/+17
    copies arguments into the kernel space and one that operates
    completely in the kernel space;
  o use kernel-only version of execve(2) to kill another stackgap in
    linuxlator/i386.

  Obtained from: DragonFlyBSD (partially)
  MFC after: 2 weeks