path: root/sys/compat/linux/linux_misc.c
Commit message | Author | Age | Files | Lines
* MFC r302515: | dchagin | 2016-07-17 | 1 | -5/+13
  Implement the Linux personality() system call, mainly for the READ_IMPLIES_EXEC flag. In Linux, if this flag is set, PROT_READ implies PROT_EXEC for mmap(). Linux/i386 sets this flag automatically if the binary requires an executable stack. The READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.
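The rule described in this entry can be sketched in a few lines of C. This is only an illustration of the stated behaviour, not the code from the commit: the helper name `sketch_effective_prot` and the `SKETCH_` constant are hypothetical, and the flag value is the one Linux defines for READ_IMPLIES_EXEC in its personality header.

```c
#include <sys/mman.h>

/* Assumed to match Linux's READ_IMPLIES_EXEC personality flag value. */
#define SKETCH_READ_IMPLIES_EXEC 0x0400000UL

/*
 * Hypothetical helper: return the protection bits mmap() would use
 * for a process with the given personality. If READ_IMPLIES_EXEC is
 * set and the caller asked for PROT_READ, PROT_EXEC is added.
 */
static int
sketch_effective_prot(unsigned long persona, int prot)
{
	if ((persona & SKETCH_READ_IMPLIES_EXEC) != 0 &&
	    (prot & PROT_READ) != 0)
		prot |= PROT_EXEC;
	return (prot);
}
```

A mapping requested without PROT_READ is left unchanged, as is any mapping made by a process whose personality does not carry the flag.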
* MFC r280130: | bdrewery | 2016-06-27 | 1 | -1/+1
  cred: add proc_set_cred helper
* Merge r301053: | glebius | 2016-05-31 | 1 | -0/+1
  Fix kernel stack disclosures in the Linux and 4.3BSD compat layers.
  Security: SA-16:20
  Security: SA-16:21
* MFC r298829 | pfg | 2016-05-14 | 1 | -1/+1
  sys/compat/linux*: minor spelling fixes.
* MFC r297519, r297525 (by pfg@): | dchagin | 2016-04-10 | 1 | -3/+4
  Move Linux-specific times tests up to guarantee the values are defined.
* MFC r296502, r296543, r296546, r297060: | dchagin | 2016-03-27 | 1 | -14/+17
  1. Limit secs to INT32_MAX / 2 to avoid errors from kern_setitimer(). Assert that kern_setitimer() returns 0. Remove bogus cast of secs. Fix style(9) issues.
  2. Increment the return value if the remaining tv_usec value is more than 500000, as Linux does.
* MFC 289769,289822,290143,290144: | jhb | 2016-01-20 | 1 | -2/+9
  Rename remaining linux32 symbols from linux_* to linux32_*.
  289769: Rename remaining linux32 symbols such as linux_sysent[] and linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with linux64.ko. While here, add support for linux64 binaries to systrace.
  - Update NOPROTO entries in amd64/linux/syscalls.master to match the main table to fix systrace build.
  - Add a special case for union l_semun arguments to the systrace generation.
  - The systrace_linux32 module now only builds the systrace_linux32.ko module on amd64.
  - Add a new systrace_linux module that builds on both i386 and amd64. For i386 it builds the existing systrace_linux.ko. For amd64 it builds a systrace_linux.ko for 64-bit binaries.
  289822: Fix build for the KTR-enabled kernels.
  290143: Fix build with DEBUG defined.
  290144: Update for LINUX32 rename. The assembler didn't complain about undefined symbols but just used 0 after the rename.
* o Fix SCTP ICMPv6 error message vulnerability. [SA-16:01.sctp] | glebius | 2016-01-14 | 1 | -1/+3
  o Fix Linux compatibility layer incorrect futex handling. [SA-16:03.linux]
  o Fix Linux compatibility layer setgroups(2) system call. [SA-16:04.linux]
  o Fix TCP MD5 signature denial of service. [SA-16:05.tcp]
  o Fix insecure default bsnmpd.conf permissions. [SA-16:06.bsnmpd]
  Security: FreeBSD-SA-16:01.sctp, CVE-2016-1879
  Security: FreeBSD-SA-16:03.linux, CVE-2016-1880
  Security: FreeBSD-SA-16:04.linux, CVE-2016-1881
  Security: FreeBSD-SA-16:05.tcp, CVE-2016-1882
  Security: FreeBSD-SA-16:06.bsnmpd, CVE-2015-5677
* MFC r283491: | dchagin | 2016-01-09 | 1 | -2/+16
  Properly check the tv_nsec value. The tv_nsec field can also be one of the special values UTIME_NOW or UTIME_OMIT.
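A check of this kind might look like the sketch below. The function name is hypothetical and the code is not taken from the commit; the two `SKETCH_` constants are assumed to match the values Linux uses for UTIME_NOW and UTIME_OMIT, which deliberately lie outside the normal nanosecond range.

```c
/* Assumed Linux-side values of UTIME_NOW and UTIME_OMIT. */
#define SKETCH_LINUX_UTIME_NOW	0x3FFFFFFFL
#define SKETCH_LINUX_UTIME_OMIT	0x3FFFFFFEL

/*
 * Hypothetical helper: return 1 if tv_nsec is acceptable, that is,
 * either one of the two special markers or an ordinary nanosecond
 * count in the range [0, 999999999]; return 0 otherwise.
 */
static int
sketch_tv_nsec_ok(long nsec)
{
	if (nsec == SKETCH_LINUX_UTIME_NOW ||
	    nsec == SKETCH_LINUX_UTIME_OMIT)
		return (1);
	return (nsec >= 0 && nsec <= 999999999L);
}
```

The special values have to be tested first, because both would otherwise be rejected by the plain range check.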
* MFC r283480: | dchagin | 2016-01-09 | 1 | -0/+81
  Add utimensat() system call.
* MFC r283474: | dchagin | 2016-01-09 | 1 | -3/+3
  Rework signal code to allow using it by other modules, like linprocfs:
  1. Linux sigset is always 64 bit on all platforms. In order to move Linux sigset code to the linux_common module, define it as a 64-bit int. Move Linux sigset manipulation routines to the MI path.
  2. Move Linux signal number definitions to the MI path. In general, they are the same on all platforms except for a few signals.
  3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion tables to avoid conversion errors.
  4. Emulate the Linux SIGPWR signal via the FreeBSD SIGRTMIN signal, which is outside the range of signal numbers allowed on Linux.
  PR: 197216
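Item 1 describes the signal set as a single 64-bit integer. A minimal sketch of what manipulation routines over such a type could look like is given below; the type and function names are hypothetical, and the mapping of signal number n to bit n - 1 is an assumption for illustration, not something the commit message states.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit Linux signal set. */
typedef uint64_t sketch_l_sigset_t;

/* Callers must pass a signal number in the range 1..64. */
static sketch_l_sigset_t
sketch_sigaddset(sketch_l_sigset_t set, int sig)
{
	return (set | (UINT64_C(1) << (sig - 1)));
}

static sketch_l_sigset_t
sketch_sigdelset(sketch_l_sigset_t set, int sig)
{
	return (set & ~(UINT64_C(1) << (sig - 1)));
}

static int
sketch_sigismember(sketch_l_sigset_t set, int sig)
{
	return (((set >> (sig - 1)) & 1) != 0);
}
```

Because the set is a plain integer of fixed width, the same routines can be shared by 32-bit and 64-bit emulation without per-architecture variants.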
* MFC r283463: | dchagin | 2016-01-09 | 1 | -10/+11
  Do not use struct l_timespec without conversion. While here, move args->timeout handling before acquiring the futex key in the FUTEX_WAIT path.
* MFC r283451: | dchagin | 2016-01-09 | 1 | -0/+53
  Implement ppoll() system call.
* MFC r283443: | dchagin | 2016-01-09 | 1 | -2/+2
  Put the correct value for the abi_nfdbits parameter of kern_select() for all supported Linuxulators.
* MFC r283435: | dchagin | 2016-01-09 | 1 | -18/+13
  Convert Linux wait options to their FreeBSD equivalents. Check wait options as Linux does. Linux always sets the WEXITED option, not WUNTRACED|WNOHANG, which is a strange bug.
* MFC r283434: | dchagin | 2016-01-09 | 1 | -0/+2
  Set WIFCONTINUED to the wait status if needed.
* MFC r283427: | dchagin | 2016-01-09 | 1 | -4/+4
  Where possible, use the M_LINUX malloc(9) type. Move M_FUTEX defines to linux_common.ko.
* MFC r283422: | dchagin | 2016-01-09 | 1 | -0/+15
  Refund the proc emuldata struct for future use. For now, move flags from thread emuldata to proc emuldata as was originally intended. As we can have both 64-bit and 32-bit Linuxulators running, any eventhandler can be called twice for us. To prevent this, move the eventhandler code from linux_emul.c to the linux_common.ko module.
* MFC r283421: | dchagin | 2016-01-09 | 1 | -0/+57
  Introduce a new module linux_common.ko which is intended for the following primary purposes:
  1. Remove the dependency of the linsysfs and linprocfs modules on linux.ko, which will be architecture-specific on amd64.
  2. Incorporate into linux_common.ko general code for platforms on which we'll support two Linuxulator modules (for both instruction sets, 32 and 64 bit).
  3. Move the malloc(9) declaration to linux_common.ko, to enable getting memory usage statistics properly.
  Currently linux_common.ko incorporates code from linux_mib.c and linux_util.c, and the linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.
  Temporarily remove dtrace garbage from linux_mib.c and linux_util.c
* MFC r283419: | dchagin | 2016-01-09 | 1 | -3/+5
  Fix compilation with -DDEBUG option.
* MFC r283415: | dchagin | 2016-01-09 | 1 | -2/+6
  Disable i386 call for x86-64 Linux.
* MFC r283410: | dchagin | 2016-01-09 | 1 | -1/+1
  Put linux_platform into the vdso to avoid copying it onto the stack at every exec.
* MFC r283403: | dchagin | 2016-01-09 | 1 | -0/+84
  Implement pselect6() system call.
* MFC r283401: | dchagin | 2016-01-09 | 1 | -0/+73
  Implement prlimit64() system call.
* MFC r283398: | dchagin | 2016-01-09 | 1 | -0/+7
  sched_rr_get_interval() returns EINVAL when an invalid pid is specified. This silences the LTP tests.
* MFC r283394: | dchagin | 2016-01-09 | 1 | -0/+62
  Implement waitid() system call.
* MFC r283391: | dchagin | 2016-01-09 | 1 | -0/+29
  To reduce code duplication, introduce the linux_copyout_rusage() method. Use it in the linux_wait4() system call and move linux_wait4() to the MI path. While here, add a prototype for the static bsd_to_linux_rusage().
* MFC r283390: | dchagin | 2016-01-09 | 1 | -0/+19
  Add a function for converting wait options.
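A conversion of this kind is needed because the two systems assign different bit values to some wait options. The sketch below shows the general shape only; the function name is hypothetical, it is not the function added by the commit, and the `SK_` constants are local stand-ins whose values are assumed to match the respective system headers and should be checked against them.

```c
/* Assumed Linux-side option bits. */
#define SK_LINUX_WNOHANG	0x00000001
#define SK_LINUX_WUNTRACED	0x00000002
#define SK_LINUX_WEXITED	0x00000004
#define SK_LINUX_WCONTINUED	0x00000008
#define SK_LINUX_WNOWAIT	0x01000000

/* Assumed FreeBSD-side option bits. */
#define SK_BSD_WNOHANG		1
#define SK_BSD_WUNTRACED	2
#define SK_BSD_WCONTINUED	4
#define SK_BSD_WNOWAIT		8
#define SK_BSD_WEXITED		16

/*
 * Hypothetical helper: translate each known Linux option bit to its
 * FreeBSD counterpart. Unknown bits are ignored here; a real
 * implementation would validate them first.
 */
static int
sketch_linux_to_bsd_waitopts(int lopts)
{
	int bopts = 0;

	if (lopts & SK_LINUX_WNOHANG)
		bopts |= SK_BSD_WNOHANG;
	if (lopts & SK_LINUX_WUNTRACED)
		bopts |= SK_BSD_WUNTRACED;
	if (lopts & SK_LINUX_WEXITED)
		bopts |= SK_BSD_WEXITED;
	if (lopts & SK_LINUX_WCONTINUED)
		bopts |= SK_BSD_WCONTINUED;
	if (lopts & SK_LINUX_WNOWAIT)
		bopts |= SK_BSD_WNOWAIT;
	return (bopts);
}
```

Translating bit by bit, rather than passing the value through, avoids misreading an option such as WEXITED as a different option that happens to share its numeric value on the other system.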
* MFC r283383: | dchagin | 2016-01-09 | 1 | -135/+113
  Switch linuxulator to use the native 1:1 threads.
  The reasons:
  1. Get rid of the stubs/quirks with process dethreading, process reparenting when the process group leader exits, and related problems in wait(), waitpid(), etc.
  2. Reuse our kernel code instead of writing excessive thread management routines in Linuxulator.
  Implementation details:
  1. The thread is created via kern_thr_new() in the clone() call with the CLONE_THREAD parameter. Thus, everything else is a process.
  2. The test that the process has threads is done via the P_HADTHREADS bit in p_flag of struct proc.
  3. The per-thread emulator state data structure is now located in struct thread and freed in the thread_dtor() hook. Holding p_mtx is mandatory when referencing emuldata from other threads.
  4. PID mangling has changed. Now the Linux pid is the native tid and the Linux tgid is the native pid, with the exception of the first thread in the process, where tid and pid are one and the same.
  Ugliness: When the Linux thread is the initial thread in the thread group, the thread id is equal to the process id. Glibc depends on this magic (assert in pthread_getattr_np.c). So for system calls that take a thread id as a parameter we should use the special method to reference struct thread.
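The id mapping in item 4 and the "Ugliness" note can be summarized in a small sketch. The function names and the `is_initial` parameter are hypothetical and serve only to illustrate the rule as worded in the commit message.

```c
/*
 * Hypothetical helper: the id a Linux program sees as its "pid"
 * (what Linux calls the thread id). For the initial thread of a
 * process this is the native pid; for every other thread it is the
 * native tid.
 */
static int
sketch_linux_pid(int native_pid, int native_tid, int is_initial)
{
	return (is_initial ? native_pid : native_tid);
}

/*
 * Hypothetical helper: the Linux thread group id is always the
 * native pid of the process.
 */
static int
sketch_linux_tgid(int native_pid)
{
	return (native_pid);
}
```

With this mapping the initial thread reports the same value for its thread id and its thread group id, which is the property the commit message says glibc relies on.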
* MFC r283379: | dchagin | 2016-01-09 | 1 | -0/+72
  Implement a Linux version of sched_getparam() and sched_setparam(). Temporarily use the first thread in proc.
* MFC r283374: | dchagin | 2016-01-09 | 1 | -0/+30
  In preparation for switching linuxulator to use the native 1:1 threads, refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval(). Add a kern_sched_rr_get_interval() counterpart which takes a targettd parameter to allow specifying the target thread directly by callee (new Linuxulator). Linuxulator temporarily uses the first thread in proc. Move linux_sched_rr_get_interval() to the MI part.
* To facilitate an upcoming Linuxulator merging partially | dchagin | 2016-01-09 | 1 | -2/+2
  MFC r275121 (by kib).
  Only merge the syntax changes from r275121; PROC_*LOCK() macros still lock the same proc spinlock.
  The process spin lock currently has the following distinct uses:
  - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason for this is that turnstile locks are after thread locks, so you e.g. cannot unlock a blockable mutex (think process mutex) while owning a thread lock.
  - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK().
  - Profiling code (profil(2)), for a similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK().
  - Resource usage accounting. The need for the spinlock there is subtle; my understanding is that the spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK().
  Discussed with: kib
* Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use | jhb | 2013-09-09 | 1 | -3/+4
  an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux.
  To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE.
  Reviewed by: alc
  Approved by: re (kib)
* Replace kernel virtual address space allocation with vmem. This provides | jeff | 2013-08-07 | 1 | -1/+1
  transparent layering and better fragmentation.
  - Normalize functions that allocate memory to use kmem_*
  - Those that allocate address space are named kva_*
  - Those that operate on maps are named kmap_*
  - Implement recursive allocation handling for kmem_arena in vmem.
  Reviewed by: alc
  Tested by: pho
  Sponsored by: EMC / Isilon Storage Division
* The r241025 fixed the case when a binary, executed from nullfs mount, | kib | 2012-11-02 | 1 | -3/+5
  was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already have file descriptors opened for write, but it can be executed from the nullfs overlay.
  Handle the issue by passing one v_writecount reference to the lower vnode if the nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks.
  Introduce VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Calling the VOPs instead of directly accessing v_writecount provides the fix described in the previous paragraph.
  Tested by: pho
  MFC after: 3 weeks
* Remove the support for using non-mpsafe filesystem modules. | kib | 2012-10-22 | 1 | -8/+3
  In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems.
  The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes.
  Conducted and reviewed by: attilio
  Tested by: pho
* Fix the mis-handling of the VV_TEXT on the nullfs vnodes. | kib | 2012-09-28 | 1 | -1/+1
  If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write.
  Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode.
  Tested by: pho (previous version)
  MFC after: 2 weeks
* - >500 static DTrace probes for the linuxulator | netchild | 2012-05-05 | 1 | -0/+14
  - DTrace scripts to check for errors, performance, ... they serve mostly as examples of what you can do with the static probes; with moderate load the scripts may be overwhelmed, excessive lock-tracing may influence program behavior (see the last design decision)
  Design decisions:
  - use "linuxulator" as the provider for the native bitsize; add the bitsize for the non-native emulation (e.g. "linuxulator32" on amd64)
  - Add probes only for locks which are acquired in one function and released in another function. Locks which are acquired and released in the same function should be easy to pair in the code; inter-function locking is easier to verify in DTrace.
  - Probes for locks should be fired after locking and before releasing to prevent races (to provide data/function stability in DTrace, see the man-page of "dtrace -v ..." and the corresponding DTrace docs).
* Fix misuse of the kernel map in miscellaneous image activators. | kib | 2012-02-17 | 1 | -22/+12
  Vnode-backed mappings cannot be put into the kernel map, since it is a system map.
  Use exec_map for transient mappings, and remove the mappings with kmem_free_wakeup() to notify the waiters on available map space.
  Do not map the whole executable into KVA at all to copy it out into usermode. Directly use vn_rdwr() for the case of not page aligned binary.
  There is one place left where the potentially unbounded amount of data is mapped into exec_map, namely, in the COFF image activator enumeration of the needed shared libraries.
  Reviewed by: alc
  MFC after: 2 weeks
* Convert files to UTF-8 | uqs | 2012-01-15 | 1 | -1/+1
* In order to maximize the re-usability of kernel code in user space this | kmacy | 2011-09-16 | 1 | -13/+13
  patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
  Reviewed by: rwatson
  Approved by: re (bz)
* Add accounting for most of the memory-related resources. | trasz | 2011-04-05 | 1 | -1/+4
  Sponsored by: The FreeBSD Foundation
  Reviewed by: kib (earlier version)
* linux compat: improve and fix sendmsg/recvmsg compatibility | avg | 2011-03-26 | 1 | -0/+109
  - implement basic stubs for capget, capset, prctl PR_GET_KEEPCAPS and prctl PR_SET_KEEPCAPS.
  - add SCM_CREDS support to sendmsg and recvmsg
  - modify sendmsg to ignore control messages if not using UNIX domain sockets
  This should allow the linux pulse audio daemon and client to work on FreeBSD and interoperate with native counterparts, modulo the differences in pulseaudio versions.
  PR: kern/149168
  Submitted by: John Wehle <john@feith.com>
  Reviewed by: netchild
  MFC after: 2 weeks
* Put the macro declaration in the relevant include file for future use. | dchagin | 2011-02-15 | 1 | -3/+0
* Style(9) fixes. | dchagin | 2011-01-28 | 1 | -28/+28
  MFC after: 1 Month.
* Implement a variation of the linux_common_wait() which should | dchagin | 2011-01-28 | 1 | -63/+23
  be used by linuxolator itself.
  Move linux_wait4() to MD path as it requires native struct rusage translation to struct l_rusage on linux32/amd64.
  MFC after: 1 Month.
* Style(9) fix. | dchagin | 2011-01-28 | 1 | -1/+1
  MFC after: 1 month.
* By using the 32-bit Linux version of Sun's Java Development Kit 1.6 | netchild | 2010-11-22 | 1 | -21/+10
  on FreeBSD (amd64), invocations of "javac" (or "java") eventually end with the output of "Killed" and exit code 137.
  This is caused by:
  1. After calling exec() in a multithreaded linux program, threads are not destroyed and continue running. They get killed after the program being executed finishes.
  2. linux_exit_group doesn't return the correct exit code when called not from the group leader, which happens regularly using the sun jvm.
  The submitters fix this in a similar way to how NetBSD handles this.
  I took the PRs away from dchagin, who seems to have been out of touch with this for a while (no response from him).
  The patches committed here are from [2], with some little modifications from me to the style.
  PR: 141439 [1], 144194 [2]
  Submitted by: Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk
  Reviewed by: rdivacky (in April 2010)
  MFC after: 5 days
* Since all other comparisons involving ngroups_max use | brooks | 2010-01-15 | 1 | -1/+1
  "ngroups_max + 1", use ">= ngroups_max+1" instead of the equivalent "> ngroups_max" to reduce confusion.
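The equivalence this entry relies on holds for integers as long as `ngroups_max + 1` does not overflow. A small sketch with hypothetical function names makes the two spellings explicit:

```c
/* Hypothetical helper: limit check written as "> ngroups_max". */
static int
sketch_too_many_gt(int ngrp, int ngroups_max)
{
	return (ngrp > ngroups_max);
}

/*
 * Hypothetical helper: the same check written as
 * ">= ngroups_max + 1". The caller must ensure that ngroups_max is
 * below INT_MAX so the addition cannot overflow.
 */
static int
sketch_too_many_ge(int ngrp, int ngroups_max)
{
	return (ngrp >= ngroups_max + 1);
}
```

Both functions return the same result for every count, so the choice between them is purely about matching the form used by the surrounding comparisons.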
* Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic | brooks | 2010-01-12 | 1 | -1/+1
  kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to INT_MAX-1. Given that the Windows group limit is 1024, this range should be sufficient for most applications.
  MFC after: 1 month