| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag.
In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap().
Linux/i386 set this flag automatically if the binary requires executable stack.
READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.
|
|
|
|
| |
cred: add proc_set_cred helper
|
|
|
|
|
|
|
| |
Fix kernel stack disclosures in the Linux and 4.3BSD compat layers.
Security: SA-16:20
Security: SA-16:21
|
|
|
|
| |
sys/compat/linux*: minor spelling fixes.
|
|
|
|
| |
Move Linux specific times tests up to guarantee the values are defined.
|
|
|
|
|
|
|
|
|
| |
1. Limit secs to INT32_MAX / 2 to avoid errors from kern_setitimer().
Assert that kern_setitimer() returns 0.
Remove bogus cast of secs.
Fix style(9) issues.
2. Increment the return value if the remaining tv_usec value more than 500000 as a Linux does.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rename remaining linux32 symbols from linux_* to linux32_*.
289769:
Rename remaining linux32 symbols such as linux_sysent[] and
linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
linux64.ko. While here, add support for linux64 binaries to systrace.
- Update NOPROTO entries in amd64/linux/syscalls.master to match the
main table to fix systrace build.
- Add a special case for union l_semun arguments to the systrace
generation.
- The systrace_linux32 module now only builds the systrace_linux32.ko.
module on amd64.
- Add a new systrace_linux module that builds on both i386 and amd64.
For i386 it builds the existing systrace_linux.ko. For amd64 it
builds a systrace_linux.ko for 64-bit binaries.
289822:
Fix build for the KTR-enabled kernels.
290143:
Fix build with DEBUG defined.
290144:
Update for LINUX32 rename. The assembler didn't complain about undefined
symbols but just used 0 after the rename.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
o Fix Linux compatibility layer incorrect futex handling. [SA-16:03.linux]
o Fix Linux compatibility layer setgroups(2) system call. [SA-16:04.linux]
o Fix TCP MD5 signature denial of service. [SA-16:05.tcp]
o Fix insecure default bsnmpd.conf permissions. [SA-16:06.bsnmpd]
Security: FreeBSD-SA-16:01.sctp, CVE-2016-1879
Security: FreeBSD-SA-16:03.linux, CVE-2016-1880
Security: FreeBSD-SA-16:04.linux, CVE-2016-1881
Security: FreeBSD-SA-16:05.tcp, CVE-2016-1882
Security: FreeBSD-SA-16:06.bsnmpd, CVE-2015-5677
|
|
|
|
|
| |
Properly check tv_nsec value. The tv_nsec field can also be one
of the special value UTIME_NOW or UTIME_OMIT.
|
|
|
|
| |
Add utimensat() system call.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rework signal code to allow using it by other modules, like linprocfs:
1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.
2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.
3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.
4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.
PR: 197216
|
|
|
|
|
| |
Do not use struct l_timespec without conversion. While here move
args->timeout handling before acquiring the futex key at FUTEX_WAIT path.
|
|
|
|
| |
Implement ppoll() system call.
|
|
|
|
|
| |
Put the correct value for the abi_nfdbits parameter of kern_select() for
all supported Linuxulators.
|
|
|
|
|
|
|
| |
Convert Linux wait options to the FreeBSD.
Check wait options as a Linux do.
Linux always set WEXITED option not a WUNTRACED|WNOHANG
which is a strange bug.
|
|
|
|
| |
Set WIFCONTINUED to the wait status if needed.
|
|
|
|
|
| |
Where possible we will use M_LINUX malloc(9) type.
Move M_FUTEX defines to the linux_common.ko.
|
|
|
|
|
|
|
|
|
| |
Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.
As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce a new module linux_common.ko which is intended for the
following primary purposes:
1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.
2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).
3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.
Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.
Temporarily remove dtrace garbage from linux_mib.c and linux_util.c
|
|
|
|
| |
Fix compilation with -DDEBUG option.
|
|
|
|
| |
Disable i386 call for x86-64 Linux.
|
|
|
|
|
| |
Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.
|
|
|
|
| |
Implement pselect6() system call.
|
|
|
|
| |
Implement prlimit64() system call.
|
|
|
|
|
| |
Sched_rr_get_interval returns EINVAL in case when the invalid pid
specified. This silence the ltp tests.
|
|
|
|
| |
Implement waitid() system call.
|
|
|
|
|
|
| |
To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().
|
|
|
|
| |
Add a function for converting wait options.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switch linuxulator to use the native 1:1 threads.
The reasons:
1. Get rid of the stubs/quirks with process dethreading,
process reparent when the process group leader exits and close
to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
managment routines in Linuxulator.
Implementation details:
1. The thread is created via kern_thr_new() in the clone() call with
the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
struct thread and freed in the thread_dtor() hook.
Mandatory holdig of the p_mtx required when referencing emuldata
from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
and Linux tgid is the native pid, with the exception of the first
thread in the process where tid and pid are one and the same.
Ugliness:
In case when the Linux thread is the initial thread in the thread
group thread id is equal to the process id. Glibc depends on this
magic (assert in pthread_getattr_np.c). So for system calls that
take thread id as a parameter we should use the special method
to reference struct thread.
|
|
|
|
|
| |
Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for switching linuxulator to the use the native 1:1
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow specify target thread directly by callee (new Linuxulator).
Linuxulator temporarily uses first thread in proc.
Move linux_sched_rr_get_interval() to the MI part.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MFC r275121 (by kib). Only merge the syntax changes from r275121,
PROC_*LOCK() macros still lock the same proc spinlock.
The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
the process, and interlocking with process mutex and thread lock.
The main reason of this is that turnstile locks are after thread
locks, so you e.g. cannot unlock blockable mutex (think process
mutex) while owning thread lock.
- Virtual and profiling itimers, since the timers activation is done
from the clock interrupt context. Replace the p_slock by p_itimmtx
and PROC_ITIMLOCK().
- Profiling code (profil(2)), for similar reason. Replace the p_slock
by p_profmtx and PROC_PROFLOCK().
- Resource usage accounting. Need for the spinlock there is subtle,
my understanding is that spinlock blocks context switching for the
current thread, which prevents td_runtime and similar fields from
changing (updates are done at the mi_switch()). Replace the p_slock
by p_statmtx and PROC_STATLOCK().
Discussed with: kib
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
an address in the first 2GB of the process's address space. This flag should
have the same semantics as the same flag on Linux.
To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.
Reviewed by: alc
Approved by: re (kib)
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
was still possible to open for write from the lower filesystem. There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.
Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount. Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode. Always use the lower vnode v_writecount
for the checks.
Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode. Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.
Tested by: pho
MFC after: 3 weeks
|
|
|
|
|
|
|
|
|
|
|
|
| |
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.
Conducted and reviewed by: attilio
Tested by: pho
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.
Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.
Tested by: pho (previous version)
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- DTrace scripts to check for errors, performance, ...
they serve mostly as examples of what you can do with the static probe;s
with moderate load the scripts may be overwhelmed, excessive lock-tracing
may influence program behavior (see the last design decission)
Design decissions:
- use "linuxulator" as the provider for the native bitsize; add the
bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
- Add probes only for locks which are acquired in one function and released
in another function. Locks which are aquired and released in the same
function should be easy to pair in the code, inter-function
locking is more easy to verify in DTrace.
- Probes for locks should be fired after locking and before releasing to
prevent races (to provide data/function stability in DTrace, see the
man-page of "dtrace -v ..." and the corresponding DTrace docs).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.
Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.
Do not map the whole executable into KVA at all to copy it out into
usermode. Directly use vn_rdwr() for the case of not page aligned
binary.
There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.
Reviewed by: alc
MFC after: 2 weeks
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.
Reviewed by: rwatson
Approved by: re (bz)
|
|
|
|
|
| |
Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS
and prctl PR_SET_KEEPCAPS.
- add SCM_CREDS support to sendmsg and recvmsg
- modify sendmsg to ignore control messages if not using UNIX
domain sockets
This should allow linux pulse audio daemon and client work on FreeBSD
and interoperate with native counter-parts modulo the differences in
pulseaudio versions.
PR: kern/149168
Submitted by: John Wehle <john@feith.com>
Reviewed by: netchild
MFC after: 2 weeks
|
| |
|
|
|
|
| |
MFC after: 1 Month.
|
|
|
|
|
|
|
|
|
| |
be used by linuxolator itself.
Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.
MFC after: 1 Month.
|
|
|
|
| |
MFC after: 1 month.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on FreeBSD (amd64), invocations of "javac" (or "java") eventually
end with the output of "Killed" and exit code 137.
This is caused by:
1. After calling exec() in multithreaded linux program threads are not
destroyed and continue running. They get killed after program being
executed finishes.
2. linux_exit_group doesn't return correct exit code when called not
from group leader. Which happens regularly using sun jvm.
The submitters fix this in a similar way to how NetBSD handles this.
I took the PRs away from dchagin, who seems to be out of touch of
this since a while (no response from him).
The patches committed here are from [2], with some little modifications
from me to the style.
PR: 141439 [1], 144194 [2]
Submitted by: Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk
Reviewed by: rdivacky (in april 2010)
MFC after: 5 days
|
|
|
|
|
| |
"ngroups_max + 1", use ">= ngroups_max+1" instead of the equivalent
"> ngroups_max" to reduce confusion.
|
|
|
|
|
|
|
|
| |
kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1. Given that the Windows group limit is 1024, this range
should be sufficient for most applications.
MFC after: 1 month
|