summaryrefslogtreecommitdiffstats
path: root/sys/compat/linux
Commit message (Collapse)AuthorAgeFilesLines
* Remove check for NULL prior to free(9) and m_freem(9).eadler2013-03-042-6/+3
| | | | Approved by: cperciva (mentor)
* Merge Capsicum overhaul:pjd2013-03-021-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
* Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.hjhb2013-01-2912-2/+174
| | | | | | | by moving bits that are MI out into headers in compat/linux. Reviewed by: Chagin Dmitry dmitry | gmail MFC after: 2 weeks
* Arithmetic on pointers takes into account the size of the type. Properly ↵dchagin2013-01-251-2/+2
| | | | | | cast the pointer to avoid incorrect pointer scaling. MFC after: 1 Week
* Don't assume that all Linux TCP-level socket options are identical tojhb2013-01-231-4/+24
| | | | | | | | FreeBSD TCP-level socket options (only the first two are). Instead, using a mapping function and fail unsupported options as we do for other socket option levels. MFC after: 2 weeks
* Mechanically substitute flags from historic mbuf allocator withglebius2012-12-051-1/+1
| | | | | | | | | malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
* MFS security patches which seem to have accidentally not reached HEAD:cperciva2012-11-231-2/+3
| | | | | | | | | | | Fix insufficient message length validation for EAP-TLS messages. Fix Linux compatibility layer input validation error. Security: FreeBSD-SA-12:07.hostapd Security: FreeBSD-SA-12:08.linux Security: CVE-2012-4445, CVE-2012-4576 With hat: so@
* The r241025 fixed the case when a binary, executed from nullfs mount,kib2012-11-021-3/+5
| | | | | | | | | | | | | | | | | | | | | | | was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks
* Remove the support for using non-mpsafe filesystem modules.kib2012-10-222-12/+4
| | | | | | | | | | | | In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
* Fix the mis-handling of the VV_TEXT on the nullfs vnodes.kib2012-09-281-1/+1
| | | | | | | | | | | | | | | | If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
* Remove redundant checkkevlo2012-09-121-5/+0
|
* Extend the KPI to lock and unlock f_offset member of struct file. Itkib2012-07-021-2/+3
| | | | | | | | | | | | | | | | | | now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries(). Ensure that on 32bit architectures f_offset, which is 64bit quantity, always read and written under the mtxpool protection. This fixes apparently easy to trigger race when parallel lseek()s or lseek() and read/write could destroy file offset. The already broken ABI emulations, including iBCS and SysV, are not converted (yet). Tested by: pho No objections from: jhb MFC after: 3 weeks
* - >500 static DTrace probes for the linuxulatornetchild2012-05-0516-207/+2053
| | | | | | | | | | | | | | | | | | - DTrace scripts to check for errors, performance, ... they serve mostly as examples of what you can do with the static probe;s with moderate load the scripts may be overwhelmed, excessive lock-tracing may influence program behavior (see the last design decission) Design decissions: - use "linuxulator" as the provider for the native bitsize; add the bitsize for the non-native emulation (e.g. "linuxuator32" on amd64) - Add probes only for locks which are acquired in one function and released in another function. Locks which are aquired and released in the same function should be easy to pair in the code, inter-function locking is more easy to verify in DTrace. - Probes for locks should be fired after locking and before releasing to prevent races (to provide data/function stability in DTrace, see the man-page of "dtrace -v ..." and the corresponding DTrace docs).
* - Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27jkim2012-04-161-0/+49
| | | | | | | | | | but GNU libc used it without checking its kernel version, e. g., Fedora 10. - Move pipe(2) implementation for Linuxulator from MD files to MI file, sys/compat/linux/linux_file.c. There is no MD code for this syscall at all. - Correct an argument type for pipe() from l_ulong * to l_int *. Probably this was the source of MI/MD confusion. Reviewed by: emulation
* Fix misuse of the kernel map in miscellaneous image activators.kib2012-02-171-22/+12
| | | | | | | | | | | | | | | | | | | Vnode-backed mappings cannot be put into the kernel map, since it is a system map. Use exec_map for transient mappings, and remove the mappings with kmem_free_wakeup() to notify the waiters on available map space. Do not map the whole executable into KVA at all to copy it out into usermode. Directly use vn_rdwr() for the case of not page aligned binary. There is one place left where the potentially unbounded amount of data is mapped into exec_map, namely, in the COFF image activator enumeration of the needed shared libraries. Reviewed by: alc MFC after: 2 weeks
* Remove direct access to si_name.ed2012-02-103-5/+5
| | | | | | | | Code should just use the devtoname() function to obtain the name of a character device. Also add const keywords to pieces of code that need it to build properly. MFC after: 2 weeks
* Convert files to UTF-8uqs2012-01-157-7/+7
|
* In sys/compat/linux/linux_ioctl.c, work around a warning when a pointerdim2012-01-031-1/+1
| | | | | | | | is compared to an integer, by casting the pointer to l_uintptr_t. No functional difference on both i386 and amd64. Reviewed by: ed, jhb MFC after: 1 week
* Implement linux_fadvise64() and linux_fadvise64_64() usingjhb2011-12-291-0/+45
| | | | | | | kern_posix_fadvise(). Reviewed by: silence on emulation@ MFC after: 2 weeks
* Make the Linux *at() calls a bit more complete.ed2011-11-192-14/+16
| | | | | | | Properly support: - AT_EACCESS for faccessat(), - AT_SYMLINK_FOLLOW for linkat().
* Improve *access*() parameter name consistency.ed2011-11-191-6/+6
| | | | | | | | | The current code mixes the use of `flags' and `mode'. This is a bit confusing, since the faccessat() function as a `flag' parameter to store the AT_ flag. Make this less confusing by using the same name as used in the POSIX specification -- `amode'.
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.ed2011-11-071-1/+1
| | | | | | The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
* Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.ed2011-11-071-2/+2
| | | | This means that their use is restricted to a single C file.
* Add curly braces missed in r226247.brueffer2011-10-111-1/+2
| | | | | | Pointy hat to: brueffer Submitted by: many MFC after: 1 week
* Properly free linux_gidset in case of an error.brueffer2011-10-111-0/+1
| | | | | | CID: 4136 Found with: Coverity Prevent(tm) MFC after: 1 week
* Use the caculated length instead of maximum length.jkim2011-10-061-2/+2
|
* Remove a now-defunct variable.jkim2011-10-061-16/+15
|
* Use uint32_t instead of u_int32_t. Fix style(9) nits.jkim2011-10-061-10/+9
|
* Make sure to ignore the leading NULL byte from Linux abstract namespace.jkim2011-10-061-2/+10
|
* Restore the original socket address length if it was not really AF_INET6.jkim2011-10-061-16/+19
|
* Retern more appropriate errno when Linux path name is too long.jkim2011-10-061-1/+1
|
* Inline do_sa_get() function and remove an unused return value.jkim2011-10-061-23/+9
|
* Unroll inlined strnlen(9) and make it easier to read. No functional change.jkim2011-10-061-10/+6
|
* Fix a bug in UNIX socket handling in the linux emulator which wascperciva2011-10-041-0/+15
| | | | | | | | | exposed by the security fix in FreeBSD-SA-11:05.unix. Approved by: so (cperciva) Approved by: re (kib) Security: Related to FreeBSD-SA-11:05.unix, but not actually a security fix.
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-168-177/+177
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* Add experimental support for process descriptorsjonathan2011-08-181-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
* Second-to-last commit implementing Capsicum capabilities in the FreeBSDrwatson2011-08-114-26/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
* Remove the 'either' from the comment as it'll be less obvious that webz2011-07-171-4/+4
| | | | | | removed semmap in a bit of time from now. Re-wrap. Suggested by: jhb
* Remove semaphore map entry count "semmap" field and its tuningbz2011-07-141-1/+9
| | | | | | | | | | | option that is highly recommended to be adjusted in too much documentation while doing nothing in FreeBSD since r2729 (rev 1.1). ipcs(1) needs to be recompiled as it is accessing _KERNEL private variables. Reviewed by: jhb (before comment change on linux code) Sponsored by: Sandvine Incorporated
* Commit the missing linux_videdev2_compat.h (lost somewhere betweennetchild2011-05-041-0/+137
| | | | | | commit tree patch generation -> successful compile tree build test -> commmit). Pointy hat to: netchild
* Add FEATURE macros for v4l and v4l2 to the linuxulator.netchild2011-05-041-0/+4
| | | | Suggested by: ae
* This is v4l2 support for the linuxulator. This allows to access FreeBSDnetchild2011-05-042-0/+403
| | | | | | | | | native devices which support the v4l2 API from processes running within the linuxulator, e.g. skype or flash can access the multimedia/pwcbsd or multimedia/webcamd supplied drivers. Submitted by: nox MFC after: 1 month
* Fix typo in comment, improve comment.netchild2011-05-041-2/+2
|
* Add explanation about the use-permission and FreeBSDify it.netchild2011-05-041-0/+15
|
* Copy the v4l2 header unchanged from the vendor branch.netchild2011-05-041-0/+1164
|
* Add accounting for most of the memory-related resources.trasz2011-04-051-1/+4
| | | | | Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
* Revert r220032:linux compat: add SO_PASSCRED option with basic handlingavg2011-03-311-14/+0
| | | | | | | | | | I have not properly thought through the commit. After r220031 (linux compat: improve and fix sendmsg/recvmsg compatibility) the basic handling for SO_PASSCRED is not sufficient as it breaks recvmsg functionality for SCM_CREDS messages because now we would need to handle sockcred data in addition to cmsgcred. And that is not implemented yet. Pointyhat to: avg
* linux compat: add SO_PASSCRED option with basic handlingavg2011-03-261-0/+14
| | | | | | | | This seems to have been a part of a bigger patch by dchagin that either haven't been committed or committed partially. Submitted by: dchagin, nox MFC after: 2 weeks
* linux compat: improve and fix sendmsg/recvmsg compatibilityavg2011-03-264-53/+252
| | | | | | | | | | | | | | | | | - implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS and prctl PR_SET_KEEPCAPS. - add SCM_CREDS support to sendmsg and recvmsg - modify sendmsg to ignore control messages if not using UNIX domain sockets This should allow linux pulse audio daemon and client work on FreeBSD and interoperate with native counter-parts modulo the differences in pulseaudio versions. PR: kern/149168 Submitted by: John Wehle <john@feith.com> Reviewed by: netchild MFC after: 2 weeks
* Staticize functions which are not used somewhere else, move thenetchild2011-03-152-6/+7
| | | | corresponding prototypes from the header to the code file.
OpenPOWER on IntegriCloud