path: root/sys/kern/kern_exec.c
Commit message | Author | Age | Files | Lines
* MFC r320619 MFS r320863: [kib, 2017-07-10, 1 file, -9/+8]
| Resolve confusion between different error code spaces.
| Approved by: re (delphij)
* MFC r308474, r308691, r309203, r309365, r309703, r309898, r310720, [markj, 2017-05-23, 1 file, -1/+1]
| r308489, r308706: Add PQ_LAUNDRY and remove PG_CACHED pages.
* MFC r307653: [mjg, 2016-12-31, 1 file, -4/+4]
| Mark a bunch of mpsafe sysctls as such.
| This gives me a sysctl Giant-free buildworld.
* MFC r304102 [alc, 2016-09-05, 1 file, -2/+3]
| Eliminate unneeded vm_page_xbusy() and vm_page_xunbusy() operations when neither vm_pager_has_page() nor vm_pager_get_pages() is called.
* MFC r304050 [alc, 2016-08-28, 1 file, -2/+0]
| Eliminate two calls to vm_page_xunbusy() that are both unnecessary and incorrect from the error cases in exec_map_first_page().
| They are unnecessary because we automatically unbusy the page in vm_page_free() when we remove it from the object.
| The calls are incorrect because they happen after the page is freed, so we might actually unbusy the page after it has been reallocated to a different object.
| (This error was introduced in r292373.)
* MFC 302900,302902,302921,303461,304009: [jhb, 2016-08-15, 1 file, -1/+3]
| Add a mask of optional ptrace() events.
| 302900: Add a test for user signal delivery.
| This test verifies we get the correct ptrace event details when a signal is posted to a traced process from userland.
| 302902: Add a mask of optional ptrace() events.
| ptrace() now stores a mask of optional events in p_ptevents. Currently this mask is a single integer, but it can be expanded into an array of integers in the future.
| Two new ptrace requests can be used to manipulate the event mask: PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK sets the current event mask.
| The current set of events includes:
| - PTRACE_EXEC: trace calls to execve().
| - PTRACE_SCE: trace system call entries.
| - PTRACE_SCX: trace system call exits.
| - PTRACE_FORK: trace forks and auto-attach to new child processes.
| - PTRACE_LWP: trace LWP events.
| The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS. The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for compatibility but now simply toggle corresponding flags in the event mask.
| While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both modify the event mask and continue the traced process.
| 302921: Rename PTRACE_SYSCALL to LINUX_PTRACE_SYSCALL.
| 303461: Note that not all optional ptrace events use SIGTRAP.
| New child processes attached due to PTRACE_FORK use SIGSTOP instead of SIGTRAP. All other ptrace events use SIGTRAP.
| 304009: Remove description of P_FOLLOWFORK as this flag was removed.
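The event-mask behaviour described in 302902 can be pictured with a small userland model. This is only a sketch, not the kernel code: the `sk_` names, the bit values and the choice to reject unknown bits are assumptions made for illustration; the authoritative definitions live in `<sys/ptrace.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit assignments (assumed, not copied from FreeBSD). */
#define SK_PTRACE_EXEC 0x01 /* trace calls to execve() */
#define SK_PTRACE_SCE  0x02 /* trace system call entries */
#define SK_PTRACE_SCX  0x04 /* trace system call exits */
#define SK_PTRACE_FORK 0x08 /* trace forks, auto-attach to children */
#define SK_PTRACE_LWP  0x10 /* trace LWP events */
#define SK_PTRACE_ALL  0x1f

/* Hypothetical stand-in for the part of struct proc that holds the mask. */
struct sk_proc {
	int p_ptevents;
};

/* Model of PT_GET_EVENT_MASK: fetch the current mask. */
int
sk_get_event_mask(const struct sk_proc *p)
{
	assert(p != NULL);
	return (p->p_ptevents);
}

/* Model of PT_SET_EVENT_MASK: replace the mask, refusing unknown bits. */
int
sk_set_event_mask(struct sk_proc *p, int mask)
{
	assert(p != NULL);
	if ((mask & ~SK_PTRACE_ALL) != 0)
		return (-1);
	p->p_ptevents = mask;
	return (0);
}

/* Model of the compatibility request PT_FOLLOW_FORK: toggle one bit. */
void
sk_follow_fork(struct sk_proc *p, bool enable)
{
	assert(p != NULL);
	if (enable)
		p->p_ptevents |= SK_PTRACE_FORK;
	else
		p->p_ptevents &= ~SK_PTRACE_FORK;
}
```

Keeping every optional event in one integer means the older requests reduce to single-bit toggles on the same field, which is the compatibility path the message describes.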
* MFC r302614: [kib, 2016-08-01, 1 file, -0/+2]
| Revive the check, disabled in r197963.
| MFC r302999:
| On first exec after vfork(), call signotify() to handle pending reenabled signals.
| Approved by: re (delphij)
* When filt_proc() removes event from the knlist due to the process [kib, 2016-06-27, 1 file, -1/+1]
| exiting (NOTE_EXIT->knlist_remove_inevent()), two things happen:
| - knote kn_knlist pointer is reset
| - INFLUX knote is removed from the process knlist.
| And, there are two consequences:
| - KN_LIST_UNLOCK() on such knote is nop
| - there is nothing which would block exit1() from processing past the knlist_destroy() (and knlist_destroy() resets knlist lock pointers).
| Both consequences result either in leaked process lock, or dereferencing NULL function pointers for locking.
| Handle this by stopping embedding the process knlist into struct proc. Instead, the knlist is allocated together with struct proc, but marked as autodestroy on the zombie reap, by knlist_detach() function. The knlist is freed when last kevent is removed from the list, in particular, at the zombie reap time if the list is empty. As a result, the knlist_remove_inevent() is no longer needed and removed.
| Other changes:
| In filt_procattach(), clear NOTE_EXEC and NOTE_FORK desired events from kn_sfflags for knote registered by kernel to only get NOTE_CHILD notifications. The flags leak resulted in excessive NOTE_EXEC/NOTE_FORK reports.
| Fix immediate note activation in filt_procattach(). Condition should be either the immediate CHILD_NOTE activation, or immediate NOTE_EXIT report for the exiting process.
| In knote_fork(), do not perform racy check for KN_INFLUX before kq lock is taken. Besides being racy, it did not account for notes just added by scan (KN_SCAN).
| Some minor and incomplete style fixes.
| Analyzed and tested by: Eric Badger <eric@badgerio.us>
| Reviewed by: jhb
| Sponsored by: The FreeBSD Foundation
| MFC after: 2 weeks
| Approved by: re (gjb)
| Differential revision: https://reviews.freebsd.org/D6859
* Old process credentials for setuid execve must not be dereferenced [kib, 2016-06-08, 1 file, -3/+7]
| when the process credentials were not changed. This can happen if an error occurred trying to activate the setuid binary. And on error, if new credentials were not yet assigned, they must be freed to not create the leak.
| Use oldcred == NULL as the predicate to detect credential reassignment.
| Reported and tested by: pho
| Sponsored by: The FreeBSD Foundation
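One reading of the predicate in this message, as a self-contained sketch. The types and the `sk_` helpers are hypothetical simplifications (a bare reference count instead of struct ucred), and the cleanup order is an assumption, not the committed code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted credential. */
struct sk_cred {
	int refs;
};

/* Hypothetical process holding one reference to its credential. */
struct sk_proc {
	struct sk_cred *cred;
};

/* Drop one reference. */
void
sk_crfree(struct sk_cred *c)
{
	assert(c != NULL && c->refs > 0);
	c->refs--;
}

/*
 * Install newcred into the process and hand back the previous
 * credential.  A non-NULL result is what later tells the cleanup
 * code that the swap really happened.
 */
struct sk_cred *
sk_install_cred(struct sk_proc *p, struct sk_cred *newcred)
{
	struct sk_cred *oldcred = p->cred;

	p->cred = newcred;
	return (oldcred);
}

/*
 * Cleanup at the end of exec.  oldcred == NULL means the credentials
 * were never reassigned, so the old ones must not be touched and a
 * pre-computed newcred still belongs to us and has to be released.
 */
void
sk_exec_cleanup(struct sk_cred *oldcred, struct sk_cred *newcred)
{
	if (oldcred != NULL)
		sk_crfree(oldcred);	/* swap happened: drop the old reference */
	else if (newcred != NULL)
		sk_crfree(newcred);	/* error path: newcred was never installed */
}
```

Using a single pointer as the "was it swapped" flag avoids a separate boolean that could drift out of sync with the actual assignment.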
* exec: get rid of one vnode lock/unlock pair in do_execve [mjg, 2016-05-27, 1 file, -42/+30]
| The lock was temporarily dropped for vrele calls, but they can be postponed to a point where the lock is not held in the first place.
| While here shuffle other code not needing the lock.
* exec: Provide execpath in imgp for the process_exec hook. [bdrewery, 2016-05-26, 1 file, -8/+16]
| This was previously set after the hook and only if auxargs were present. Now always provide it if possible.
| MFC after: 2 weeks
| Reviewed by: kib
| Sponsored by: EMC / Isilon Storage Division
| Differential Revision: https://reviews.freebsd.org/D6546
* exec: Add credential change information into imgp for process_exec hook. [bdrewery, 2016-05-26, 1 file, -86/+102]
| This allows an EVENTHANDLER(process_exec) hook to see if the new image will cause credentials to change whether due to setgid/setuid or because of POSIX saved-id semantics.
| This adds 3 new fields into image_params:
|   struct ucred *newcred     Non-null if the credentials will change.
|   bool credential_setid     True if the new image is setuid or setgid.
| This will pre-determine the new credentials before invoking the image activators, where the process_exec hook is called. The new credentials will be installed into the process in the same place as before, after image activators are done handling the image.
| MFC after: 2 weeks
| Reviewed by: kib
| Sponsored by: EMC / Isilon Storage Division
| Differential Revision: https://reviews.freebsd.org/D6544
* sys/kern: spelling fixes in comments. [pfg, 2016-04-29, 1 file, -1/+1]
| No functional change.
* Remove some NULL checks for M_WAITOK allocations. [trasz, 2016-03-29, 1 file, -4/+0]
| MFC after: 1 month
| Sponsored by: The FreeBSD Foundation
* Correct a comment. [bdrewery, 2016-03-01, 1 file, -1/+1]
* Fix style issues around existing SDT probes. [markj, 2015-12-16, 1 file, -6/+6]
| - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes.
| - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework.
| MFC after: 1 week
* A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). [glebius, 2015-12-16, 1 file, -9/+22]
| o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions.
|   Such arrayed requests for now should be preceded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete.
|   Strengthening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq().
| o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller.
|   This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault.
| Discussed with: kib, alc, jeff, scottl
| Sponsored by: Nginx, Inc.
| Sponsored by: Netflix
* Fix core corruption caused by race in note_procstat_vmmap [cem, 2015-10-06, 1 file, -0/+5]
| This fix is spiritually similar to r287442 and was discovered thanks to the KASSERT added in that revision.
| NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to the length of filenames corresponding to vnodes in the process' vm map via vn_fullpath. As vnodes may move during coredump, this is racy.
| We do not remove the race, only prevent it from causing coredump corruption.
| - Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption and truncation, even if names change, at the cost of up to PATH_MAX bytes per mapped object. The new sysctl is documented in core.5.
| - Fix note_procstat_vmmap to self-limit in the second pass. This addresses corruption, at the cost of sometimes producing a truncated result.
| - Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste) to grok the new zero padding.
| Reported by: pho (https://people.freebsd.org/~pho/stress/log/datamove4-2.txt)
| Relnotes: yes
| Sponsored by: EMC / Isilon Storage Division
| Differential Revision: https://reviews.freebsd.org/D3824
* save some bytes by using more concise SDT_PROBE<n> instead of SDT_PROBE [avg, 2015-09-28, 1 file, -3/+3]
| SDT_PROBE requires 5 parameters whereas SDT_PROBE<n> requires n parameters where n is typically smaller than 5.
| Perhaps SDT_PROBE should be made a private implementation detail.
| MFC after: 20 days
* Follow-up to r287442: Move sysctl to compiled-once file [cem, 2015-09-07, 1 file, -0/+5]
| Avoid duplicate sysctl nodes.
| Found by: tijl
| Approved by: markj (mentor)
| Sponsored by: EMC / Isilon Storage Division
| Differential Revision: https://reviews.freebsd.org/D3586
* Add sysent flag to switch to capabilities mode on startup. [ed, 2015-08-03, 1 file, -0/+4]
| CloudABI processes should run in capabilities mode automatically. There is no need to switch manually (e.g., by calling cap_enter()). Add a flag, SV_CAPSICUM, that can be used to call into cap_enter() during execve().
| Reviewed by: kib
* The si_status field of the siginfo_t, provided by the waitid(2) and [kib, 2015-07-18, 1 file, -1/+1]
| SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status.
| Requested by: Joerg Schilling
| Reviewed by: jilles (previous version), pho
| Tested by: pho
| Sponsored by: The FreeBSD Foundation
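The split between p_xexit and p_xsig can be illustrated with a small model. This is a sketch under assumptions: the `sk_` names are hypothetical, and the packing (low 8 bits of the exit status shifted left by 8, signal number in the low bits) follows the traditional wait-status layout rather than being copied from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two new struct proc fields. */
struct sk_proc {
	int p_xexit;	/* full status passed to _exit(2) */
	int p_xsig;	/* terminating signal, 0 on normal exit */
};

/* Traditional wait-status packing (assumed layout). */
int
sk_w_exitcode(int ret, int sig)
{
	return ((ret << 8) | sig);
}

/* Model of KW_EXITCODE(): rebuild the old combined p_xstat value. */
int
sk_kw_exitcode(int ret, int sig)
{
	return (sk_w_exitcode(ret & 0xff, sig));
}

/* Record a normal exit: the complete status is kept. */
void
sk_exit_normal(struct sk_proc *p, int status)
{
	assert(p != NULL);
	p->p_xexit = status;
	p->p_xsig = 0;
}

/* Record termination by a signal. */
void
sk_exit_signal(struct sk_proc *p, int sig)
{
	assert(p != NULL);
	p->p_xexit = 0;
	p->p_xsig = sig;
}

/* What waitid(2) would report in si_status after a normal exit. */
int
sk_si_status(const struct sk_proc *p)
{
	assert(p != NULL);
	return (p->p_xexit);
}
```

With separate fields, the legacy wait status still sees only the low 8 bits of the exit value, while si_status can carry all of them.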
* Implement CloudABI's exec() call. [ed, 2015-07-16, 1 file, -7/+81]
| Summary:
| In a runtime that is purely based on capability-based security, there is a strong emphasis on how programs start their execution. We need to make sure that we execute a new program with an exact set of file descriptors, ensuring that credentials are not leaked into the process accidentally.
| Providing the right file descriptors is just half the problem. There also needs to be a framework in place that gives meaning to these file descriptors. How does a CloudABI mail server know which of the file descriptors corresponds to the socket that receives incoming emails? Furthermore, how will this mail server acquire its configuration parameters, as it cannot open a configuration file from a global path on disk?
| CloudABI solves this problem by replacing traditional string command line arguments by tree-like data structure consisting of scalars, sequences and mappings (similar to YAML/JSON). In this structure, file descriptors are treated as a first-class citizen. When calling exec(), file descriptors are passed on to the new executable if and only if they are referenced from this tree structure. See the cloudabi-run(1) man page for more details and examples (sysutils/cloudabi-utils).
| Fortunately, the kernel does not need to care about this tree structure at all. The C library is responsible for serializing and deserializing, but also for extracting the list of referenced file descriptors. The system call only receives a copy of the serialized data and a layout of what the new file descriptor table should look like:
|   int proc_exec(int execfd, const void *data, size_t datalen, const int *fds, size_t fdslen);
| This change introduces a set of fd*_remapped() functions:
| - fdcopy_remapped() pulls a copy of a file descriptor table, remapping all of the file descriptors according to the provided mapping table.
| - fdinstall_remapped() replaces the file descriptor table of the process by the copy created by fdcopy_remapped().
| - fdescfree_remapped() frees the table in case we aborted before fdinstall_remapped().
| We then add a function exec_copyin_data_fds() that builds on top of these functions. It copies in the data and constructs a new remapped file descriptor table. This is used by cloudabi_sys_proc_exec().
| Test Plan:
| cloudabi-run(1) is capable of spawning processes successfully, providing it data and file descriptors. procstat -f seems to confirm all is good. Regular FreeBSD processes also work properly.
| Reviewers: kib, mjg
| Reviewed By: mjg
| Subscribers: imp
| Differential Revision: https://reviews.freebsd.org/D3079
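The remapping step can be sketched in a few lines of portable C. This is an illustrative model, not the kernel's fdcopy_remapped(): descriptor tables are plain arrays of integer handles, `SK_NOFILE` and `sk_fdcopy_remapped()` are hypothetical names, and returning EBADF for an unusable source descriptor is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SK_NOFILE (-1)	/* marks an empty slot in the old table */

/*
 * Build a new descriptor table in which slot i holds whatever the old
 * table held at index fds[i].  Descriptors that are not named in fds
 * are simply not carried over.  Returns 0 on success or EBADF when a
 * mapping entry names a descriptor that is out of range or closed.
 * newtab must have room for fdslen entries.
 */
int
sk_fdcopy_remapped(const int *oldtab, size_t oldlen,
    const int *fds, size_t fdslen, int *newtab)
{
	size_t i;

	assert(oldtab != NULL && fds != NULL && newtab != NULL);
	for (i = 0; i < fdslen; i++) {
		int fd = fds[i];

		if (fd < 0 || (size_t)fd >= oldlen ||
		    oldtab[fd] == SK_NOFILE)
			return (EBADF);
		newtab[i] = oldtab[fd];
	}
	return (0);
}
```

Because the new table is built separately and only swapped in afterwards, a failure part way through leaves the caller's original table untouched, which matches the copy, install, free split in the message.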
* exec: textvp -> oldtextvp; binvp -> newtextvp [mjg, 2015-07-14, 1 file, -15/+15]
| This makes it consistent with the rest of the naming in do_execve.
| No functional changes.
* exec plug a redundant vref + vrele of the image vnode [mjg, 2015-07-14, 1 file, -8/+6]
* Do not calculate the stack's bottom address twice. [kib, 2015-06-30, 1 file, -1/+1]
| Submitted by: Olivér Pintér
| Review: https://reviews.freebsd.org/D2953
| MFC after: 1 week
* Make KPI of vm_pager_get_pages() more strict: if a pager changes a page [glebius, 2015-06-12, 1 file, -7/+4]
| in the requested array, then it is responsible for disposition of previous page and is responsible for updating the entry in the requested array. Now consumers of KPI do not need to re-lookup the pages after call to vm_pager_get_pages().
| Reviewed by: kib
| Sponsored by: Netflix
| Sponsored by: Nginx, Inc.
* Implement lockless resource limits. [mjg, 2015-06-10, 1 file, -1/+1]
| Use the same scheme implemented to manage credentials.
| Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions.
| Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
* On exec, single-threading must be enforced before arguments space is [kib, 2015-05-10, 1 file, -47/+58]
| allocated from exec_map. If many threads try to perform execve(2) in parallel, the exec map is exhausted and some threads sleep uninterruptible waiting for the map space. Then, the thread which won the race for the space allocation, cannot single-thread the process, causing deadlock.
| Reported and tested by: pho (previous version)
| Sponsored by: The FreeBSD Foundation
| MFC after: 2 weeks
* Handle incorrect ELF images specifying size for PT_GNU_STACK not being [kib, 2015-04-23, 1 file, -1/+1]
| multiple of page size.
| Sponsored by: The FreeBSD Foundation
| MFC after: 3 days
* Implement support for a binary requesting a specific stack size for the [kib, 2015-04-15, 1 file, -2/+15]
| initial thread. It is read by the ELF image activator as the virtual size of the PT_GNU_STACK program header entry, and can be specified by the linker option -z stack-size in newer binutils.
| The soft RLIMIT_STACK is auto-increased if possible, to satisfy the binary's request.
| Sponsored by: The FreeBSD Foundation
| MFC after: 1 week
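How a requested stack size might interact with RLIMIT_STACK can be modelled as below. This is a sketch only: the `sk_` names are hypothetical, and both the rounding down to a page boundary and the clamping to the hard limit are assumptions about what "auto-increased if possible" means, not a transcription of the kernel logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the soft and hard RLIMIT_STACK values. */
struct sk_rlimit {
	uint64_t cur;	/* soft limit */
	uint64_t max;	/* hard limit */
};

/* Round a size down to a page boundary; pagesz must be a power of two. */
uint64_t
sk_trunc_page(uint64_t sz, uint64_t pagesz)
{
	assert(pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
	return (sz & ~(pagesz - 1));
}

/*
 * Apply the stack size requested by the binary.  Returns the size the
 * initial thread's stack would get, or 0 when the binary made no
 * request and the default applies.  The soft limit is raised when the
 * request exceeds it, but never beyond the hard limit.
 */
uint64_t
sk_apply_stack_request(struct sk_rlimit *lim, uint64_t req, uint64_t pagesz)
{
	uint64_t ssiz;

	if (req == 0)
		return (0);
	ssiz = sk_trunc_page(req, pagesz);
	if (ssiz > lim->max)
		ssiz = lim->max;
	if (ssiz > lim->cur)
		lim->cur = ssiz;
	return (ssiz);
}
```

Aligning the request first also covers the case of an image whose PT_GNU_STACK size is not a multiple of the page size.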
* Introduce vm_object_color() and use it in mmap(2) to set the color of [alc, 2015-03-21, 1 file, -4/+1]
| named objects to zero before the virtual address is selected. Previously, the color setting was delayed until after the virtual address was selected. In rtld, this delay effectively prevented the mapping of a shared library's code section using superpages. Now, for example, we see the first 1 MB of libc's code on armv6 mapped by a superpage after we've gotten through the initial cold misses that bring the first 1 MB of code into memory. (With the page clustering that we perform on read faults, this happens quickly.)
| Differential Revision: https://reviews.freebsd.org/D2013
| Reviewed by: jhb, kib
| Tested by: Svatopluk Kraus (armv6)
| MFC after: 6 weeks
* cred: add proc_set_cred helper [mjg, 2015-03-16, 1 file, -2/+2]
| The goal here is to provide one place altering process credentials.
| This eases debugging and opens up possibilities to do additional work when such an action is performed.
* Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger [kib, 2015-01-18, 1 file, -0/+2]
| attachment to the process. Note that the command is not intended to be a security measure, rather it is an obfuscation feature, implemented for parity with other operating systems.
| Discussed with: jilles, rwatson
| Man page fixes by: rwatson
| Sponsored by: The FreeBSD Foundation
| MFC after: 1 week
* Add facility to stop all userspace processes. The supposed use of the [kib, 2014-12-13, 1 file, -3/+3]
| feature is to quiesce the system before suspend.
| Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) caller to operate on other process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume.
| Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added.
| In collaboration with: pho
| Discussed with: avg
| Sponsored by: The FreeBSD Foundation
| MFC after: 2 weeks
* filedesc: fix missed comments about fdsetugidsafety [mjg, 2014-10-31, 1 file, -4/+2]
| While here just note that both fdsetugidsafety and fdcheckstd take sleepable locks.
* Replace some calls to fuword() by fueword() with proper error checking. [kib, 2014-10-28, 1 file, -9/+17]
| Sponsored by: The FreeBSD Foundation
| Tested by: pho
| MFC after: 3 weeks
* filedesc: cleanup setugidsafety a little [mjg, 2014-10-22, 1 file, -1/+1]
| Rename it to fdsetugidsafety for consistency with other functions.
| There is no need to take filedesc lock if not closing any files.
| The loop has to verify each file and we are guaranteed fdtable has space for at least 20 fds. As such there is no need to check fd_lastfile.
| While here tidy up is_unsafe.
* Plug unnecessary binvp NULL initialization and test. [mjg, 2014-10-20, 1 file, -3/+3]
| Reported by: Coverity
| CID: 1018889
* Use bzero instead of explicitly zeroing stuff in do_execve. [mjg, 2014-09-29, 1 file, -22/+1]
| While strictly speaking this is not correct since some fields are pointers, it makes no difference on all supported archs and we already rely on it doing the right thing in other places.
| No functional changes.
* If vm_page_grab() allocates a new page, the page is not inserted into [kib, 2014-08-13, 1 file, -0/+1]
| page queue even when the allocation is not wired. It is responsibility of the vm_page_grab() caller to ensure that the page does not end on the vm_object queue but not on the pagedaemon queue, which would effectively create unpageable unwired page.
| In exec_map_first_page() and vm_imgact_hold_page(), activate the page immediately after unbusying it, to avoid leak.
| In the uiomove_object_page(), deactivate page before the object is unlocked. There is no leak, since the page is deactivated after uiomove_fromphys() finished. But allowing non-queued non-wired page in the unlocked object queue makes it impossible to assert that leak does not happen in other places.
| Reviewed by: alc
| Sponsored by: The FreeBSD Foundation
| MFC after: 1 week
* Plug p_pptr null test in do_execve. It is always true. [mjg, 2014-07-14, 1 file, -1/+1]
* Don't call crdup nor uifind under vnode lock. [mjg, 2014-07-07, 1 file, -2/+4]
| A locked vnode can get into the way of satisfying malloc with M_WAITOK.
| This is a fixup to r268087.
| Suggested by: kib
| MFC after: 1 week
* Remove ia64. [marcel, 2014-07-07, 1 file, -9/+0]
| This includes:
| o All directories named *ia64*
| o All files named *ia64*
| o All ia64-specific code guarded by __ia64__
| o All ia64-specific makefile logic
| o Mention of ia64 in comments and documentation
| This excludes:
| o Everything under contrib/
| o Everything under crypto/
| o sys/xen/interface
| o sys/sys/elf_common.h
| Discussed at: BSDcan
* Plug gcc warning after r268074 about uninitialized newsigacts [mjg, 2014-07-02, 1 file, -1/+3]
| Reported by: Gary Jennejohn <gljennjohn gmail.com>
* Don't call crcopysafe or uifind unnecessarily in execve. [mjg, 2014-07-01, 1 file, -10/+10]
| MFC after: 1 week
* Perform a lockless check in sigacts_shared. [mjg, 2014-07-01, 1 file, -5/+4]
| It is used only during execve (i.e. singlethreaded), so there is no fear of returning 'not shared' which soon becomes 'shared'.
| While here reorganize the code a little to avoid proc lock/unlock in shared case.
| MFC after: 1 week
* Call fdcloseexec right after fdunshare. [mjg, 2014-06-28, 1 file, -2/+2]
| No functional changes.
| MFC after: 1 week
* Make fdunshare accept only td parameter. [mjg, 2014-06-28, 1 file, -1/+1]
| Proc had to match the thread anyway and 2 parameters were inconsistent with the rest.
| MFC after: 1 week
* Pull in r267961 and r267973 again. Fix for issues reported will follow. [hselasky, 2014-06-28, 1 file, -2/+1]