summaryrefslogtreecommitdiffstats
path: root/sys/kern/subr_trap.c
Commit message (Collapse)AuthorAgeFilesLines
* MFC r302614:kib2016-08-011-10/+45
| | | | | | | | Revive the check, disabled in r197963. MFC r302999: On first exec after vfork(), call signotify() to handle pending reenabled signals.
* MFC r282213:trasz2015-06-211-5/+8
| | | | | | | | | | | | | | | | | | Add kern.racct.enable tunable and RACCT_DISABLED config option. The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8). MFC r282901: Build GENERIC with RACCT/RCTL support by default. Note that it still needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf. Note those two are MFC-ed together, because the latter one changes the name of RACCT_DISABLED option to RACCT_DEFAULT_TO_DISABLED. Should have committed the renaming separately... Relnotes: yes Sponsored by: The FreeBSD Foundation
* MFC r283600:kib2015-06-101-0/+7
| | | | | | | | Perform SU cleanup in the AST handler. Do not sleep waiting for SU cleanup while owning vnode lock. On MFC, for KBI stability, td_su member was moved to the end of the struct thread.
* Merge r263233 from HEAD to stable/10:rwatson2015-03-191-1/+1
| | | | | | | | | Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. Sponsored by: Google, Inc.
* MFC r277055:kib2015-01-191-2/+0
| | | | Revert r263475: TDP_DEVMEMIO no longer needed.
* MFC r263475:kib2014-03-281-0/+2
| | | | | | | | | | | | | | | | Fix two issues with /dev/mem access on amd64, both causing kernel page faults. First, for accesses to direct map region should check for the limit by which direct map is instantiated. Second, for accesses to the kernel map, use a new thread private flag TDP_DEVMEMIO, which instructs vm_fault() to return error when fault happens on the MAP_ENTRY_NOFAULT entry, instead of panicing. MFC r263498: Add change forgotten in r263475. Make dmaplimit accessible outside amd64/pmap.c.
* Partially revert r195702. Deferring stops is now implemented via a set ofjhb2013-03-181-1/+1
| | | | | | | | calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls. - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly. - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags.
* When throttling a process to enforce RACCT limits, do not use neithertrasz2013-03-141-9/+2
| | | | | | | | PBDRY (which simply doesn't make any sense) nor PCATCH (which could be used by a malicious process to work around the PCPU limit). Submitted by: Rudo Tomori Reviewed by: kib
* Replace the TDP_NOSLEEPING flag with a counter so that thejhb2013-03-011-1/+1
| | | | | | THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest. Reviewed by: attilio
* Further refine the handling of stop signals in the NFS client. Thejhb2013-02-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_*() and VOP_*() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients are set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month
* Fixup r240246: hwpmc needs to retain the pinning until ASTs are notattilio2012-10-301-1/+6
| | | | | | | | | | | | | executed. This means past the point where userret() is generally executed. Skip the td_pinned check if a callchain tracing is currently happening and add a more robust check to pmc_capture_user_callchain() in order to catch td_pinned leak past ast() in hwpmc case. Reported and tested by: fabient MFC after: 1 week X-MFC: r240246
* Add CPU percentage limit enforcement to RCTL. The resouce name is "pcpu".trasz2012-10-261-0/+13
| | | | It was implemented by Rudolf Tomori during Google Summer of Code 2012.
* Add a KPI to allow to reserve some amount of space in the numvnodeskib2012-10-141-0/+2
| | | | | | | | | | | | | counter, without actually allocating the vnodes. The supposed use of the getnewvnode_reserve(9) is to reclaim enough free vnodes while the code still does not hold any resources that might be needed during the reclamation, and to consume the slack later for getnewvnode() calls made from the innards. After the critical block is finished, the caller shall free any reserve left, by getnewvnode_drop_reserve(9). Reviewed by: avg Tested by: pho MFC after: 1 week
* Move the checks for td_pinned, td_critnest, TDP_NOFAULTING andattilio2012-09-081-1/+14
| | | | | | | | | | TDP_NOSLEEPING leaking from syscallret() to userret() so that also trap handling is covered. Also, the check on td_locks is not duplicated between the two functions. Reported by: avg Reviewed by: kib MFC after: 1 week
* Move PT_UPDATED_FLUSH() before td_locks check in order to have moreattilio2012-09-081-3/+3
| | | | | | | coverage also in the XEN case. Reviewed by: kib MFC after: 1 week
* userret() already checks for td_locks when INVARIANTS is enabled, soattilio2012-09-081-1/+0
| | | | | | | there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week
* Remove redundant include.pjd2012-06-101-1/+0
| | | | MFC after: 1 month
* Include the associated wait channel message for context switch ktracejhb2012-04-201-2/+2
| | | | | | | records. kdump supports both the old and new messages. Submitted by: Andrey Zonov andrey zonov org MFC after: 1 week
* Add software PMC support.fabient2012-03-281-0/+10
| | | | | | | | | | | | | New kernel events can be added at various location for sampling or counting. This will for example allow easy system profiling whatever the processor is with known tools like pmcstat(8). Simultaneous usage of software PMC and hardware PMC is possible, for example looking at the lock acquire failure, page fault while sampling on instructions. Sponsored by: NETASQ MFC after: 1 month
* Assert that exiting process does not return to usermode.kib2011-10-031-0/+2
| | | | | Reviewed by: avg, jhb MFC after: 1 week
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-161-2/+2
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* Inline the syscallenter() and syscallret(). This reduces the time measuredkib2011-09-111-162/+0
| | | | | | | | by the syscall entry speed microbenchmarks by ~10% on amd64. Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks
* We may split today's CAPABILITIES into CAPABILITY_MODE (which hasjonathan2011-06-291-2/+2
| | | | | | | | | | | | | to do with global namespaces) and CAPABILITIES (which has to do with constraining file descriptors). Just in case, and because it's a better name anyway, let's move CAPABILITIES out of the way. Also, change opt_capabilities.h to opt_capsicum.h; for now, this will only hold CAPABILITY_MODE, but it will probably also hold the new CAPABILITIES (implying constrained file descriptors) in the future. Approved by: rwatson Sponsored by: Google UK Ltd
* Continue introducing Capsicum capability mode support:rwatson2011-03-011-0/+15
| | | | | | | | | | | If a system call wasn't listed in capabilities.conf, return ECAPMODE at syscall entry. Reviewed by: anderson Discussed with: benl, kris, pjd Sponsored by: Google, Inc. Obtained from: Capsicum Project MFC after: 3 months
* Mfp4 CH=177256:bz2011-02-141-0/+11
| | | | | | | | | | | | | | | Catch a set vnet upon return to user space. This usually means return paths with CURVNET_RESTORE() missing. If VNET_DEBUG is turned on we can even tell the function that did the CURVNET_SET() which is really helpful; else we print "N/A". Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb MFC after: 11 days
* Allow debugger to specify that children of the traced process should bekib2011-01-251-2/+2
| | | | | | | | automatically traced. Extend the ptrace(PL_LWPINFO) to report that child just forked. Reviewed by: davidxu, jhb MFC after: 2 weeks
* Remove extra braces for style(9) (found while cleaning up an old work tree).emaste2010-09-281-2/+1
|
* Call the systrace_probe_func() when the error value.rpaulo2010-08-221-2/+2
| | | | Sponsored by: The FreeBSD Foundation
* Retire td_syscalls now that it is no longer needed.jhb2010-07-151-1/+0
|
* Obey sv_syscallnames bounds in syscallname().kib2010-07-041-2/+4
| | | | Reported and tested by: pho
* Move prototypes for kern_sigtimedwait() and kern_sigprocmask() tojhb2010-06-301-0/+1
| | | | <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
* Count number of threads that enter and leave dynamically registeredkib2010-06-281-0/+4
| | | | | | | | | | syscalls. On the dynamic syscall deregistration, wait until all threads leave the syscall code. This somewhat increases the safety of the loadable modules unloading. Reviewed by: jhb Tested by: pho MFC after: 1 month
* Remove the support for int13 FPU exception reporting on i386. It iskib2010-06-231-21/+0
| | | | | | | | | believed that all 486-class CPUs FreeBSD is capable to run on, either have no FPU and cannot use external coprocessor, or have FPU on the package and can use #MF. Reviewed by: bde Tested by: pho (previous version)
* Make DTrace syscall provider work again by including opt_kdtrace.h here.rpaulo2010-06-171-0/+1
|
* Allow to use syscallname(9) outside subr_trap.c.kib2010-05-261-2/+1
| | | | MFC after: 1 month
* Reorganize syscall entry and leave handling.kib2010-05-231-0/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_*syscall* pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month
* Current pselect(3) is implemented in usermode and thus vulnerable tokib2009-10-271-0/+5
| | | | | | | | | | | | | | | | | well-known race condition, which elimination was the reason for the function appearance in first place. If sigmask supplied as argument to pselect() enables a signal, the signal might be delivered before thread called select(2), causing lost wakeup. Reimplement pselect() in kernel, making change of sigmask and sleep atomic. Since signal shall be delivered to the usermode, but sigmask restored, set TDP_OLDMASK and save old mask in td_oldsigmask. The TDP_OLDMASK should be cleared by ast() in case signal was not gelivered during syscall execution. Reviewed by: davidxu Tested by: pho MFC after: 1 month
* Currently, when signal is delivered to the process and there is a threadkib2009-10-111-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | not blocking the signal, signal is placed on the thread sigqueue. If the selected thread is in kernel executing thr_exit() or sigprocmask() syscalls, then signal might be not delivered to usermode for arbitrary amount of time, and for exiting thread it is lost. Put process-directed signals to the process queue unconditionally, selecting the thread to deliver the signal only by the thread returning to usermode, since only then the thread can handle delivery of signal reliably. For exiting thread or thread that has blocked some signals, check whether the newly blocked signal is queued for the process, and try to find a thread to wakeup for delivery, in reschedule_signal(). For exiting thread, assume that all signals are blocked. Change cursig() and postsig() to look both into the thread and process signal queues. When there is a signal that thread returning to usermode could consume, TDF_NEEDSIGCHK flag is not neccessary set now. Do unlocked read of p_siglist and p_pendingcnt to check for queued signals. Note that thread that has a signal unblocked might get spurious wakeup and EINTR from the interruptible system call now, due to the possibility of being selected by reschedule_signals(), while other thread returned to usermode earlier and removed the signal from process queue. This should not cause compliance issues, since the thread has not blocked a signal and thus should be ready to receive it anyway. Reported by: Justin Teller <justin.teller gmail com> Reviewed by: davidxu, jilles MFC after: 1 month
* Add new msleep(9) flag PBDY that shall be specified together withkib2009-07-141-1/+1
| | | | | | | | | | | | PCATCH, to indicate that thread shall not be stopped upon receipt of SIGSTOP until it reaches the kernel->usermode boundary. Also change thread_single(SINGLE_NO_EXIT) to only stop threads at the user boundary unconditionally. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICrwatson2009-06-051-1/+0
| | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
* - Bug fix: prevent a thread from migrating between CPUs between thejkoshy2008-12-131-8/+0
| | | | | | | | | | | | | time it is marked for user space callchain capture in the NMI handler and the time the callchain capture callback runs. - Improve code and control flow clarity by invoking hwpmc(4)'s user space callchain capture callback directly from low-level code. Reviewed by: jhb (kern/subr_trap.c) Testing (various patch revisions): gnn, Fabien Thomas <fabien dot thomas at netasq dot com>, Artem Belevich <artemb at gmail dot com>
* - Forward port flush of page table updates on context switch or userretkmacy2008-10-191-0/+9
| | | | - Forward port vfork XEN hack
* - Make SCHED_STATS more generic by adding a wrapper to create thejeff2008-04-171-2/+1
| | | | | | | | | | | | | | | | | | variables and sysctl nodes. - In reset walk the children of kern_sched_stats and reset the counters via the oid_arg1 pointer. This allows us to add arbitrary counters to the tree and still reset them properly. - Define a set of switch types to be passed with flags to mi_switch(). These types are named SWT_*. These types correspond to SCHED_STATS counters and are automatically handled in this way. - Make the new SWT_ types more specific than the older switch stats. There are now stats for idle switches, remote idle wakeups, remote preemption ithreads idling, etc. - Add switch statistics for ULE's pickcpu algorithm. These stats include how much migration there is, how often affinity was successful, how often threads were migrated to the local cpu on wakeup, etc. Sponsored by: Nokia
* - Add a new td flag TDF_NEEDSUSPCHK that is set whenever a thread needsjeff2008-03-211-17/+11
| | | | | | | | | | | | | to enter thread_suspend_check(). - Set TDF_ASTPENDING along with TDF_NEEDSUSPCHK so we can move the thread_suspend_check() to ast() rather than userret(). - Check TDF_NEEDSUSPCHK in the sleepq_catch_signals() optimization so that we don't miss a suspend request. If this is set use the expensive signal path. - Set NEEDSUSPCHK when creating a new thread in thr in case the creating thread is due to be suspended as well but has not yet. Reviewed by: davidxu (Authored original patch)
* Remove kernel support for M:N threading.jeff2008-03-121-23/+1
| | | | | | | | While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
* Kernel and hwpmc(4) support for callchain capture.jkoshy2007-12-071-0/+13
| | | | Sponsored by: FreeBSD Foundation and Google Inc.
* A bunch more files that should probably print out a thread namejulian2007-11-141-1/+1
| | | | instead of a process name.
* - Move all of the PS_ flags into either p_flag or td_flags.jeff2007-09-171-16/+7
| | | | | | | | | | | | | | - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)
* - Include opt_sched.h for SCHED_STATS.jeff2007-06-121-0/+1
|
* Commit 14/14 of sched_lock decomposition.jeff2007-06-051-8/+11
| | | | | | | | | | | - Use thread_lock() rather than sched_lock for per-thread scheduling sychronization. - Use the per-process spinlock rather than the sched_lock for per-process scheduling synchronization. Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
OpenPOWER on IntegriCloud