path: root/sys/kern
Commit message | Author | Age | Files | Lines
* MFC r315960: dtrace sched:::preempt should fire only when there is preemption | avg | 2017-04-14 | 1 | -1/+5
* MFC r315851: move thread switch tracing from mi_switch to sched_switch | avg | 2017-04-14 | 3 | -19/+28
* MFC r316497: | brooks | 2017-04-05 | 1 | -2/+1
    Correct a kernel stack leak in 32-bit compat when vfc_name is short.

    Don't zero unused pointer members again.

    Per discussion with secteam we are not issuing an advisory for this issue as we have no current evidence it leaks exploitable information.

    Reviewed by: rwatson, glebius, delphij
    Sponsored by: DARPA, AFRL
* MFC r315699: | ngie | 2017-03-29 | 1 | -1/+2
    Print out name of non-dynamic sysctl in sysctl_remove_oid_locked

    This will provide a slightly better smoking gun than just stating "can't remove non-dynamic nodes!" when calling sysctl_ctx_free(9) and sysctl_remove_{name,oid}(9) with a non-dynamic (likely static) sysctl.
* MFC r315412, r314852: | badger | 2017-03-25 | 1 | -10/+9
    r315412:
    Don't clear p_ptevents on normal SIGKILL delivery

    The ptrace() user has the option of discarding the signal. In such a case, p_ptevents should not be modified. If the ptrace() user decides to send a SIGKILL, ptevents will be cleared in ptracestop(). procfs events do not have the capability to discard the signal, so continue to clear the mask in that case.

    r314852:
    don't stop in issignal() if P_SINGLE_EXIT is set

    Suppose a traced process is stopped in ptracestop() due to receipt of a SIGSTOP signal, and is awaiting orders from the tracing process on how to handle the signal. Before sending any such orders, the tracing process exits. This should kill the traced process. But suppose a second thread handles the SIGKILL and proceeds to exit1(), calling thread_single(). The first thread will now awaken and will have a chance to check once more if it should go to sleep due to the SIGSTOP. It must not sleep after P_SINGLE_EXIT has been set; this would prevent the SIGKILL from taking effect, leaving a stopped orphan behind after the tracing process dies.

    Also add new tests for this condition.

    Sponsored by: Dell EMC
* MFC r313992, r314075, r314118, r315484: | badger | 2017-03-25 | 5 | -98/+148
    r315484:
    ptrace_test: eliminate assumption about thread scheduling

    A couple of the ptrace tests make assumptions about which thread in a multithreaded process will run after a halt. This makes the tests less portable across branches, and susceptible to future breakage. Instead, twiddle thread scheduling and priorities to match the tests' expectation.

    r314118:
    Actually fix buildworlds other than i386/amd64/sparc64 after r313992

    Disable offending test for platforms without a userspace visible breakpoint().

    r314075:
    Fix world build for archs where __builtin_debugtrap() does not work.

    The offending code was introduced in r313992.

    r313992:
    Defer ptracestop() signals that cannot be delivered immediately

    When a thread is stopped in ptracestop(), the ptrace(2) user may request a signal be delivered upon resumption of the thread. Heretofore, those signals were discarded unless ptracestop()'s caller was issignal(). Fix this by modifying ptracestop() to queue up signals requested by the ptrace user that will be delivered when possible. Take special care when the signal is SIGKILL (usually generated from a PT_KILL request); no new stop events should be triggered after a PT_KILL.

    Add a number of tests for the new functionality. Several tests were authored by jhb.

    PR: 212607
    Sponsored by: Dell EMC
* MFC r315453: | kib | 2017-03-24 | 1 | -1/+2
    When clearing altsigstack settings on exec, do it to the right thread.
* MFC r315075: trace thread running state when a thread is run for the first time | avg | 2017-03-23 | 2 | -0/+8
* MFC r315074: actually implement proc:::lwp-exit probe | avg | 2017-03-23 | 1 | -0/+1
* MFC r315510 | vangyzen | 2017-03-21 | 1 | -1/+1
    nanosleep: plug a kernel memory disclosure

    nanosleep() updates rmtp on EINVAL. In that case, kern_nanosleep() has not updated rmt, so sys_nanosleep() updates the user-space rmtp by copying garbage from its stack frame. This is not only a kernel memory disclosure, it's also not POSIX-compliant. Fix it to update rmtp only on EINTR.

    Security: possibly
    Sponsored by: Dell EMC
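The nanosleep fix above can be illustrated in userland. This is a minimal sketch, not the kernel code: `sketch_kern_nanosleep()` and `sketch_sys_nanosleep()` are hypothetical stand-ins, the `interrupted` flag replaces a real signal, and a plain struct assignment replaces the copy-out to user space. It only shows why the remaining time should be written back on EINTR and nowhere else.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical stand-in for kern_nanosleep(): it fills *rmt only when the
 * sleep is interrupted and leaves it untouched when it rejects the request.
 */
static int
sketch_kern_nanosleep(const struct timespec *rqt, struct timespec *rmt,
    int interrupted)
{
    if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000L)
        return (EINVAL);        /* *rmt is NOT written on this path */
    if (interrupted) {
        *rmt = *rqt;            /* pretend the whole interval remains */
        return (EINTR);
    }
    return (0);
}

/*
 * Hypothetical stand-in for sys_nanosleep(): "user_rmtp" plays the role of
 * the user-space pointer. The local "rmt" is deliberately left
 * uninitialized, like a fresh kernel stack slot.
 */
static int
sketch_sys_nanosleep(const struct timespec *rqt, struct timespec *user_rmtp,
    int interrupted)
{
    struct timespec rmt;
    int error;

    assert(rqt != NULL);
    error = sketch_kern_nanosleep(rqt, &rmt, interrupted);
    /* Copy out only when rmt is known to have been filled in. */
    if (error == EINTR && user_rmtp != NULL)
        *user_rmtp = rmt;
    return (error);
}
```

With an invalid request the caller's buffer keeps whatever it held before the call, so no stale stack contents can reach it.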
* MFC r315155: | kib | 2017-03-19 | 2 | -4/+16
    Ktracing kevent(2) calls with unusual arguments might lead to overly large allocation requests.

    PR: 217435
* MFC r314996: | mmokhi | 2017-03-18 | 1 | -2/+4
    Fix NULL pointer dereference and panic with shm file pread/pwrite.

    PR: 217429
    Approved by: dchagin
* MFC r313733: | badger | 2017-03-16 | 1 | -35/+43
    sleepq_catch_signals: do thread suspension before signal check

    Since locks are dropped when a thread suspends, it's possible for another thread to deliver a signal to the suspended thread. If the thread awakens from suspension without checking for signals, it may go to sleep despite having a pending signal that should wake it up. Therefore the suspension check is done first, so any signals sent while suspended will be caught in the subsequent signal check.

    Sponsored by: Dell EMC
* MFC r314553: | hselasky | 2017-03-14 | 1 | -0/+17
    Implement taskqueue_poll_is_busy() for use by the LinuxKPI. Refer to comment above function for a detailed description.

    Discussed with: kib @
    Sponsored by: Mellanox Technologies
* MFC r313941: | hselasky | 2017-03-14 | 1 | -0/+9
    Make sure the thread constructor and destructor eventhandlers are called for all threads belonging to a process.

    Currently the first thread in a process is kept around as an optimisation step and is never freed. Because the first thread in a process is never freed nor allocated, its destructor and constructor callbacks are never called, which means per-thread structures allocated by dtrace and the Linux emulation layers, for example, might be present for threads which don't need these structures.

    This patch adds a thread construction and destruction call for the first thread in a process.

    Tested: dtrace, linux emulation
    Reviewed by: kib @
    Sponsored by: Mellanox Technologies
* MFC r312551: | hselasky | 2017-03-14 | 1 | -1/+1
    Fix for race leading to endless timer interrupts related to configtimer().

    During normal operation "state->nextcallopt" will always be less than or equal to "state->nextcall" and checking only "state->nextcallopt" before calling "callout_process()" is sufficient. However when "configtimer()" is called a race might happen requiring both of these binary times to be checked.

    Short description of race:

    1) A configtimer() call will reset both "state->nextcall" and "state->nextcallopt" to the same binary time.

    2) If a "callout_reset()" call happens between "configtimer()" and the next "callout_process()" call, "state->nextcallopt" will get updated and "state->nextcall" will remain at the current time. Refer to logic inside cpu_new_callout().

    3) getnextcpuevent() only respects "state->nextcall" and returns this value over and over again, even if it is in the past, until "now >= state->nextcallopt" becomes true. Then these two time variables are corrected by a "callout_process()" call and the situation goes back to normal.

    The problem manifests itself in different ways. The common factor is that the timer process(es) consume all CPU on one or more CPU cores for a long time, blocking other kernel processes from getting execution time. This can be seen by very high interrupt counts as displayed by "vmstat -i | grep timer" right after boot.

    When EARLY_AP_STARTUP was enabled in r310177 the likelihood of hitting this bug apparently increased.

    Example output from "vmstat -i" before patch:

    cpu0:timer 7591 69
    cpu9:timer 39031773 358089
    cpu4:timer 9359 85
    cpu3:timer 9100 83
    cpu2:timer 9620 88

    Example output from "vmstat -i" after patch:

    cpu0:timer 4242 34
    cpu6:timer 5531 44
    cpu3:timer 6450 52
    cpu1:timer 4545 36
    cpu9:timer 7153 58

    Before the patch, cpu9 in the example above was spinning in a loop in order to reach 39 million interrupts just a few seconds after bootup. After the patch the timer interrupt counts are more or less consistent.

    Discussed with: mav @
    Reported by: several people
    Sponsored by: Mellanox Technologies
* MFC r303464 (by brooks@): | dchagin | 2017-03-11 | 1 | -6/+0
    Don't create pointless backups of generated files in "make sysent".

    Any sensible workflow will include a revision control system from which to restore the old files if required. In normal usage, developers just have to clean up the mess.
* MFC r314626 | vangyzen | 2017-03-10 | 1 | -7/+7
    Fix grammar in some comments in subr_sleepqueue.c

    While I'm here, remove trailing whitespace.
* MFC r314562: | kib | 2017-03-05 | 1 | -2/+1
    Style.
* MFC r313909: | bdrewery | 2017-03-04 | 1 | -0/+2
    Fix panic with unlocked vnode to vrecycle().
* MFC r283291: don't use CALLOUT_MPSAFE with callout_init() | avg | 2017-03-04 | 5 | -6/+6
    The main purpose of this MFC is to reduce conflicts for other merges. Parts of the original change have already "trickled down" via individual MFCs.
* MFC r313730: try to fix RACCT_RSS accounting | avg | 2017-02-27 | 1 | -1/+4
* MFC r313496: | kib | 2017-02-24 | 1 | -12/+19
    Increase a chance of devfs_close() calling d_close cdevsw method.
* MFC r312991: put very expensive sanity checks of advisory locks under DIAGNOSTIC | avg | 2017-02-14 | 1 | -2/+2
    Sponsored by: Panzura
* MFC r310096: reaper: Make REAPER_KILL_SUBTREE actually work. | jilles | 2017-02-05 | 1 | -1/+1
* MFC r312647: | kib | 2017-01-29 | 1 | -0/+15
    Add comments explaining unobvious td_critnest adjustments in critical_exit().
* MFC r312532: don't abort writing of a core dump after EFAULT | avg | 2017-01-26 | 1 | -1/+32
    Note that this change substantially differs from the change in head because of an unmerged earlier change that probably cannot be merged for POLA reasons.
* MFC r312426: fix a thread preemption regression in schedulers introduced in r270423 | avg | 2017-01-23 | 2 | -4/+4
* MFC r311963: Remove writability requirement for single-mbuf, contiguous-range m_pulldown() | rpokala | 2017-01-19 | 1 | -1/+1
    m_pulldown() only needs to determine if a mbuf is writable if it is going to copy data into the data region of an existing mbuf. It does this to create a contiguous data region in a single mbuf from multiple mbufs in the chain. If the requested memory region is already contiguous and nothing needs to change, the mbuf does not need to be writable.
* MFC r312113: | ngie | 2017-01-17 | 1 | -6/+6
    Clean up trailing whitespace
* MFC r311447: | kib | 2017-01-12 | 1 | -2/+2
    Some style fixes for getfstat(2)-related code.
* MFC r311113: | kib | 2017-01-09 | 1 | -6/+4
    There is no need to use temporary statfs buffer for fsid obliteration and prison enforcement. Do it on the caller buffer directly.
* MFC r311111: | kib | 2017-01-09 | 1 | -1/+1
    Style.
* MFC r311108: | kib | 2017-01-09 | 1 | -65/+41
    Move common code from kern_statfs() and kern_fstatfs() into a new helper.
* MFC r310615: | kib | 2017-01-09 | 1 | -11/+2
    Change knlist_destroy() to assertion.
* MFC r310613: | kib | 2017-01-02 | 1 | -11/+7
    Style.
* MFC r310554: | kib | 2017-01-01 | 1 | -24/+28
    Some optimizations for kqueue timers.
* MFC r310552: | kib | 2017-01-01 | 1 | -5/+5
    Some style.
* MFC r285706,r303562,r303563,r303584,r303643,r303652,r303655,r303707: | mjg | 2016-12-31 | 4 | -64/+201
    (by markj)

    Don't increment the spin count until after the first attempt to acquire a rwlock read lock. Otherwise the lockstat:::rw-spin probe will fire spuriously.

    ==

    rwlock: s/READER/WRITER/ in wlock lockstat annotation

    ==

    sx: increment spin_cnt before cpu_spinwait in xlock

    The change is a no-op only done for consistency with the rest of the file.

    ==

    locks: change sleep_cnt and spin_cnt types to u_int

    Both variables are uint64_t, but they only count spins or sleeps. All reasonable values which we can get here comfortably fit in 32-bit range.

    ==

    Implement trivial backoff for locking primitives.

    All current spinning loops retry an atomic op the first chance they get, which leads to performance degradation under load.

    One classic solution to the problem consists of delaying the test to an extent. This implementation has a trivial linear increment and a random factor for each attempt.

    For simplicity, this first touch implementation only modifies spinning loops where the lock owner is running. spin mutexes and thread lock were not modified.

    Current parameters are autotuned on boot based on mp_cpus.

    Autotune factors are very conservative and are subject to change later.

    ==

    locks: fix up ifdef guards introduced in r303643

    Both sx and rwlocks had copy-pasted ADAPTIVE_MUTEXES instead of the correct define.

    ==

    locks: fix compilation for KDTRACE_HOOKS && !ADAPTIVE_* case

    ==

    locks: fix sx compilation on mips after r303643

    The kernel.h header is required for the SYSINIT macro, which apparently was present on amd64 by accident.
* MFC r301157: | mjg | 2016-12-31 | 4 | -9/+25
    Microoptimize locking primitives by avoiding unnecessary atomic ops.

    The inline versions of the primitives do an atomic op, and if it fails they fall back to the actual primitives, which immediately retry the atomic op.

    The obvious optimisation is to check if the lock is free and only then proceed to do an atomic op.
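The "check if the lock is free and only then do an atomic op" idea is the classic test-and-test-and-set pattern. The sketch below uses C11 `<stdatomic.h>` in userland rather than the kernel's atomic(9) primitives, and the lock word layout, `SKETCH_UNOWNED` and `sketch_try_lock()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_UNOWNED ((uintptr_t)0)   /* hypothetical "lock is free" value */

/*
 * Try to take the lock for owner "tid". A plain load comes first; the
 * compare-and-swap, which needs exclusive ownership of the cache line,
 * is attempted only if the lock looked free.
 */
static bool
sketch_try_lock(_Atomic uintptr_t *lock, uintptr_t tid)
{
    uintptr_t expected = SKETCH_UNOWNED;

    assert(tid != SKETCH_UNOWNED);
    if (atomic_load_explicit(lock, memory_order_relaxed) != SKETCH_UNOWNED)
        return (false);     /* visibly held: skip the atomic op entirely */
    return (atomic_compare_exchange_strong_explicit(lock, &expected, tid,
        memory_order_acquire, memory_order_relaxed));
}
```

The early return is the whole optimisation: a failed compare-and-swap still costs a read-for-ownership on the cache line, while a failed load does not.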
* MFC r309886: | kib | 2016-12-26 | 1 | -0/+5
    When a zombie gets reparented due to the parent exit, send SIGCHLD to the reaper.
* MFC r310302: | kib | 2016-12-26 | 1 | -2/+6
    Do not clear KN_INFLUX when not owning influx state.

    PR: 214923
* MFC r310159: | kib | 2016-12-23 | 1 | -7/+5
    Switch from stdatomic.h to atomic.h for kernel.
* MFC r296775 (by gibbs): | kib | 2016-12-16 | 1 | -17/+42
    Provide high precision conversion from ns,us,ms -> sbintime in kevent.

    Tested by: ian
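To see why a high-precision conversion matters, recall that sbintime_t is a signed 64-bit fixed-point value with 32 fractional bits, so one second is `1 << 32`. The sketch below is not the kevent code from this commit: the `sketch_` names are hypothetical, and it simply contrasts multiplying by a truncated per-nanosecond constant with splitting the value into whole seconds and a remainder.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t sketch_sbintime_t;          /* 32.32 fixed point */
#define SKETCH_SBT_1S   ((sketch_sbintime_t)1 << 32)
#define SKETCH_NS_PER_S ((int64_t)1000000000)

/*
 * Naive conversion, for small inputs only: 2^32 / 10^9 truncates to 4,
 * so every result comes out roughly 7% too small.
 */
static sketch_sbintime_t
sketch_ns_to_sbt_naive(int64_t ns)
{
    return (ns * (SKETCH_SBT_1S / SKETCH_NS_PER_S));
}

/*
 * Higher-precision conversion: whole seconds map exactly onto the integer
 * part, and only the sub-second remainder is scaled, which cannot overflow
 * because the remainder is below 2^30.
 */
static sketch_sbintime_t
sketch_ns_to_sbt(int64_t ns)
{
    int64_t sec, frac;

    assert(ns >= 0);
    sec = ns / SKETCH_NS_PER_S;
    frac = ns % SKETCH_NS_PER_S;
    assert(sec < ((int64_t)1 << 31));   /* keep sec * 2^32 in range */
    return (sec * SKETCH_SBT_1S + (frac * SKETCH_SBT_1S) / SKETCH_NS_PER_S);
}
```

The same split works for microseconds and milliseconds by changing the divisor.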
* MFC r309676 | vangyzen | 2016-12-15 | 1 | -1/+9
    Export the whole thread name in kinfo_proc

    kinfo_proc::ki_tdname is three characters shorter than thread::td_name. Add a ki_moretdname field for these three extra characters. Add the new field to kinfo_proc32, as well. Update all in-tree consumers to read the new field and assemble the full name, except for lldb's HostThreadFreeBSD.cpp, which I will handle separately. Bump __FreeBSD_version.

    Sponsored by: Dell EMC
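A consumer that wants the full thread name has to join the two fields. The following userland sketch uses a hypothetical struct that mirrors only `ki_tdname` and `ki_moretdname`. The sizes (16 characters plus 3 more, for a 19-character name) are an assumption consistent with "three characters shorter" rather than values stated in this log, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_TDNAMLEN  16     /* assumed capacity of ki_tdname */
#define SKETCH_MAXCOMLEN 19     /* assumed capacity of td_name */

/* Hypothetical mirror of the two kinfo_proc fields discussed above. */
struct sketch_kinfo {
    char ki_tdname[SKETCH_TDNAMLEN + 1];
    char ki_moretdname[SKETCH_MAXCOMLEN - SKETCH_TDNAMLEN + 1];
};

/* Producer side: split a thread name across the old and new fields. */
static void
sketch_export_name(struct sketch_kinfo *ki, const char *td_name)
{
    size_t len = strlen(td_name);
    size_t head = len < SKETCH_TDNAMLEN ? len : SKETCH_TDNAMLEN;

    assert(len <= SKETCH_MAXCOMLEN);
    memset(ki, 0, sizeof(*ki));     /* guarantees NUL termination */
    memcpy(ki->ki_tdname, td_name, head);
    memcpy(ki->ki_moretdname, td_name + head, len - head);
}

/* Consumer side: reassemble the full name from both fields. */
static void
sketch_full_name(const struct sketch_kinfo *ki, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%s%s", ki->ki_tdname, ki->ki_moretdname);
}
```

Old consumers that read only `ki_tdname` keep working and still see the truncated name, which is presumably why the extra characters went into a separate field instead of widening the existing one.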
* MFC r309460 | vangyzen | 2016-12-15 | 1 | -2/+5
    thr_set_name(): silently truncate the given name as needed

    Instead of failing with ENAMETOOLONG, which is swallowed by pthread_set_name_np() anyway, truncate the given name to MAXCOMLEN+1 bytes. This is more likely what the user wants, and saves the caller from truncating it before the call (which was the only recourse).

    The man page changes were not merged because thr_set_name.2 does not exist on stable/10.

    Sponsored by: Dell EMC
* MFC r308350: | markj | 2016-12-12 | 1 | -2/+2
    Fix WITNESS hints for pagequeue locks.
* MFC 308564: Don't place threads on the run queue after waking up other CPUs. | jhb | 2016-12-02 | 1 | -49/+13
    The other CPU might resume and see a still-empty runq and go back to sleep before sched_add() adds the thread to the runq. This results in a lost wakeup and a potential hang if the system is otherwise completely idle.

    The race originated due to a micro-optimization (my fault) in 4BSD in that it avoided putting a thread on the run queue if the scheduler was going to preempt to the new thread. To avoid complexity while fixing this race, just drop this optimization. 4BSD now always sets the "owepreempt" flag when a preemption is warranted and defers the actual preemption to the thread_unlock of the caller the same as ULE.
* MFH: r306306 | julian | 2016-12-02 | 1 | -2/+2
    Give the user a clue as to which process hit maxfiles.

    Sponsored by: Panzura
* MFC r308618: | kib | 2016-11-27 | 1 | -0/+6
    Provide simple mutual exclusion between mount point update and unmount. In the update path in ffs_mount(), drop vfs_busy() reference around namei().