FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message (Collapse)	Author	Age	Files	Lines
*	MFC 323994: Log signal number passed to PT_STEP requests in KTR_PTRACE traces.stable/10	jhb	2017-10-02	1	-2/+2
\|
*	MFC: r275751	marius	2017-09-21	5	-8/+20
\| \| \| \| \|	Add _NEW flag to mtx(9), sx(9), rmlock(9) and rwlock(9). A _NEW flag passed to _init_flags() to avoid check for double-init.
*	MFC r278479,278494,278525,278545,278592,279237,280410:	will	2017-08-24	1	-3/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change merges devctl notification for userland coredumps. r278479 (rpaulo): Notify devd(8) when a process crashed. This change implements a notification (via devctl) to userland when the kernel produces coredumps after a process has crashed. devd can then run a specific command to produce a human readable crash report. The command is most usually a helper that runs gdb/lldb commands on the file/coredump pair. It's possible to use this functionality for implementing automatic generation of crash reports. devd(8) will be notified of the full path of the binary that crashed and the full path of the coredump file. r278494 (rpaulo): Sanitise the coredump file names sent to devd. While there, add a sysctl to turn this feature off as requested by kib@. r278525 (rpaulo): Remove a printf and an strlen() from the coredump code. r278545 (rpaulo): Restore the data array in coredump(), but use a different style to calculate the length. r278592 (rpaulo): Remove check against NULL after M_WAITOK. r279237 (kib): Keep a reference on the coredump vnode for vn_fullpath() call. Do it by moving vn_close() after the point where notification is sent. r280410 (rpaulo): Disable coredump_devctl because it could lead to leaking paths to jails. Approved by: re
*	MFC r320077	alc	2017-07-25	1	-138/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change blist_alloc()'s allocation policy from first-fit to next-fit so that disk writes are more likely to be sequential. This change is beneficial on both the solid state and mechanical disks that I've tested. (A similar change in allocation policy was made by DragonFly BSD in 2013 to speed up Poudriere with "stressful memory parameters".) Increase the width of blst_meta_alloc()'s parameter "skip" and the local variables whose values are derived from it to 64 bits. (This matches the width of the field "skip" that is stored in the structure "blist" and passed to blst_meta_alloc().) Eliminate a pointless check for a NULL blist_t. Simplify blst_meta_alloc()'s handling of the ALL-FREE case. Address nearby style errors. MFC r320417 Address the remaining integer overflow issues with the "skip" parameters and "next_skip" variables. The "skip" value in struct blist has long been a 64-bit quantity but various functions have implicitly truncated this value to 32 bits. Now, all arithmetic involving the "skip" value is 64 bits wide. (This should allow us to relax the size limit on a swap device in the swap pager.) Maintain the ability to test this allocator as a user-space application by including <stdbool.h>. Remove an unused variable from blst_radix_print(). MFC r320527 Change blst_leaf_alloc() to handle a cursor argument, and to improve performance. To find in the leaf bitmap all ranges of sufficient length, use a doubling strategy with shift-and-and until each bit still set represents a bit sequence of length 'count', or until the bitmask is zero. In the latter case, update the hint based on the first bit sequence length not found to be available. For example, seeking an interval of length 12, the set bits of the bitmap would represent intervals of length 1, then 2, then 3, then 6, then 12. If no bits are set at the point when each bit represents an interval of length 6, then the hint can be updated to 5 and the search terminated. If long-enough intervals are found, discard those before the cursor. If any remain, use binary search to find the position of the first of them, and allocate that interval.
*	MFC r319905	alc	2017-07-22	1	-46/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reduce the frequency of hint updates on allocation without incurring additional allocation overhead. Previously, blst_meta_alloc() updated the hint after every successful allocation. However, these "eager" hint updates are of no actual benefit if, instead, the "lazy" hint update at the start of blst_meta_alloc() is generalized to handle all cases where the number of available blocks is less than the requested allocation. Previously, the lazy hint update at the start of blst_meta_alloc() only handled the ALL-FULL case. (I would also note that this change provides consistency between blist_alloc() and blist_fill() in that their hint maintenance is now entirely lazy.) Eliminate unnecessary checks for terminators in blst_meta_alloc() and blst_meta_fill() when handling ALL-FREE meta nodes. Eliminate the field "bl_free" from struct blist. It is redundant. Unless the entire radix tree is a single leaf, the count of free blocks is stored in the root node. Instead, provide a function blist_avail() for obtaining the number of free blocks. In blst_meta_alloc(), perform a sanity check on the allocation once rather than repeating it in a loop over the meta node's children. In blst_leaf_fill(), use the optimized bitcount*() function instead of a loop to count the blocks being allocated. Add or improve several comments. Address some nearby style errors.
*	MFC r315621	alc	2017-07-22	1	-3/+4
\| \| \| \| \| \| \| \| \|	Use IDX_TO_OFF(), not ptoa(), when converting the difference between two vm_pindex_t's into a vm_ooffset_t. The length given to shm_dotruncate() must never be negative. Assert this. Tidy up a comment.
*	MFC r320498	alc	2017-07-22	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	Clear the MAP_WIREFUTURE flag on the vm map in exec_new_vmspace() when it recycles the current vm space. Otherwise, an mlockall(MCL_FUTURE) could still be in effect on the process after an execve(2), which violates the specification for mlockall(2). It's pointless for vm_map_stack() to check the MEMLOCK limit. It will never be asked to wire the stack. Moreover, it doesn't even implement wiring of the stack.
*	MFC r279992,r280149,r280193,r288223,r288484,r321109:	ngie	2017-07-18	1	-41/+86
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r279992 (by ian): Add a new flag, SBUF_INCLUDENUL, and new get/set/clear functions for flags. The SBUF_INCLUDENUL flag causes the nulterm byte at the end of the string to be counted in the length of the data. If copying the data using the sbuf_data() and sbuf_len() functions, or if writing it automatically with a drain function, the net effect is that the nulterm byte is copied along with the rest of the data. r280149 (by ian): Update an sbuf assertion to allow for the new SBUF_INCLUDENUL flag. If INCLUDENUL is set and sbuf_finish() has been called, the length has been incremented to count the nulterm byte, and in that case current length is allowed to be equal to buffer size, otherwise it must be less than. Add a predicate macro to test for SBUF_INCLUDENUL, and use it in tests, to be consistant with the style in the rest of this file. r280193 (by ian): The minimum sbuf buffer size is 2 bytes (a byte plus a nulterm), assert that. Values smaller than two lead to strange asserts that have nothing to do with the actual problem (in the case of size=0), or to writing beyond the end of the allocated buffer in sbuf_finish() (in the case of size=1). r288223 (by cem): sbuf: Process more than one char at a time Revamp sbuf_put_byte() to sbuf_put_bytes() in the obvious fashion and fixup callers. Add a thin shim around sbuf_put_bytes() with the old ABI to avoid ugly changes to some callers. Obtained from: Dan Sledz r288484 (by phk): Fail the sbuf if vsnprintf(3) fails. r321109: Fix whitespace regression accidentally checked in via ^/head@r280149
*	MFC r307873,r314397,r314399,r314419,r314420,r314533,r316553:	ngie	2017-07-18	1	-12/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r307873 (by marcel): Include <stdarg.h> instead of <machine/stdarg.h> when compiled as part of libsbuf. The former is the standard header, and allows us to compile libsbuf on macOS/linux. r314397 (by scottl): Implement sbuf_prf(), which takes an sbuf and outputs it to stdout in the non-kernel case and to the console+log in the kernel case. For the kernel case it hooks the putbuf() machinery underneath printf(9) so that the buffer is written completely atomically and without a copy into another temporary buffer. This is useful for fixing compound console/log messages that become broken and interleaved when multiple threads are competing for the console. r314399 (by scottl): Add prototype for sbuf_putbuf() r314419 (by jkim): Include stdio.h to fix libsbuf build. r314420 (by scottl): Provide a comment on why stdio.h needs to be included. r314533 (by scottl): Expose the sbuf_putbuf() symbol to libsbuf. There are a few other symbols that are present but not exposed, like get/set/clear flags, not sure if they need to be exposed at this point. r316553: sbuf(3): expose sbuf_{clear,get,set}_flags(3) via libsbuf These functions were added to sbuf(9) in r279992, but never exposed to userspace. Expose them now so they can be used/tested.
*	MFC r281437 (by mjg@):	dchagin	2017-07-15	1	-27/+25
\| \| \| \| \| \|	Replace struct filedesc argument in getsock_cap with struct thread This is is a step towards removal of spurious arguments.
*	MFC r281436 (by mjg@):	dchagin	2017-07-15	8	-19/+17
\| \| \| \| \| \| \| \|	fd: remove filedesc argument from fdclose Just accept a thread instead. This makes it consistent with fdalloc. No functional changes.
*	Regen after r321017.	dchagin	2017-07-15	1	-4/+4
\| \| \| \|	Move the SCTP syscalls to netinet with the rest of the SCTP code.
*	MFC r272823:	dchagin	2017-07-15	2	-494/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the SCTP syscalls to netinet with the rest of the SCTP code. The syscalls themselves are tightly coupled with the network stack and therefore should not be in the generic socket code. The following four syscalls have been marked as NOSTD so they can be dynamically registered in sctp_syscalls_init() function: sys_sctp_peeloff sys_sctp_generic_sendmsg sys_sctp_generic_sendmsg_iov sys_sctp_generic_recvmsg The syscalls are also set up to be dynamically registered when COMPAT32 option is configured. As a side effect of moving the SCTP syscalls, getsock_cap needs to be made available outside of the uipc_syscalls.c source file. A proper prototype has been added to the sys/socketvar.h header file. API tests from the SCTP reference implementation have been run to ensure compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout)
*	Temporarily r284696:	dchagin	2017-07-15	1	-8/+0
\| \| \| \| \| \| \|	MFC r284613. When using KTRACE, set a variable to the appropriate value and don't leave it initialized at NULL. Will MFC it after proper MFC of r272823.
*	MFC r281829 (by trasz@):	dchagin	2017-07-15	1	-4/+6
\| \| \| \| \| \|	Modify kern___getcwd() to take max pathlen limit as an additional argument. This will be used for the Linux emulation layer - for Linux, PATH_MAX is 4096 and not 1024.
*	MFC r320619:	kib	2017-07-10	1	-9/+8
\| \| \| \|	Resolve confusion between different error code spaces.
*	MFC r293295:	mjg	2017-07-06	1	-38/+17
\| \| \| \| \| \|	cache: ansify functions and fix some style issues No functional changes.
*	MFC r319699	alc	2017-07-03	1	-20/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When allocating swap blocks, if the available number of free blocks in a subtree is already zero, then setting the "largest contiguous free block" hint for that subtree to anything other than zero makes no sense. (To be clear, assigning a value to the hint that is too large is not a correctness problem, only a pessimization.) MFC r319755 blist_fill()'s return type is too narrow. blist_fill() accepts a 64-bit quantity as the size of the range to fill, but returns a 32-bit quantity as the number of blocks that were allocated to fill that range. This revision corrects that mismatch. MFC r319793 Remove an unnecessary field from struct blist. (The comment describing what this field represented was also inaccurate.) In r178792, blist_create() grew a malloc flag, allowing M_NOWAIT to be specified. However, blist_create() was not modified to handle the possibility that a malloc() call failed. Address this omission. Increase the width of the local variable "radix" to 64 bits. This matches the width of the corresponding field in struct blist.
*	MFC r320038:	kib	2017-06-23	1	-7/+8
\| \| \| \|	Style.
*	MFC r319916:	kib	2017-06-20	1	-1/+0
\| \| \| \|	Remove stray return.
*	MFC r318995	alc	2017-06-15	1	-18/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In r118390, the swap pager's approach to striping swap allocation over multiple devices was changed. However, swapoff_one() was not fully and correctly converted. In particular, with r118390's introduction of a per- device blist, the maximum swap block size, "dmmax", became irrelevant to swapoff_one()'s operation. Moreover, swapoff_one() was performing out-of- range operations on the per-device blist that were silently ignored by blist_fill(). This change corrects both of these problems with swapoff_one(), which will allow us to potentially increase MAX_PAGEOUT_CLUSTER. Previously, swapoff_one() would panic inside of blist_fill() if you increased MAX_PAGEOUT_CLUSTER. MFC r319001 After r118390, the variable "dmmax" was neither the correct strip size nor the correct maximum block size. Moreover, after r318995, it serves no purpose except to provide information to user space through a read- sysctl. This change eliminates the variable "dmmax" but retains the sysctl. It also corrects the value returned by the sysctl. MFC r319604 Halve the memory being internally allocated by the blist allocator. In short, half of the memory that is allocated to implement the radix tree is wasted because we did not change "u_daddr_t" to be a 64-bit unsigned int when we changed "daddr_t" to be a 64-bit (signed) int. (See r96849 and r96851.) MFC r319612 When the function blist_fill() was added to the kernel in r107913, the swap pager used a different scheme for striping the allocation of swap space across multiple devices. And, although blist_fill() was intended to support fill operations with large counts, the old striping scheme never performed a fill larger than the stripe size. Consequently, the misplacement of a sanity check in blst_meta_fill() went undetected. Now, moving forward in time to r118390, a new scheme for striping was introduced that maintained a blist allocator per device, but as noted in r318995, swapoff_one() was not fully and correctly converted to the new scheme. This change completes what was started in r318995 by fixing the underlying bug in blst_meta_fill() that stops swapoff_one() from simply performing a single blist_fill() operation. MFC r319627 Starting in r118390, swaponsomething() began to reserve the blocks at the beginning of a swap area for a disk label. However, neither r118390 nor r118544, which increased the reservation from one to two blocks, correctly accounted for these blocks when updating the variable "swap_pager_avail". This change corrects that error. MFC r319655 Originally, this file could be compiled as a user-space application for testing purposes. However, over the years, various changes to the kernel have broken this feature. This revision applies some fixes to get user- space compilation working again. There are no changes in this revision to code that is used by the kernel.
*	MFC r318243:	kib	2017-05-19	1	-1/+3
\| \| \| \| \| \| \|	Do not wake up sleeping thread in reschedule_signals() if the signal is blocked. The spurious wakeup might result in spurious EINTR. PR: 219228
*	MFC r317845-r317846	brooks	2017-05-15	1	-8/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r317845: Provide a freebsd32 implementation of sigqueue() The previous misuse of sys_sigqueue() was sending random register or stack garbage to 64-bit targets. The freebsd32 implementation preserves the sival_int member of value when signaling a 64-bit process. Document the mixed ABI implementation of union sigval and the incompability of sival_ptr with pointer integrity schemes. Reviewed by: kib, wblock Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D10605 r317846: Regen post r317845. MFC with: r317845 Sponsored by: DARPA, AFRL
*	MFC: r317982	marius	2017-05-14	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Also outside of the KOBJOPLOOKUP macro - which in turn is used by the code auto-generated for *.m - kobj_lookup_method(9) is useful; for example in back-ends or base class device drivers in order to determine whether a default method has been overridden. Thus, allow for the kobj_method_t pointer argument - used by KOBJOPLOOKUP in order to update the cache entry - of kobj_lookup_method(9), to be NULL. Actually, that pointer is redundant as it's just set to the same kobj_method_t that the kobj_lookup_method(9) function returns in the first place, but probably it serves to reduce the number of instructions generated for KOBJOPLOOKUP. - For the same reason, move updating kobj_lookup_{hits,misses} (if KOBJ_STATS is defined) from kobj_lookup_method(9) to KOBJOPLOOKUP. As a side-effect, this gets rid of the convoluted approach of always incrementing kobj_lookup_hits in KOBJOPLOOKUP and then in case of a cache miss, decrementing it in kobj_lookup_method(9) again.
*	MFC 313407,313449: Copy ELF machine/flags from binaries to core dumps.	jhb	2017-05-11	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	313407: Copy the e_machine and e_flags fields from the binary into an ELF core dump. In the kernel, cache the machine and flags fields from ELF header to use in the ELF header of a core dump. For gcore, the copy these fields over from the ELF header in the binary. This matters for platforms which encode ABI information in the flags field (such as o32 vs n32 on MIPS). 313449: Trim trailing whitespace (mostly introduced in r313407). Sponsored by: DARPA / AFRL
*	MFC 313999: Consolidate statements to initialize files.	jhb	2017-05-11	1	-29/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the first lines of various generated files from system call tables were generated in two sections. Some of the initialization was done in BEGIN, and the rest was done when the first line was encountered. The main reason for this split before r313564 was that most of the initialization done in the second section depended on the $FreeBSD$ tag extracted from the system call table. Now that the $FreeBSD$ tag is no longer used, consolidate all of the file initialization in the BEGIN section. This change was tested by confirming that the content of generated files did not change.
*	MFC 313564:	jhb	2017-05-10	3	-13/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	Drop the "created from" line from files generated by makesyscalls.sh. This information is less useful when the generated files are included in source control along with the source. If needed it can be reconstructed from the $FreeBSD$ tag in the generated file. Removing this information from the generated output permits committing the generated files along with the change to the system call master list without having inconsistent metadata in the generated files. Regenerate the affected files along with the MFC.
*	MFC r315237 (kib): Hide kev_iovlen() definition under #ifdef KTRACE	emaste	2017-04-27	1	-0/+2
\| \| \| \| \| \|	Fixes build of kernel configs without KTRACE. Sponsored by: The FreeBSD Foundation
*	MFC r315960: dtrace sched:::preempt should fire only when there is preemption	avg	2017-04-14	1	-1/+5
\|
*	MFC r315851: move thread switch tracing from mi_switch to sched_switch	avg	2017-04-14	3	-19/+28
\|
*	MFC r316497:	brooks	2017-04-05	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Correct a kernel stack leak in 32-bit compat when vfc_name is short. Don't zero unused pointer members again. Per discussion with secteam we are not issuing an advisory for this issue as we have no current evidence it leaks exploitable information. Reviewed by: rwatson, glebius, delphij Sponsored by: DARPA, AFRL
*	MFC r315699:	ngie	2017-03-29	1	-1/+2
\| \| \| \| \| \| \| \| \|	Print out name of non-dynamic sysctl in sysctl_remove_oid_locked This will provide a slightly better smoking gun than just stating "can't remove non-dynamic nodes!" when calling sysctl_ctx_free(9) and sysctl_remove_{name,oid}(9) with a non-dynamic (likely static) sysctl.
*	MFC r315412, r314852:	badger	2017-03-25	1	-10/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r315412: Don't clear p_ptevents on normal SIGKILL delivery The ptrace() user has the option of discarding the signal. In such a case, p_ptevents should not be modified. If the ptrace() user decides to send a SIGKILL, ptevents will be cleared in ptracestop(). procfs events do not have the capability to discard the signal, so continue to clear the mask in that case. r314852: don't stop in issignal() if P_SINGLE_EXIT is set Suppose a traced process is stopped in ptracestop() due to receipt of a SIGSTOP signal, and is awaiting orders from the tracing process on how to handle the signal. Before sending any such orders, the tracing process exits. This should kill the traced process. But suppose a second thread handles the SIGKILL and proceeds to exit1(), calling thread_single(). The first thread will now awaken and will have a chance to check once more if it should go to sleep due to the SIGSTOP. It must not sleep after P_SINGLE_EXIT has been set; this would prevent the SIGKILL from taking effect, leaving a stopped orphan behind after the tracing process dies. Also add new tests for this condition. Sponsored by: Dell EMC
*	MFC r313992, r314075, r314118, r315484:	badger	2017-03-25	5	-98/+148
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r315484: ptrace_test: eliminate assumption about thread scheduling A couple of the ptrace tests make assumptions about which thread in a multithreaded process will run after a halt. This makes the tests less portable across branches, and susceptible to future breakage. Instead, twiddle thread scheduling and priorities to match the tests' expectation. r314118: Actually fix buildworlds other than i386/amd64/sparc64 after r313992 Disable offending test for platforms without a userspace visible breakpoint(). r314075: Fix world build for archs where __builtin_debugtrap() does not work. The offending code was introduced in r313992. r313992: Defer ptracestop() signals that cannot be delivered immediately When a thread is stopped in ptracestop(), the ptrace(2) user may request a signal be delivered upon resumption of the thread. Heretofore, those signals were discarded unless ptracestop()'s caller was issignal(). Fix this by modifying ptracestop() to queue up signals requested by the ptrace user that will be delivered when possible. Take special care when the signal is SIGKILL (usually generated from a PT_KILL request); no new stop events should be triggered after a PT_KILL. Add a number of tests for the new functionality. Several tests were authored by jhb. PR: 212607 Sponsored by: Dell EMC
*	MFC r315453:	kib	2017-03-24	1	-1/+2
\| \| \| \|	When clearing altsigstack settings on exec, do it to the right thread.
*	MFC r315075: trace thread running state when a thread is run for the first time	avg	2017-03-23	2	-0/+8
\|
*	MFC r315074: actually implement proc:::lwp-exit probe	avg	2017-03-23	1	-0/+1
\|
*	MFC r315510	vangyzen	2017-03-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	nanosleep: plug a kernel memory disclosure nanosleep() updates rmtp on EINVAL. In that case, kern_nanosleep() has not updated rmt, so sys_nanosleep() updates the user-space rmtp by copying garbage from its stack frame. This is not only a kernel memory disclosure, it's also not POSIX-compliant. Fix it to update rmtp only on EINTR. Security: possibly Sponsored by: Dell EMC
*	MFC r315155:	kib	2017-03-19	2	-4/+16
\| \| \| \| \| \| \|	Ktracing kevent(2) calls with unusual arguments might leads to an overly large allocation requests. PR: 217435
*	MFC r314996:	mmokhi	2017-03-18	1	-2/+4
\| \| \| \| \| \| \|	Fix NULL pointer dereference and panic with shm file pread/pwrite. PR: 217429 Approved by: dchagin
*	MFC r313733:	badger	2017-03-16	1	-35/+43
\| \| \| \| \| \| \| \| \| \| \| \| \|	sleepq_catch_signals: do thread suspension before signal check Since locks are dropped when a thread suspends, it's possible for another thread to deliver a signal to the suspended thread. If the thread awakens from suspension without checking for signals, it may go to sleep despite having a pending signal that should wake it up. Therefore the suspension check is done first, so any signals sent while suspended will be caught in the subsequent signal check. Sponsored by: Dell EMC
*	MFC r314553:	hselasky	2017-03-14	1	-0/+17
\| \| \| \| \| \| \| \|	Implement taskqueue_poll_is_busy() for use by the LinuxKPI. Refer to comment above function for a detailed description. Discussed with: kib @ Sponsored by: Mellanox Technologies
*	MFC r313941:	hselasky	2017-03-14	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make sure the thread constructor and destructor eventhandlers are called for all threads belonging to a procedure. Currently the first thread in a procedure is kept around as an optimisation step and is never freed. Because the first thread in a procedure is never freed nor allocated, its destructor and constructor callbacks are never called which means per thread structures allocated by dtrace and the Linux emulation layers for example, might be present for threads which don't need these structures. This patch adds a thread construction and destruction call for the first thread in a procedure. Tested: dtrace, linux emulation Reviewed by: kib @ Sponsored by: Mellanox Technologies
*	MFC r312551:	hselasky	2017-03-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix for race leading to endless timer interrupts related to configtimer(). During normal operation "state->nextcallopt" will always be less than or equal to "state->nextcall" and checking only "state->nextcallopt" before calling "callout_process()" is sufficient. However when "configtimer()" is called a race might happen requiring both of these binary times to be checked. Short description of race: 1) A configtimer() call will reset both "state->nextcall" and "state->nextcallopt" to the same binary time. 2) If a "callout_reset()" call happens between "configtimer()" and the next "callout_process()" call, "state->nextcallopt" will get updated and "state->nextcall" will remain at the current time. Refer to logic inside cpu_new_callout(). 3) getnextcpuevent() only respects "state->nextcall" and returns this value over and over again, even if it is in the past, until "now >= state->nextcallopt" becomes true. Then these two time variables are corrected by a "callout_process()" call and the situation goes back to normal. The problem manifests itself in different ways. The common factor is the timer process(es) consume all CPU on one or more CPU cores for a long time, blocking other kernel processes from getting execution time. This can be seen by very high interrupt counts as displayed by "vmstat -i \| grep timer" right after boot. When EARLY_AP_STARTUP was enabled in r310177 the likelyhood of hitting this bug apparently increased. Example output from "vmstat -i" before patch: cpu0:timer 7591 69 cpu9:timer 39031773 358089 cpu4:timer 9359 85 cpu3:timer 9100 83 cpu2:timer 9620 88 Example output from "vmstat -i" after patch: cpu0:timer 4242 34 cpu6:timer 5531 44 cpu3:timer 6450 52 cpu1:timer 4545 36 cpu9:timer 7153 58 Before the patch cpu9 in the example above, was spinning in a loop in order to reach 39 million interrupts just a few seconds after bootup. After the patch the timer interrupt counts are more or less consistent. Discussed with: mav @ Reported by: several people Sponsored by: Mellanox Technologies
*	MFC r303464 (by brooks@):	dchagin	2017-03-11	1	-6/+0
\| \| \| \| \| \| \| \|	Don't create pointless backups of generated files in "make sysent". Any sensible workflow will include a revision control system from which to restore the old files if required. In normal usage, developers just have to clean up the mess.
*	MFC r314626	vangyzen	2017-03-10	1	-7/+7
\| \| \| \| \| \|	Fix grammar in some comments in subr_sleepqueue.c While I'm here, remove trailing whitespace.
*	MFC r314562:	kib	2017-03-05	1	-2/+1
\| \| \| \|	Style.
*	MFC r313909:	bdrewery	2017-03-04	1	-0/+2
\| \| \| \|	Fix panic with unlocked vnode to vrecycle().
*	MFC r283291: don't use CALLOUT_MPSAFE with callout_init()	avg	2017-03-04	5	-6/+6
\| \| \| \| \|	The main purpose of this MFC is to reduce conflicts for other merges. Parts of the original change have already "trickled down" via individual MFCs.
*	MFC r313730: try to fix RACCT_RSS accounting	avg	2017-02-27	1	-1/+4
\|