path: root/sys/kern
Commit message  Author  Age  Files  Lines
* Merge remote-tracking branch 'origin/stable/10' into develLuiz Otavio O Souza2016-03-301-1/+2
|\
| * MFC r297037:pfg2016-03-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | aio_qphysio(): Avoid uninitialized pointer read on error. For the !unmap case it may happen that pbuf gets called unreferenced when vm_fault_quick_hold_pages() fails. Initialize it so it doesn't cause trouble. CID: 1352776 Reviewed by: jhb
* | Backport patch from D5698Renato Botelho2016-03-222-2/+4
|/ | | | | | This is an attempt to fix Chelsio cxl driver mbuf leak https://reviews.freebsd.org/D5698
* MFC r296467:kib2016-03-211-41/+92
| | | | | Convert all panics from the link_elf_obj kernel linker for object files format into printfs and errors to caller.
* MFC r256613, r256862: MFprojects/camlock r254763:mav2016-03-201-4/+21
| | | | | | Move tq_enqueue() call out of the queue lock for known handlers (actually I have found no others in the base system). This reduces queue lock hold time and congestion spinning under active multithreaded enqueuing.
* MFC r256612: MFprojects/camlock r254685:mav2016-03-201-6/+1
| | | | Remove TQ_FLAGS_PENDING flag, softly duplicating queue emptiness status.
* MFC r277759 (by jhb@)np2016-03-201-0/+3
| | | | | | | | | Fix a couple of panics when detaching from a cxgbe/cxl interface that was never brought up: - Allow NULL to be passed to sglist_free(). - Don't try to stop an interface that was never fully initialized. PR: 208136
* MFC r296320:kib2016-03-152-6/+7
| | | | | | | Adjust _callout_stop_safe() return value for the subr_sleepqueue.c needs when migrating callout was blocked, but running one was not. PR: 200992
* MFC r295391:kib2016-03-121-12/+7
| | | | Remove the assert which outlived its usefulness.
* MFC r295489:kib2016-03-122-54/+27
| | | | | Remove useless checks for NULL before calling free(9), in the kernel elf linkers.
* MFC r295488:kib2016-03-121-6/+3
| | | | | Finish r173600. There is no need to test a condition if both cases result in the same value.
* MFC r296419 (by kib):dim2016-03-071-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | In the link_elf_obj.c, handle sections of type SHT_AMD64_UNWIND same as SHT_PROGBITS. This is needed after the clang 3.8 import, which generates that type for .eh_frame section, which had SHT_PROGBITS type before. Reported by: Nikolai Lifanov <lifanov@mail.lifanov.com> PR: 207729 Tested by: dim (previous version) Sponsored by: The FreeBSD Foundation MFC r296428: Since kernel modules can now contain sections of type SHT_AMD64_UNWIND, the boot loader should not skip over these anymore while loading images. Otherwise the kernel can still panic when it doesn't find the .eh_frame section belonging to the .rela.eh_frame section. Unfortunately this will require installing boot loaders from sys/boot before attempting to boot with a new kernel. Reviewed by: kib
* MFH: 285685araujo2016-02-241-0/+16
| | | | | | | | | | | Add support to the jail framework to be able to mount linsysfs(5) and linprocfs(5). PR: 207179 Requested by: thomas@gibfest.dk Reviewed by: jamie, bapt Approved by: re (gjb) Sponsored by: gandi.net Differential Revision: https://reviews.freebsd.org/D5390
* In preparation for 10.3-RELEASE, temporarily revert the MFC of r291244marius2016-02-231-242/+80
| done as part of r292895 on stable/10, as that change causes hangs with ZFS
| and the cause, on at least amd64, is so far not understood.
|
| Discussed with: kib
| For further information see:
| https://lists.freebsd.org/pipermail/freebsd-stable/2016-February/084045.html
| PR: 207281
| Approved by: re (gjb)
* MFC: r264565marius2016-02-211-1/+76
| | | | | | | | | | | | | | | | | | | Do not set M_BESTFIT if a strategy has already been provided. This fixes problems when using M_FIRSTFIT. MFC: r280805 Add four new DDB commands to display vmem(9) statistics. In particular, such DDB commands were added: show vmem <addr> show all vmem show vmemdump <addr> show all vmemdump As possible usage, that allows to see KVA usage and fragmentation. Approved by: re (gjb)
* MFC 295418,295419:jhb2016-02-162-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix hangs or panics when misbehaved kernel threads return from their main function. 295418: Mark proc0 as a kernel process via the P_KTHREAD flag. All other kernel processes have this flag set and all threads in proc0 (including thread0) have the similar TDP_KTHREAD flag set. 295419: Call kthread_exit() rather than kproc_exit() for a premature kthread exit. Kernel threads (and processes) are supposed to call kthread_exit() (or kproc_exit()) to terminate. However, the kernel includes a fallback in fork_exit() to force a kthread exit if a kernel thread's "main" routine returns. This fallback was added back when the kernel only had processes and was not updated to call kthread_exit() instead of kproc_exit() when threads were added to the kernel. This mistake was particularly exciting when the errant thread belonged to proc0. Due to the missing P_KTHREAD flag the fallback did not kick in and instead tried to return to userland via whatever garbage was in the trapframe. With P_KTHREAD set it tried to terminate proc0 resulting in other amusements. PR: 204999 Approved by: re (glebius)
* MFC r294598:kib2016-02-141-5/+10
| | | | | | In tty_dealloc(), clear the queues. Approved by: re (marius)
* MFC r294596:kib2016-02-141-2/+3
| | | | | | | Limit the accesses to file' f_advice member to VREG vnodes only. Recheck that f_advice is not NULL after lock is taken. Approved by: re (marius)
* MFC 287442,287537,288944:jhb2016-02-104-21/+129
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix corruption of coredumps due to procstat notes changing size during coredump generation. The changes in r287442 required some reworking since the 'fo_fill_kinfo' file op does not exist in stable/10. 287442: Detect badly behaved coredump note helpers Coredump notes depend on being able to invoke dump routines twice; once in a dry-run mode to get the size of the note, and another to actually emit the note to the corefile. When a note helper emits a different length section the second time around than the length it requested the first time, the kernel produces a corrupt coredump. NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to the length of filenames corresponding to vnodes in the process' fd table via vn_fullpath. As vnodes may move around during dump, this is racy. So: - Detect badly behaved notes in putnote() and pad underfilled notes. - Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to exercise the NT_PROCSTAT_FILES corruption. It simply picks random lengths to expand or truncate paths to in fo_fill_kinfo_vnode(). - Add a sysctl, kern.coredump_pack_fileinfo, to allow users to disable kinfo packing for PROCSTAT_FILES notes. This should avoid both FILES note corruption and truncation, even if filenames change, at the cost of about 1 kiB in padding bloat per open fd. Document the new sysctl in core.5. - Fix note_procstat_files to self-limit in the 2nd pass. Since sometimes this will result in a short write, pad up to our advertised size. This addresses note corruption, at the risk of sometimes truncating the last several fd info entries. - Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the zero padding. 287537: Follow-up to r287442: Move sysctl to compiled-once file Avoid duplicate sysctl nodes. 
288944: Fix core corruption caused by race in note_procstat_vmmap This fix is spiritually similar to r287442 and was discovered thanks to the KASSERT added in that revision. NT_PROCSTAT_VMMAP output length, when packing kinfo structs, is tied to the length of filenames corresponding to vnodes in the process' vm map via vn_fullpath. As vnodes may move during coredump, this is racy. We do not remove the race, only prevent it from causing coredump corruption. - Add a sysctl, kern.coredump_pack_vmmapinfo, to allow users to disable kinfo packing for PROCSTAT_VMMAP notes. This avoids VMMAP corruption and truncation, even if names change, at the cost of up to PATH_MAX bytes per mapped object. The new sysctl is documented in core.5. - Fix note_procstat_vmmap to self-limit in the second pass. This addresses corruption, at the cost of sometimes producing a truncated result. - Fix PROCSTAT_VMMAP consumers libutil (and libprocstat, via copy-paste) to grok the new zero padding. Approved by: re (gjb)
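The two-pass scheme and the padding fix described in r287442 and r288944 can be sketched in userland C. This is only an illustration under stated assumptions, not the kernel code: note_helper_fn, put_note() and path_note() are hypothetical names invented for the example, and the "note" is just a path string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical note helper. With buf == NULL it only reports the size it
 * would like to emit (dry run). Otherwise it writes at most 'limit' bytes
 * into buf and returns the number of bytes actually written.
 */
typedef size_t (*note_helper_fn)(char *buf, size_t limit, void *arg);

/*
 * Emit one note of exactly 'advertised' bytes into 'out'. The helper is
 * limited to the advertised size, and a short write is zero-padded, so the
 * emitted note always has the size announced in the first pass.
 */
static size_t
put_note(note_helper_fn helper, void *arg, char *out, size_t advertised)
{
	size_t written = helper(out, advertised, arg);

	assert(written <= advertised);	/* helper must honor the limit */
	if (written < advertised)
		memset(out + written, 0, advertised - written);
	return (advertised);
}

/* Example helper whose payload (a path) may change between the passes. */
static size_t
path_note(char *buf, size_t limit, void *arg)
{
	const char *path = arg;
	size_t len = strlen(path) + 1;	/* include the terminating NUL */

	if (buf == NULL)
		return (len);		/* pass 1: size only */
	if (len > limit)
		len = limit;		/* pass 2: self-limit, may truncate */
	memcpy(buf, path, len);
	return (len);
}
```

If the path shrinks between the passes, the tail of the note is zero-filled; if it grows, the payload is truncated. In both cases the note keeps its advertised length, which is the property the commit message is after.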
* MFC r294732:kib2016-02-081-5/+7
| | | | | | Minor fixes for ddb tty-related commands. Approved by: re (gjb)
* MFC r294735:kib2016-02-081-3/+6
| | | | | | | Don't allow opening the callout device when the callin device is already open (in disguise as the console device). Approved by: re (gjb)
* MFC r295277:kib2016-02-071-1/+20
| | | | | | | When matching brand to the ELF binary by notes, try to find a brand with interpreter name exactly matching one wanted by the binary. Approved by: re (delphij)
* MFC 278320,278336,278830,285621:jhb2016-02-012-0/+258
| Add devctl(8): a utility for manipulating new-bus devices.
|
| Note that this version does not include the 'suspend' and 'resume'
| commands present in HEAD as those depend on larger changes to the
| suspend and resume code in the kernel.
|
| 278320:
| Add a new device control utility for new-bus devices called devctl. This
| allows the user to request administrative changes to individual devices
| such as attaching or detaching drivers or disabling and re-enabling devices.
| - Add a new /dev/devctl2 character device which uses ioctls for device
|   requests. The ioctls use a common 'struct devreq' which is somewhat
|   similar to 'struct ifreq'.
| - The ioctls identify the device to operate on via a string. This
|   string can either be the device's name, or it can be a bus-specific
|   address. (For unattached devices, a bus address is the only way to
|   locate a device.) Bus drivers register an eventhandler to claim
|   unrecognized device names that the driver recognizes as a valid address.
|   Two buses currently support addresses: ACPI recognizes any device
|   in the ACPI namespace via its full path starting with "\" and
|   the PCI bus driver recognizes an address specification of
|   'pci[<domain>:]<bus>:<slot>:<func>' (identical to the PCI selector
|   strings supported by pciconf).
| - To make it easier to cut and paste, change the PnP location string
|   in the PCI bus driver to output a full PCI selector string rather than
|   'slot=<slot> function=<func>'.
| - Add a devctl(3) interface in libdevctl which provides a wrapper around
|   the ioctls and is the preferred interface for other userland code.
| - Add a devctl(8) program which is a simple wrapper around the requests
|   supported by devctl(3).
| - Add a resource_unset_value() function that can be used to remove a
|   hint from the kernel environment. This is used to clear a
|   hint.<driver>.<unit>.disabled hint when re-enabling a boot-time
|   disabled device.
|
| 278336:
| Unbreak the build (memchr is explicitly required by devctl(9) after r278320)
|
| 278830:
| install the man page...
|
| 285621:
| Fix formatting.
|
| Approved by: re (marius)
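The address form 'pci[<domain>:]<bus>:<slot>:<func>' mentioned in r278320 can be illustrated with a small userland parser. This is a sketch, not the PCI bus driver's code: struct pci_sel and parse_pci_selector() are hypothetical names, and treating an omitted domain as 0 is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the four selector fields. */
struct pci_sel {
	unsigned int domain, bus, slot, func;
};

/*
 * Parse "pci[<domain>:]<bus>:<slot>:<func>".
 * Returns 0 on success and -1 if the string is not a valid selector.
 * Assumption: a missing domain means domain 0.
 */
static int
parse_pci_selector(const char *s, struct pci_sel *sel)
{
	unsigned int a, b, c, d;
	int end = 0;

	assert(s != NULL && sel != NULL);
	if (strncmp(s, "pci", 3) != 0)
		return (-1);
	s += 3;

	/* Four fields: domain:bus:slot:func, with nothing trailing. */
	if (sscanf(s, "%u:%u:%u:%u%n", &a, &b, &c, &d, &end) == 4 &&
	    s[end] == '\0') {
		sel->domain = a;
		sel->bus = b;
		sel->slot = c;
		sel->func = d;
		return (0);
	}

	/* Three fields: bus:slot:func, with nothing trailing. */
	end = 0;
	if (sscanf(s, "%u:%u:%u%n", &a, &b, &c, &end) == 3 &&
	    s[end] == '\0') {
		sel->domain = 0;
		sel->bus = a;
		sel->slot = b;
		sel->func = c;
		return (0);
	}
	return (-1);
}
```

sscanf() is lenient about leading whitespace and signs, so a stricter parser would be needed wherever the string comes from an untrusted source.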
* MFC r293349:kib2016-01-281-52/+47
| | | | Convert tty common code to use make_dev_s().
* MFC r293346:kib2016-01-281-23/+75
| | | | Provide yet another KPI for cdev creation, make_dev_s(9).
* MFC: r294362, r294414, r294753marius2016-01-271-35/+67
| - Fix tty_drain() and, thus, TIOCDRAIN of the current tty(4) incarnation
|   to actually wait until the TX FIFOs of UARTs have been drained before
|   returning. This is done by bringing the equivalent of the TS_BUSY flag
|   found in the previous implementation back in an ABI-preserving way.
|   Reported and tested by: Patrick Powell
| - Make the code consistent with itself style-wise and bring it closer to
|   style(9).
| - Mark unused arguments as such.
| - Make the ttystates table const.
* MFC r293458:markj2016-01-261-7/+26
| | | | Prevent cv_waiters wraparound.
* MFC r293045, r293046:ian2016-01-241-3/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable. Also, revert accidental change that snuck into r293045.
* MFC r289618, r290316:ian2016-01-241-2/+2
| Fix printf format to allow for bus_size_t not being u_long on all platforms.
|
| Fix an alignment check that is wrong in half the busdma implementations.
| This will enable the elimination of a workaround in the USB driver that
| artificially allocates buffers twice as big as they need to be (which
| actually saves memory for very small buffers on the buggy platforms).
|
| When deciding how to allocate a dma buffer, armv4, armv6, mips, and
| x86/iommu all correctly check for the tag alignment <= maxsize as enabling
| simple uma/malloc based allocation. Powerpc, sparc64, x86/bounce, and
| arm64/bounce were all checking for alignment < maxsize; on those platforms
| when alignment was equal to the max size it would fall back to page-based
| allocators even for very small buffers.
|
| This change makes all platforms use the <= check. It should be noted that
| on all platforms other than arm[v6] and mips, this check is relying on
| undocumented behavior in malloc(9) that if you allocate a block of a given
| size it will be aligned to the next larger power-of-2 boundary. There is
| nothing in the malloc(9) man page that makes that explicit promise (but the
| busdma code has been relying on this behavior all along so I guess it works).
|
| Arm and mips code uses the allocator in kern/subr_busdma_buffalloc.c, which
| does explicitly implement this promise about size and alignment. Other
| platforms probably should switch to the aligned allocator.
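The difference between the two comparisons is easiest to see in isolation. The sketch below shows only the alignment test; the real busdma code combines it with other conditions that the commit message does not describe, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Corrected form of the check: alignment equal to maxsize is acceptable. */
static bool
align_ok_for_malloc(size_t alignment, size_t maxsize)
{
	/* Assumption for this sketch: alignment is a nonzero power of two. */
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (alignment <= maxsize);
}

/* Old form used on some platforms: rejects alignment == maxsize. */
static bool
align_ok_for_malloc_old(size_t alignment, size_t maxsize)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (alignment < maxsize);
}
```

For a 64-byte buffer with 64-byte alignment the old form returns false, which is the case that pushed very small buffers onto the page-based allocators.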
* MFC 292892:jhb2016-01-231-8/+3
| | | | | | | Call kern_thr_exit() instead of duplicating it. This code is missing the racct_subr() call from kern_thr_exit() and would require further code duplication in future changes.
* MFC 289769,289822,290143,290144:jhb2016-01-201-0/+4
| Rename remaining linux32 symbols from linux_* to linux32_*.
|
| 289769:
| Rename remaining linux32 symbols such as linux_sysent[] and
| linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
| linux64.ko. While here, add support for linux64 binaries to systrace.
| - Update NOPROTO entries in amd64/linux/syscalls.master to match the
|   main table to fix systrace build.
| - Add a special case for union l_semun arguments to the systrace
|   generation.
| - The systrace_linux32 module now only builds the systrace_linux32.ko
|   module on amd64.
| - Add a new systrace_linux module that builds on both i386 and amd64.
|   For i386 it builds the existing systrace_linux.ko. For amd64 it
|   builds a systrace_linux.ko for 64-bit binaries.
|
| 289822:
| Fix build for the KTR-enabled kernels.
|
| 290143:
| Fix build with DEBUG defined.
|
| 290144:
| Update for LINUX32 rename. The assembler didn't complain about undefined
| symbols but just used 0 after the rename.
* MFC 290728:jhb2016-01-183-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | Export various helper variables describing the layout and size of certain kernel structures for use by debuggers. This mostly aids in examining cores from a kernel without debug symbols as a debugger can infer these values if debug symbols are available. One set of variables describes the layout of 'struct linker_file' to walk the list of loaded kernel modules. A second set of variables describes the layout of 'struct proc' and 'struct thread' to walk the list of processes in the kernel and the threads in each process. The 'pcb_size' variable is used to index into the stoppcbs[] array. The 'vm_maxuser_address' is used to distinguish kernel virtual addresses from user addresses. This doesn't have to be perfect, and 'vm_maxuser_address' is a cheap and simple way to differentiate kernel pointers from simple values like TIDs and PIDs. While here, annotate the fields in struct pcb used by kgdb on amd64 and i386 to note that their ABI should be preserved. Annotations for other platforms will be added in the future.
* MFC r293613:dchagin2016-01-162-0/+2
| | | | | Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall instead of vdso. An upcoming linux_base-c6 needs it.
* o Fix SCTP ICMPv6 error message vulnerability. [SA-16:01.sctp]glebius2016-01-141-2/+1
| | | | | | | | | | | | | o Fix Linux compatibility layer incorrect futex handling. [SA-16:03.linux] o Fix Linux compatibility layer setgroups(2) system call. [SA-16:04.linux] o Fix TCP MD5 signature denial of service. [SA-16:05.tcp] o Fix insecure default bsnmpd.conf permissions. [SA-16:06.bsnmpd] Security: FreeBSD-SA-16:01.sctp, CVE-2016-1879 Security: FreeBSD-SA-16:03.linux, CVE-2016-1880 Security: FreeBSD-SA-16:04.linux, CVE-2016-1881 Security: FreeBSD-SA-16:05.tcp, CVE-2016-1882 Security: FreeBSD-SA-16:06.bsnmpd, CVE-2015-5677
* MFC: r292943, r292960marius2016-01-131-3/+0
| | | | | | | | | | | | | | | | | | | | - (Ab)use udivx for dividing the u_int pc_cpuid when implementing CPU_ISSET(), CPU_SET() etc. in sparc64 asm. This approach has the benefit of not clobbering %y, allowing to revert r222827 and partially r222828. - In r222828, CATR() already was changed to use the equivalent of PCPU_GET(cpuid) instead of the MD module ID for KTR_MASK, so belatedly also catch up with KTR_CPU and the C side of ktr(9). Originally, in r203838 CATR() was moved away from directly reading the module ID or equivalent as that became impractical with other CPU types than USI/II supported. With r222828 in place, per-CPU data generally is set up soon enough, though, that employing PCPU things in ktr(9) also for use during early stages works. - Unfortunately, an exception to the latter is the ktr(9) use in pmap_bootstrap(), which actually is run so early that even checking for bootverbose being set via the loader doesn't work. Consequently, replace the ktr(9) use in pmap_bootstrap() with OF_printf(9) and put it under #ifdef DIAGNOSTIC instead.
* Hide the "unmount of /dev failed (BUSY)" warning at shutdown or reboot,trasz2016-01-121-1/+1
| introduced with r293742, just like it was hidden before that commit.
|
| This is a direct commit to 10-STABLE; this special case is not needed
| in 11-CURRENT, because devfs supports forced unmounts there. The forced
| unmount could be MFC-ed, but there are some LORs at shutdown, and I have
| a weird feeling about it.
|
| Sponsored by: The FreeBSD Foundation
* MFC r287964:trasz2016-01-122-21/+159
| | | | | | | | | | | | | Kernel part of reroot support - a way to change rootfs without reboot. Note that the mountlist manipulations are somewhat fragile, and not very pretty. The reason for this is to avoid changing vfs_mountroot(), which is (obviously) rather mission-critical, but not very well documented, and thus hard to test properly. It might be possible to rework it to use its own simple root mount mechanism instead of vfs_mountroot(). Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D2698
* MFC r287107:trasz2016-01-123-28/+37
| | | | | | | | | Make vfs_unmountall() unmount /dev after /, not before. The only reason this didn't result in an unclean shutdown is that devfs ignores MNT_FORCE flag. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3467
* MFC r289111:trasz2016-01-111-6/+2
| | | | | | Provide better debug message on kernel module name clash. Sponsored by: The FreeBSD Foundation
* MFC r283440:dchagin2016-01-091-9/+26
| For future use in the Linuxulator:
| 1. Add a kern_kqueue() counterpart for kqueue() with a flags parameter.
| 2. Be a bit secure. To avoid a double fp lookup, add a kern_kevent_fp()
|    counterpart for kern_kevent() with a file pointer parameter instead of
|    a file descriptor, and pass the buck to it.
|
| Suggested by: mjg [2]
* MFC r283382:dchagin2016-01-093-0/+11
| In preparation for switching the Linuxulator to use native 1:1 threads,
| add a hook for cleaning up thread resources before the thread dies.
* MFC r283377:dchagin2016-01-091-24/+78
| In preparation for switching the Linuxulator to use native 1:1 threads,
| split sys_sched_getparam(), sys_sched_setparam(), sys_sched_getscheduler()
| and sys_sched_setscheduler() into their kern_* counterparts, and add a
| targettd parameter so that the target thread can be specified directly.
* MFC r283374:dchagin2016-01-091-3/+16
| In preparation for switching the Linuxulator to use native 1:1 threads,
| refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
| Add a kern_sched_rr_get_interval() counterpart which takes a targettd
| parameter so that the target thread can be specified directly (new
| Linuxulator). The Linuxulator temporarily uses the first thread in the proc.
|
| Move linux_sched_rr_get_interval() to the MI part.
* MFC r283373:dchagin2016-01-091-10/+19
| In preparation for switching the Linuxulator to use native 1:1 threads,
| introduce kern_thr_alloc(), which will be used later in linux_clone().
* MFC r283372:dchagin2016-01-091-4/+10
| In preparation for switching the Linuxulator to use native 1:1 threads,
| split sys_thr_exit() up into sys_thr_exit() and kern_thr_exit().
| The second will be used in the linux_exit() system call later.
* Regen for r293474.dchagin2016-01-093-2/+66
|
* MFC r277610 (by jillies):dchagin2016-01-093-3/+132
| | | | Add futimens and utimensat system calls.
* To facilitate an upcoming Linuxulator merging partiallydchagin2016-01-099-49/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | MFC r275121 (by kib). Only merge the syntax changes from r275121, PROC_*LOCK() macros still lock the same proc spinlock. The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). Discussed with: kib
* MFC r292749:kib2016-01-091-1/+3
| Do not substitute interpreter if the brand interpreter path is different
| from the interpreter path requested by the binary.
* MFC r292676:jtl2016-01-071-0/+5
| | | | | Only allow one PT_INTERP ELF program header. This also fixes a potential memory leak for interp_buf.
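The rule in r292676 can be sketched as a scan over the program headers that rejects a second PT_INTERP entry. This is a userland illustration, not the image activator's code: struct sk_phdr, the SK_* constants and find_single_interp() are hypothetical names, and only the p_type field is modeled (the numeric values 1 and 3 are the ELF values of PT_LOAD and PT_INTERP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PT_LOAD	1	/* same value as PT_LOAD in the ELF spec */
#define SK_PT_INTERP	3	/* same value as PT_INTERP in the ELF spec */

/* Minimal stand-in for a program header; only the type is modeled. */
struct sk_phdr {
	uint32_t p_type;
};

/*
 * Return the index of the only PT_INTERP header, -1 if there is none,
 * or -2 if more than one is present (the case the commit rejects).
 */
static int
find_single_interp(const struct sk_phdr *phdr, size_t phnum)
{
	int found = -1;
	size_t i;

	assert(phdr != NULL || phnum == 0);
	for (i = 0; i < phnum; i++) {
		if (phdr[i].p_type != SK_PT_INTERP)
			continue;
		if (found != -1)
			return (-2);	/* second PT_INTERP: reject */
		found = (int)i;
	}
	return (found);
}
```

Because at most one header is ever accepted, an interpreter buffer allocated for it cannot be overwritten by a later header, which is one way to read the memory leak remark in the commit message.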