summaryrefslogtreecommitdiffstats
path: root/sys/kern
Commit message (Collapse)AuthorAgeFilesLines
* MFC r320619:kib2017-07-101-9/+8
| | | | Resolve confusion between different error code spaces.
* Direct commit to fix a mismerge in r320797.markj2017-07-081-7/+0
|
* MFC r311346, r311352, r313756:markj2017-07-081-6/+121
| | | | Add an allocator for KVA for execve arguments.
* MFC r320422:kib2017-07-041-1/+1
| | | | Do not ignore an error from vm_mmap_object().
* MFC r319699alc2017-07-031-20/+25
| | | | | | | | | | | | | | | | | | | | | | | | | When allocating swap blocks, if the available number of free blocks in a subtree is already zero, then setting the "largest contiguous free block" hint for that subtree to anything other than zero makes no sense. (To be clear, assigning a value to the hint that is too large is not a correctness problem, only a pessimization.) MFC r319755 blist_fill()'s return type is too narrow. blist_fill() accepts a 64-bit quantity as the size of the range to fill, but returns a 32-bit quantity as the number of blocks that were allocated to fill that range. This revision corrects that mismatch. MFC r319793 Remove an unnecessary field from struct blist. (The comment describing what this field represented was also inaccurate.) In r178792, blist_create() grew a malloc flag, allowing M_NOWAIT to be specified. However, blist_create() was not modified to handle the possibility that a malloc() call failed. Address this omission. Increase the width of the local variable "radix" to 64 bits. This matches the width of the corresponding field in struct blist.
* MFC: r319882, r320062, r320070, r320126rmacklem2017-07-031-7/+37
| | | | | | | | | | | | | Make MAXBCACHEBUF a tunable called vfs.maxbcachebuf. By making MAXBCACHEBUF a tunable, it can be increased to allow for larger read/write data sizes for the NFS client. The tunable is limited to MAXPHYS, which is currently 128K. Making MAXPHYS a tunable or increasing its value is being discussed, since it would be nice to support a read/write data size of 1Mbyte for the NFS client when mounting the AmazonEFS file service. Also, define NFS_MAXXDR as the upper bound on XDR overhead in an NFS RPC.
* MFC r319678:araujo2017-07-021-0/+1
| | | | | | | | | | Allow sysctl kern.vm_guest to return bhyve when running under bhyve. Submitted by: Sean Fagan <sef@ixsystems.com> Reviewed by: grehan MFH: 4 weeks. Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D11090
* MFC r315518alc2017-06-281-4/+6
| | | | | | | | | | | | | | Avoid unnecessary calls to vm_map_protect() in elf_load_section(). Typically, when elf_load_section() unconditionally passed VM_PROT_ALL to elf_map_insert(), it was needlessly enabling execute access on the mapping, and it would later have to call vm_map_protect() to correct the mapping's access rights. Now, instead, elf_load_section() always passes its parameter "prot" to elf_map_insert(). So, elf_load_section() must only call vm_map_protect() if it needs to remove the write access that was temporarily granted to perform a copyout(). Approved by: re (kib)
* MFC r320108:kib2017-06-261-1/+3
| | | | | | | Allow negative aio_offset only for the read and write LIO ops on device nodes. Approved by: re (marius)
* MFC r320038:kib2017-06-231-7/+8
| | | | | | Style. Approved by: re (gjb)
* MFC r320124:markj2017-06-223-9/+13
| | | | | | Fix the !TD_IS_IDLETHREAD(curthread) locking assertions. Approved by: re (kib)
* MFC r319916:kib2017-06-201-1/+0
| | | | | | Remove stray return. Approved by: re (marius)
* MFC r319540alc2017-06-151-2/+2
| | | | | | | The data type returned by vmoff() is too narrow in its range. This could break the transmission of files longer than 4 GB on 32-bit architectures. Approved by: re (gjb)
* MFC r318995alc2017-06-151-18/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In r118390, the swap pager's approach to striping swap allocation over multiple devices was changed. However, swapoff_one() was not fully and correctly converted. In particular, with r118390's introduction of a per- device blist, the maximum swap block size, "dmmax", became irrelevant to swapoff_one()'s operation. Moreover, swapoff_one() was performing out-of- range operations on the per-device blist that were silently ignored by blist_fill(). This change corrects both of these problems with swapoff_one(), which will allow us to potentially increase MAX_PAGEOUT_CLUSTER. Previously, swapoff_one() would panic inside of blist_fill() if you increased MAX_PAGEOUT_CLUSTER. MFC r319001 After r118390, the variable "dmmax" was neither the correct strip size nor the correct maximum block size. Moreover, after r318995, it serves no purpose except to provide information to user space through a read- sysctl. This change eliminates the variable "dmmax" but retains the sysctl. It also corrects the value returned by the sysctl. MFC r319604 Halve the memory being internally allocated by the blist allocator. In short, half of the memory that is allocated to implement the radix tree is wasted because we did not change "u_daddr_t" to be a 64-bit unsigned int when we changed "daddr_t" to be a 64-bit (signed) int. (See r96849 and r96851.) MFC r319612 When the function blist_fill() was added to the kernel in r107913, the swap pager used a different scheme for striping the allocation of swap space across multiple devices. And, although blist_fill() was intended to support fill operations with large counts, the old striping scheme never performed a fill larger than the stripe size. Consequently, the misplacement of a sanity check in blst_meta_fill() went undetected. Now, moving forward in time to r118390, a new scheme for striping was introduced that maintained a blist allocator per device, but as noted in r318995, swapoff_one() was not fully and correctly converted to the new scheme. This change completes what was started in r318995 by fixing the underlying bug in blst_meta_fill() that stops swapoff_one() from simply performing a single blist_fill() operation. MFC r319627 Starting in r118390, swaponsomething() began to reserve the blocks at the beginning of a swap area for a disk label. However, neither r118390 nor r118544, which increased the reservation from one to two blocks, correctly accounted for these blocks when updating the variable "swap_pager_avail". This change corrects that error. MFC r319655 Originally, this file could be compiled as a user-space application for testing purposes. However, over the years, various changes to the kernel have broken this feature. This revision applies some fixes to get user- space compilation working again. There are no changes in this revision to code that is used by the kernel. Approved by: re (kib)
* MFC r318765:allanjude2017-06-113-6/+25
| | | | | | Allow cpuset_{get,set}affinity in capabilities mode Approved by: re (marius)
* MFC r319518:kib2017-06-101-0/+2
| | | | | | | | | | | Ensure that cached struct thread does not keep spurious td_su reference on an UFS mount point. MFC r319519: Clean possible td_su reference on the struct mount being unmounted as the last step of ffs_unmount(). Approved by: re (gjb)
* MFC r319167:mjg2017-06-011-4/+4
| | | | mtx: fix whitespace damage in _mtx_trylock_flags_
* MFC r318476, r318478:markj2017-05-251-1/+1
| | | | Fix up some kern_yield() usages.
* MFC r318191:markj2017-05-251-2/+2
| | | | Let ptracestop() suspend threads sleeping in an SBDRY section.
* move p_sigqueue to the end of struct procbadger2017-05-231-6/+6
| | | | | | | | | | | | | | In order to preserve KBI in stable branches, replace the existing p_sigqueue slot with padding and move the expanded (as of r315949) p_sigqueue to the end of the struct. This is a repeat of r317529 (which concerned td_sigqueue in struct thread) for p_sigqueue in struct proc. Virtualbox modules (and possibly others) are affected without this fix. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D10843
* MFC r308474, r308691, r309203, r309365, r309703, r309898, r310720,markj2017-05-232-7/+4
| | | | | r308489, r308706: Add PQ_LAUNDRY and remove PG_CACHED pages.
* MFC r318243:kib2017-05-191-1/+3
| | | | | | | Do not wake up sleeping thread in reschedule_signals() if the signal is blocked. The spurious wakeup might result in spurious EINTR. PR: 219228
* MFC r317348:trasz2017-05-181-0/+2
| | | | Make it possible to terminate "show lockedbufs" by pressing "q".
* MFC r316941:trasz2017-05-181-4/+14
| | | | | Don't try to write out bufs that have already failed with ENXIO. This fixes some panics after disconnecting mounted disks.
* MFC r317784:mjg2017-05-161-5/+2
| | | | | | | cache: stop holding the ncneg_hot lock across purging Only non-hot entries are purged so the lock is not needed in the first place. This saves one lock/unlock pair.
* MFC r315685: tighten buffer bounds in imgact_binmisc_populate_interpemaste2017-05-151-1/+1
| | | | | | | | | | | | We must ensure there's space for the terminating null in the temporary buffer in imgact_binmisc_populate_interp(). Note that there's no buffer overflow here because xbe->xbe_interpreter's length and null termination is checked in imgact_binmisc_add_entry() before imgact_binmisc_populate_interp() is called. However, the latter should correctly enforce its own bounds. Sponsored by: The FreeBSD Foundation
* MFC: r317982marius2017-05-141-10/+2
| | | | | | | | | | | | | | | | | | - Also outside of the KOBJOPLOOKUP macro - which in turn is used by the code auto-generated for *.m - kobj_lookup_method(9) is useful; for example in back-ends or base class device drivers in order to determine whether a default method has been overridden. Thus, allow for the kobj_method_t pointer argument - used by KOBJOPLOOKUP in order to update the cache entry - of kobj_lookup_method(9), to be NULL. Actually, that pointer is redundant as it's just set to the same kobj_method_t that the kobj_lookup_method(9) function returns in the first place, but probably it serves to reduce the number of instructions generated for KOBJOPLOOKUP. - For the same reason, move updating kobj_lookup_{hits,misses} (if KOBJ_STATS is defined) from kobj_lookup_method(9) to KOBJOPLOOKUP. As a side-effect, this gets rid of the convoluted approach of always incrementing kobj_lookup_hits in KOBJOPLOOKUP and then in case of a cache miss, decrementing it in kobj_lookup_method(9) again.
* MFC r316874: restore ability to shutdown(2) datagram sockets.sobomax2017-05-141-5/+20
|
* MFC r317845-r317846brooks2017-05-121-8/+18
| | | | | | | | | | | | | | | | | | | | | | r317845: Provide a freebsd32 implementation of sigqueue() The previous misuse of sys_sigqueue() was sending random register or stack garbage to 64-bit targets. The freebsd32 implementation preserves the sival_int member of value when signaling a 64-bit process. Document the mixed ABI implementation of union sigval and the incompability of sival_ptr with pointer integrity schemes. Reviewed by: kib, wblock Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D10605 r317846: Regen post r317845. MFC with: r317845 Sponsored by: DARPA, AFRL
* MFC 313407,313449: Copy ELF machine/flags from binaries to core dumps.jhb2017-05-112-6/+6
| | | | | | | | | | | | | | | | | 313407: Copy the e_machine and e_flags fields from the binary into an ELF core dump. In the kernel, cache the machine and flags fields from ELF header to use in the ELF header of a core dump. For gcore, the copy these fields over from the ELF header in the binary. This matters for platforms which encode ABI information in the flags field (such as o32 vs n32 on MIPS). 313449: Trim trailing whitespace (mostly introduced in r313407). Sponsored by: DARPA / AFRL
* MFC r317523:kib2017-05-111-0/+51
| | | | Add asserts to verify stability of struct proc and struct thread layouts.
* MFC 313999: Consolidate statements to initialize files.jhb2017-05-111-30/+26
| | | | | | | | | | | | | | Previously, the first lines of various generated files from system call tables were generated in two sections. Some of the initialization was done in BEGIN, and the rest was done when the first line was encountered. The main reason for this split before r313564 was that most of the initialization done in the second section depended on the $FreeBSD$ tag extracted from the system call table. Now that the $FreeBSD$ tag is no longer used, consolidate all of the file initialization in the BEGIN section. This change was tested by confirming that the content of generated files did not change.
* MFC 315323: Use UMA_ALIGN_PTR instead of sizeof(void *) for zone alignment.jhb2017-05-111-1/+1
| | | | | | | uma_zcreate()'s alignment argument is supposed to be sizeof(foo) - 1, and uma.h provides a set of helper macros for common types. Passing sizeof(void *) results in all of the members being misaligned triggering unaligned access faults on certain architectures (notably MIPS).
* MFC 313564:jhb2017-05-103-12/+6
| | | | | | | | | | | | | Drop the "created from" line from files generated by makesyscalls.sh. This information is less useful when the generated files are included in source control along with the source. If needed it can be reconstructed from the $FreeBSD$ tag in the generated file. Removing this information from the generated output permits committing the generated files along with the change to the system call master list without having inconsistent metadata in the generated files. Regenerate the affected files along with the MFC.
* MFC r315526vangyzen2017-05-015-33/+147
| | | | | | | | | | | | | | | | | | | | | | Add clock_nanosleep() Add a clock_nanosleep() syscall, as specified by POSIX. Make nanosleep() a wrapper around it. Attach the clock_nanosleep test from NetBSD. Adjust it for the FreeBSD behavior of updating rmtp only when interrupted by a signal. I believe this to be POSIX-compliant, since POSIX mentions the rmtp parameter only in the paragraph about EINTR. This is also what Linux does. (NetBSD updates rmtp unconditionally.) Copy the whole nanosleep.2 man page from NetBSD because it is complete and closely resembles the POSIX description. Edit, polish, and reword it a bit, being sure to keep any relevant text from the FreeBSD page. Regenerate syscall files. Relnotes: yes Sponsored by: Dell EMC
* MFC r316770:ae2017-04-211-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clear h/w csum flags on mbuf handled by UDP. When checksums of received IP and UDP header already checked, UDP uses sbappendaddr_locked() to pass received data to the socket. sbappendaddr_locked() uses given mbuf as is, and if NIC supports checksum offloading, mbuf contains csum_data and csum_flags that were calculated for already stripped headers. Some NICs support only limited checksums offloading and do not use CSUM_PSEUDO_HDR flag, and csum_data contains some value that UDP/TCP should use for pseudo header checksum calculation. When L2TP is used for tunneling with mpd5, ng_ksocket receives mbuf with filled csum_flags and csum_data, that were calculated for outer headers. When L2TP header is stripped, a packet that was tunneled goes to the IP layer and due to presence of csum_flags (without CSUM_PSEUDO_HDR) and csum_data, the UDP/TCP checksum check fails for this packet. Reported by: Irina Liakh <spell at itl ua> Tested by: Irina Liakh <spell at itl ua> MFC r316822,316823: Rework r316770 to make it protocol independent and general, like we do for streaming sockets. And do more cleanup in the sbappendaddr_locked_internal() to prevent leak information from existing mbuf to the one, that will be possible created later by netgraph. Suggested by: glebius Tested by: Irina Liakh <spell at itl ua>
* MFC r303863:smh2017-04-141-738/+8
| | | | | | Move IPv4 & IPv6 specific jail functions to netinet and netinet6 files. Sponsored by: Multiplay
* MFC r315960: dtrace sched:::preempt should fire only when there is preemptionavg2017-04-141-1/+5
|
* MFC r315851: move thread switch tracing from mi_switch to sched_switchavg2017-04-143-19/+28
|
* MFC r316528:kib2017-04-121-8/+10
| | | | Add V_VMIO flag for vinvalbuf(9).
* MFC r315289:markj2017-04-111-2/+6
| | | | When draining a callout, don't clear CALLOUT_ACTIVE while it is running.
* Improvements for the brand detection and prioritization.kib2017-04-061-7/+20
| | | | | | | | | | | | | | | | | | | MFC r315701 (by ed): Set the interpreter path to /nonexistent. MFC r315749: Adjust r314851 to not require every brand to specify interpreter path. MFC r315753: Add a flag BI_BRAND_ONLY_STATIC to specify that the brand only matches static binaries. MFC r315754: Update r315753 with the proper flag name. MFC r316211: A followup to r315749, two more places where brand->interp_path was accessed unconditionally.
* MFC r315860:ed2017-04-061-1/+2
| | | | | | | | | Don't require the presence of the compat_3_brand. The existing ELF image activator requires the brandinfo to provide such a string unconditionally, even if the executable format in question doesn't use this type of branding. Skip matching when it's a null pointer.
* MFC r316497:brooks2017-04-051-2/+1
| | | | | | | | | | | | Correct a kernel stack leak in 32-bit compat when vfc_name is short. Don't zero unused pointer members again. Per discussion with secteam we are not issuing an advisory for this issue as we have no current evidence it leaks exploitable information. Reviewed by: rwatson, glebius, delphij Sponsored by: DARPA, AFRL
* Merge r315910:glebius2017-04-031-4/+3
| | | | | | | | | | Make sendfile(2) more robust against file change. This fixes a possible crash when the file shrinks. This also fixes sendfile(2) not sending more data in a case when the file grows, and the request is open-ended or specifies a size that is greater than old file size. PR: 217789 Reviewed by: gallatin
* MFC r315699:ngie2017-03-291-1/+2
| | | | | | | | | Print out name of non-dynamic sysctl in sysctl_remove_oid_locked This will provide a slightly better smoking gun than just stating "can't remove non-dynamic nodes!" when calling sysctl_ctx_free(9) and sysctl_remove_{name,oid}(9) with a non-dynamic (likely static) sysctl.
* MFC r315280 r315287vangyzen2017-03-293-17/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the RTC is adjusted, reevaluate absolute sleep times based on the RTC POSIX 2008 says this about clock_settime(2): If the value of the CLOCK_REALTIME clock is set via clock_settime(), the new value of the clock shall be used to determine the time of expiration for absolute time services based upon the CLOCK_REALTIME clock. This applies to the time at which armed absolute timers expire. If the absolute time requested at the invocation of such a time service is before the new value of the clock, the time service shall expire immediately as if the clock had reached the requested time normally. Setting the value of the CLOCK_REALTIME clock via clock_settime() shall have no effect on threads that are blocked waiting for a relative time service based upon this clock, including the nanosleep() function; nor on the expiration of relative timers based upon this clock. Consequently, these time services shall expire when the requested relative interval elapses, independently of the new or old value of the clock. When the real-time clock is adjusted, such as by clock_settime(3), wake any threads sleeping until an absolute real-clock time. Such a sleep is indicated by a non-zero td_rtcgen. The sleep functions will set that field to zero and return zero to tell the caller to reevaluate its sleep duration based on the new value of the clock. At present, this affects the following functions: pthread_cond_timedwait(3) pthread_mutex_timedlock(3) pthread_rwlock_timedrdlock(3) pthread_rwlock_timedwrlock(3) sem_timedwait(3) sem_clockwait_np(3) I'm working on adding clock_nanosleep(2), which will also be affected. Reported by: Sebastian Huber <sebastian.huber@embedded-brains.de> Relnotes: yes Sponsored by: Dell EMC
* MFC r315281:kib2017-03-282-3/+3
| | | | | | | | | | Use atop() instead of OFF_TO_IDX() for convertion of addresses or addresses offsets, as intended. MFC r315580 (by alc): Simplify the logic for clipping the range returned by the pager to fit within the map entry. Use atop() rather than OFF_TO_IDX() on addresses.
* MFC r315412, r314852:badger2017-03-251-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | r315412: Don't clear p_ptevents on normal SIGKILL delivery The ptrace() user has the option of discarding the signal. In such a case, p_ptevents should not be modified. If the ptrace() user decides to send a SIGKILL, ptevents will be cleared in ptracestop(). procfs events do not have the capability to discard the signal, so continue to clear the mask in that case. r314852: don't stop in issignal() if P_SINGLE_EXIT is set Suppose a traced process is stopped in ptracestop() due to receipt of a SIGSTOP signal, and is awaiting orders from the tracing process on how to handle the signal. Before sending any such orders, the tracing process exits. This should kill the traced process. But suppose a second thread handles the SIGKILL and proceeds to exit1(), calling thread_single(). The first thread will now awaken and will have a chance to check once more if it should go to sleep due to the SIGSTOP. It must not sleep after P_SINGLE_EXIT has been set; this would prevent the SIGKILL from taking effect, leaving a stopped orphan behind after the tracing process dies. Also add new tests for this condition. Sponsored by: Dell EMC
* MFC r313992, r314075, r314118, r315484:badger2017-03-255-98/+148
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | r315484: ptrace_test: eliminate assumption about thread scheduling A couple of the ptrace tests make assumptions about which thread in a multithreaded process will run after a halt. This makes the tests less portable across branches, and susceptible to future breakage. Instead, twiddle thread scheduling and priorities to match the tests' expectation. r314118: Actually fix buildworlds other than i386/amd64/sparc64 after r313992 Disable offending test for platforms without a userspace visible breakpoint(). r314075: Fix world build for archs where __builtin_debugtrap() does not work. The offending code was introduced in r313992. r313992: Defer ptracestop() signals that cannot be delivered immediately When a thread is stopped in ptracestop(), the ptrace(2) user may request a signal be delivered upon resumption of the thread. Heretofore, those signals were discarded unless ptracestop()'s caller was issignal(). Fix this by modifying ptracestop() to queue up signals requested by the ptrace user that will be delivered when possible. Take special care when the signal is SIGKILL (usually generated from a PT_KILL request); no new stop events should be triggered after a PT_KILL. Add a number of tests for the new functionality. Several tests were authored by jhb. PR: 212607 Sponsored by: Dell EMC
OpenPOWER on IntegriCloud