| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Resolve confusion between different error code spaces.
|
| |
|
|
|
|
| |
Add an allocator for KVA for execve arguments.
|
|
|
|
| |
Do not ignore an error from vm_mmap_object().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When allocating swap blocks, if the available number of free blocks in a
subtree is already zero, then setting the "largest contiguous free block"
hint for that subtree to anything other than zero makes no sense. (To be
clear, assigning a value to the hint that is too large is not a correctness
problem, only a pessimization.)
MFC r319755
blist_fill()'s return type is too narrow. blist_fill() accepts a 64-bit
quantity as the size of the range to fill, but returns a 32-bit quantity
as the number of blocks that were allocated to fill that range. This
revision corrects that mismatch.
MFC r319793
Remove an unnecessary field from struct blist. (The comment describing
what this field represented was also inaccurate.)
In r178792, blist_create() grew a malloc flag, allowing M_NOWAIT to be
specified. However, blist_create() was not modified to handle the
possibility that a malloc() call failed. Address this omission.
Increase the width of the local variable "radix" to 64 bits. This
matches the width of the corresponding field in struct blist.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make MAXBCACHEBUF a tunable called vfs.maxbcachebuf.
By making MAXBCACHEBUF a tunable, it can be increased to allow for
larger read/write data sizes for the NFS client.
The tunable is limited to MAXPHYS, which is currently 128K.
Making MAXPHYS a tunable or increasing its value is being discussed,
since it would be nice to support a read/write data size of 1Mbyte
for the NFS client when mounting the AmazonEFS file service.
Also, define NFS_MAXXDR as the upper bound on XDR overhead in an NFS RPC.
|
|
|
|
|
|
|
|
|
|
| |
Allow sysctl kern.vm_guest to return bhyve when running under bhyve.
Submitted by: Sean Fagan <sef@ixsystems.com>
Reviewed by: grehan
MFH: 4 weeks.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D11090
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid unnecessary calls to vm_map_protect() in elf_load_section().
Typically, when elf_load_section() unconditionally passed VM_PROT_ALL to
elf_map_insert(), it was needlessly enabling execute access on the
mapping, and it would later have to call vm_map_protect() to correct the
mapping's access rights. Now, instead, elf_load_section() always passes
its parameter "prot" to elf_map_insert(). So, elf_load_section() must
only call vm_map_protect() if it needs to remove the write access that
was temporarily granted to perform a copyout().
Approved by: re (kib)
|
|
|
|
|
|
|
| |
Allow negative aio_offset only for the read and write LIO ops on
device nodes.
Approved by: re (marius)
|
|
|
|
|
|
| |
Style.
Approved by: re (gjb)
|
|
|
|
|
|
| |
Fix the !TD_IS_IDLETHREAD(curthread) locking assertions.
Approved by: re (kib)
|
|
|
|
|
|
| |
Remove stray return.
Approved by: re (marius)
|
|
|
|
|
|
|
| |
The data type returned by vmoff() is too narrow in its range. This could
break the transmission of files longer than 4 GB on 32-bit architectures.
Approved by: re (gjb)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In r118390, the swap pager's approach to striping swap allocation over
multiple devices was changed. However, swapoff_one() was not fully and
correctly converted. In particular, with r118390's introduction of a per-
device blist, the maximum swap block size, "dmmax", became irrelevant to
swapoff_one()'s operation. Moreover, swapoff_one() was performing out-of-
range operations on the per-device blist that were silently ignored by
blist_fill().
This change corrects both of these problems with swapoff_one(), which will
allow us to potentially increase MAX_PAGEOUT_CLUSTER. Previously,
swapoff_one() would panic inside of blist_fill() if you increased
MAX_PAGEOUT_CLUSTER.
MFC r319001
After r118390, the variable "dmmax" was neither the correct strip size
nor the correct maximum block size. Moreover, after r318995, it serves
no purpose except to provide information to user space through a read-
sysctl.
This change eliminates the variable "dmmax" but retains the sysctl. It
also corrects the value returned by the sysctl.
MFC r319604
Halve the memory being internally allocated by the blist allocator. In
short, half of the memory that is allocated to implement the radix tree is
wasted because we did not change "u_daddr_t" to be a 64-bit unsigned int
when we changed "daddr_t" to be a 64-bit (signed) int. (See r96849 and
r96851.)
MFC r319612
When the function blist_fill() was added to the kernel in r107913, the swap
pager used a different scheme for striping the allocation of swap space
across multiple devices. And, although blist_fill() was intended to support
fill operations with large counts, the old striping scheme never performed a
fill larger than the stripe size. Consequently, the misplacement of a
sanity check in blst_meta_fill() went undetected. Now, moving forward in
time to r118390, a new scheme for striping was introduced that maintained a
blist allocator per device, but as noted in r318995, swapoff_one() was not
fully and correctly converted to the new scheme. This change completes what
was started in r318995 by fixing the underlying bug in blst_meta_fill() that
stops swapoff_one() from simply performing a single blist_fill() operation.
MFC r319627
Starting in r118390, swaponsomething() began to reserve the blocks at the
beginning of a swap area for a disk label. However, neither r118390 nor
r118544, which increased the reservation from one to two blocks, correctly
accounted for these blocks when updating the variable "swap_pager_avail".
This change corrects that error.
MFC r319655
Originally, this file could be compiled as a user-space application for
testing purposes. However, over the years, various changes to the kernel
have broken this feature. This revision applies some fixes to get user-
space compilation working again. There are no changes in this revision
to code that is used by the kernel.
Approved by: re (kib)
|
|
|
|
|
|
| |
Allow cpuset_{get,set}affinity in capabilities mode
Approved by: re (marius)
|
|
|
|
|
|
|
|
|
|
|
| |
Ensure that cached struct thread does not keep spurious td_su
reference on an UFS mount point.
MFC r319519:
Clean possible td_su reference on the struct mount being unmounted as
the last step of ffs_unmount().
Approved by: re (gjb)
|
|
|
|
| |
mtx: fix whitespace damage in _mtx_trylock_flags_
|
|
|
|
| |
Fix up some kern_yield() usages.
|
|
|
|
| |
Let ptracestop() suspend threads sleeping in an SBDRY section.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to preserve KBI in stable branches, replace the existing
p_sigqueue slot with padding and move the expanded (as of r315949)
p_sigqueue to the end of the struct.
This is a repeat of r317529 (which concerned td_sigqueue in struct
thread) for p_sigqueue in struct proc.
Virtualbox modules (and possibly others) are affected without this fix.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D10843
|
|
|
|
|
| |
r308489, r308706:
Add PQ_LAUNDRY and remove PG_CACHED pages.
|
|
|
|
|
|
|
| |
Do not wake up sleeping thread in reschedule_signals() if the signal
is blocked. The spurious wakeup might result in spurious EINTR.
PR: 219228
|
|
|
|
| |
Make it possible to terminate "show lockedbufs" by pressing "q".
|
|
|
|
|
| |
Don't try to write out bufs that have already failed with ENXIO.
This fixes some panics after disconnecting mounted disks.
|
|
|
|
|
|
|
| |
cache: stop holding the ncneg_hot lock across purging
Only non-hot entries are purged so the lock is not needed in the first place.
This saves one lock/unlock pair.
|
|
|
|
|
|
|
|
|
|
|
|
| |
We must ensure there's space for the terminating null in the temporary
buffer in imgact_binmisc_populate_interp().
Note that there's no buffer overflow here because xbe->xbe_interpreter's
length and null termination is checked in imgact_binmisc_add_entry()
before imgact_binmisc_populate_interp() is called. However, the latter
should correctly enforce its own bounds.
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Also outside of the KOBJOPLOOKUP macro - which in turn is used by
the code auto-generated for *.m - kobj_lookup_method(9) is useful;
for example in back-ends or base class device drivers in order to
determine whether a default method has been overridden. Thus, allow
for the kobj_method_t pointer argument - used by KOBJOPLOOKUP in
order to update the cache entry - of kobj_lookup_method(9), to be
NULL. Actually, that pointer is redundant as it's just set to the
same kobj_method_t that the kobj_lookup_method(9) function returns
in the first place, but probably it serves to reduce the number of
instructions generated for KOBJOPLOOKUP.
- For the same reason, move updating kobj_lookup_{hits,misses} (if
KOBJ_STATS is defined) from kobj_lookup_method(9) to KOBJOPLOOKUP.
As a side-effect, this gets rid of the convoluted approach of always
incrementing kobj_lookup_hits in KOBJOPLOOKUP and then in case of
a cache miss, decrementing it in kobj_lookup_method(9) again.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r317845:
Provide a freebsd32 implementation of sigqueue()
The previous misuse of sys_sigqueue() was sending random register or
stack garbage to 64-bit targets. The freebsd32 implementation preserves
the sival_int member of value when signaling a 64-bit process.
Document the mixed ABI implementation of union sigval and the
incompability of sival_ptr with pointer integrity schemes.
Reviewed by: kib, wblock
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D10605
r317846:
Regen post r317845.
MFC with: r317845
Sponsored by: DARPA, AFRL
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
313407:
Copy the e_machine and e_flags fields from the binary into an ELF core dump.
In the kernel, cache the machine and flags fields from ELF header to use in
the ELF header of a core dump. For gcore, the copy these fields over from
the ELF header in the binary.
This matters for platforms which encode ABI information in the flags field
(such as o32 vs n32 on MIPS).
313449:
Trim trailing whitespace (mostly introduced in r313407).
Sponsored by: DARPA / AFRL
|
|
|
|
| |
Add asserts to verify stability of struct proc and struct thread layouts.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the first lines of various generated files from system call
tables were generated in two sections. Some of the initialization was
done in BEGIN, and the rest was done when the first line was encountered.
The main reason for this split before r313564 was that most of the
initialization done in the second section depended on the $FreeBSD$ tag
extracted from the system call table. Now that the $FreeBSD$ tag is no
longer used, consolidate all of the file initialization in the BEGIN
section.
This change was tested by confirming that the content of generated files
did not change.
|
|
|
|
|
|
|
| |
uma_zcreate()'s alignment argument is supposed to be sizeof(foo) - 1,
and uma.h provides a set of helper macros for common types. Passing
sizeof(void *) results in all of the members being misaligned triggering
unaligned access faults on certain architectures (notably MIPS).
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Drop the "created from" line from files generated by makesyscalls.sh.
This information is less useful when the generated files are included in
source control along with the source. If needed it can be reconstructed
from the $FreeBSD$ tag in the generated file. Removing this information
from the generated output permits committing the generated files along
with the change to the system call master list without having inconsistent
metadata in the generated files.
Regenerate the affected files along with the MFC.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add clock_nanosleep()
Add a clock_nanosleep() syscall, as specified by POSIX.
Make nanosleep() a wrapper around it.
Attach the clock_nanosleep test from NetBSD. Adjust it for the
FreeBSD behavior of updating rmtp only when interrupted by a signal.
I believe this to be POSIX-compliant, since POSIX mentions the rmtp
parameter only in the paragraph about EINTR. This is also what
Linux does. (NetBSD updates rmtp unconditionally.)
Copy the whole nanosleep.2 man page from NetBSD because it is complete
and closely resembles the POSIX description. Edit, polish, and reword it
a bit, being sure to keep any relevant text from the FreeBSD page.
Regenerate syscall files.
Relnotes: yes
Sponsored by: Dell EMC
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clear h/w csum flags on mbuf handled by UDP.
When checksums of received IP and UDP header already checked, UDP uses
sbappendaddr_locked() to pass received data to the socket.
sbappendaddr_locked() uses given mbuf as is, and if NIC supports checksum
offloading, mbuf contains csum_data and csum_flags that were calculated
for already stripped headers. Some NICs support only limited checksums
offloading and do not use CSUM_PSEUDO_HDR flag, and csum_data contains
some value that UDP/TCP should use for pseudo header checksum calculation.
When L2TP is used for tunneling with mpd5, ng_ksocket receives mbuf with
filled csum_flags and csum_data, that were calculated for outer headers.
When L2TP header is stripped, a packet that was tunneled goes to the IP
layer and due to presence of csum_flags (without CSUM_PSEUDO_HDR) and
csum_data, the UDP/TCP checksum check fails for this packet.
Reported by: Irina Liakh <spell at itl ua>
Tested by: Irina Liakh <spell at itl ua>
MFC r316822,316823:
Rework r316770 to make it protocol independent and general, like we
do for streaming sockets.
And do more cleanup in the sbappendaddr_locked_internal() to prevent
leak information from existing mbuf to the one, that will be possible
created later by netgraph.
Suggested by: glebius
Tested by: Irina Liakh <spell at itl ua>
|
|
|
|
|
|
| |
Move IPv4 & IPv6 specific jail functions to netinet and netinet6 files.
Sponsored by: Multiplay
|
| |
|
| |
|
|
|
|
| |
Add V_VMIO flag for vinvalbuf(9).
|
|
|
|
| |
When draining a callout, don't clear CALLOUT_ACTIVE while it is running.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MFC r315701 (by ed):
Set the interpreter path to /nonexistent.
MFC r315749:
Adjust r314851 to not require every brand to specify interpreter path.
MFC r315753:
Add a flag BI_BRAND_ONLY_STATIC to specify that the brand only
matches static binaries.
MFC r315754:
Update r315753 with the proper flag name.
MFC r316211:
A followup to r315749, two more places where brand->interp_path was
accessed unconditionally.
|
|
|
|
|
|
|
|
|
| |
Don't require the presence of the compat_3_brand.
The existing ELF image activator requires the brandinfo to provide such
a string unconditionally, even if the executable format in question
doesn't use this type of branding. Skip matching when it's a null
pointer.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Correct a kernel stack leak in 32-bit compat when vfc_name is short.
Don't zero unused pointer members again.
Per discussion with secteam we are not issuing an advisory for this
issue as we have no current evidence it leaks exploitable information.
Reviewed by: rwatson, glebius, delphij
Sponsored by: DARPA, AFRL
|
|
|
|
|
|
|
|
|
|
| |
Make sendfile(2) more robust against file change. This fixes a possible
crash when the file shrinks. This also fixes sendfile(2) not sending more
data in a case when the file grows, and the request is open-ended or
specifies a size that is greater than old file size.
PR: 217789
Reviewed by: gallatin
|
|
|
|
|
|
|
|
|
| |
Print out name of non-dynamic sysctl in sysctl_remove_oid_locked
This will provide a slightly better smoking gun than just stating
"can't remove non-dynamic nodes!" when calling sysctl_ctx_free(9)
and sysctl_remove_{name,oid}(9) with a non-dynamic (likely
static) sysctl.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the RTC is adjusted, reevaluate absolute sleep times based on the RTC
POSIX 2008 says this about clock_settime(2):
If the value of the CLOCK_REALTIME clock is set via clock_settime(),
the new value of the clock shall be used to determine the time
of expiration for absolute time services based upon the
CLOCK_REALTIME clock. This applies to the time at which armed
absolute timers expire. If the absolute time requested at the
invocation of such a time service is before the new value of
the clock, the time service shall expire immediately as if the
clock had reached the requested time normally.
Setting the value of the CLOCK_REALTIME clock via clock_settime()
shall have no effect on threads that are blocked waiting for
a relative time service based upon this clock, including the
nanosleep() function; nor on the expiration of relative timers
based upon this clock. Consequently, these time services shall
expire when the requested relative interval elapses, independently
of the new or old value of the clock.
When the real-time clock is adjusted, such as by clock_settime(3),
wake any threads sleeping until an absolute real-clock time.
Such a sleep is indicated by a non-zero td_rtcgen. The sleep functions
will set that field to zero and return zero to tell the caller
to reevaluate its sleep duration based on the new value of the clock.
At present, this affects the following functions:
pthread_cond_timedwait(3)
pthread_mutex_timedlock(3)
pthread_rwlock_timedrdlock(3)
pthread_rwlock_timedwrlock(3)
sem_timedwait(3)
sem_clockwait_np(3)
I'm working on adding clock_nanosleep(2), which will also be affected.
Reported by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Relnotes: yes
Sponsored by: Dell EMC
|
|
|
|
|
|
|
|
|
|
| |
Use atop() instead of OFF_TO_IDX() for convertion of addresses or
addresses offsets, as intended.
MFC r315580 (by alc):
Simplify the logic for clipping the range returned by the pager to fit
within the map entry.
Use atop() rather than OFF_TO_IDX() on addresses.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r315412:
Don't clear p_ptevents on normal SIGKILL delivery
The ptrace() user has the option of discarding the signal. In such a
case, p_ptevents should not be modified. If the ptrace() user decides to
send a SIGKILL, ptevents will be cleared in ptracestop(). procfs events
do not have the capability to discard the signal, so continue to clear
the mask in that case.
r314852:
don't stop in issignal() if P_SINGLE_EXIT is set
Suppose a traced process is stopped in ptracestop() due to receipt of a
SIGSTOP signal, and is awaiting orders from the tracing process on how
to handle the signal. Before sending any such orders, the tracing
process exits. This should kill the traced process. But suppose a second
thread handles the SIGKILL and proceeds to exit1(), calling
thread_single(). The first thread will now awaken and will have a chance
to check once more if it should go to sleep due to the SIGSTOP. It must
not sleep after P_SINGLE_EXIT has been set; this would prevent the
SIGKILL from taking effect, leaving a stopped orphan behind after the
tracing process dies.
Also add new tests for this condition.
Sponsored by: Dell EMC
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r315484:
ptrace_test: eliminate assumption about thread scheduling
A couple of the ptrace tests make assumptions about which thread in a
multithreaded process will run after a halt. This makes the tests less
portable across branches, and susceptible to future breakage. Instead,
twiddle thread scheduling and priorities to match the tests'
expectation.
r314118:
Actually fix buildworlds other than i386/amd64/sparc64 after r313992
Disable offending test for platforms without a userspace visible
breakpoint().
r314075:
Fix world build for archs where __builtin_debugtrap() does not work.
The offending code was introduced in r313992.
r313992:
Defer ptracestop() signals that cannot be delivered immediately
When a thread is stopped in ptracestop(), the ptrace(2) user may request
a signal be delivered upon resumption of the thread. Heretofore, those signals
were discarded unless ptracestop()'s caller was issignal(). Fix this by
modifying ptracestop() to queue up signals requested by the ptrace user that
will be delivered when possible. Take special care when the signal is SIGKILL
(usually generated from a PT_KILL request); no new stop events should be
triggered after a PT_KILL.
Add a number of tests for the new functionality. Several tests were authored
by jhb.
PR: 212607
Sponsored by: Dell EMC
|