| Commit message (Collapse) | Author | Age | Files | Lines |
| |
about vnode reclamation. Typical use is for bypass mounts like
nullfs to get a notification about the lower vnode going away.
Now, vgone() calls the new VFS op vfs_reclaim_lowervp() with the argument
lowervp, which is being reclaimed. It is possible to register several
reclamation event listeners, to correctly handle the case of several
nullfs mounts over the same directory.
For the filesystem not having nullfs mounts over it, the overhead
added is a single mount interlock lock/unlock in the vnode reclamation
path.
In collaboration with: pho
MFC after: 3 weeks
| |
lookup code that dotdot lookups shall override any shared lock
requests with the exclusive one. The flag is useful for filesystems
which sometimes need to upgrade a shared lock to exclusive inside
VOP_LOOKUP or later, which cannot be done safely for dotdot because
dvp is also locked, which causes a LOR.
In collaboration with: pho
MFC after: 3 weeks
| |
TDP_NOSLEEPING leaking from syscallret() to userret() so that also
trap handling is covered. Also, the check on td_locks is not duplicated
between the two functions.
Reported by: avg
Reviewed by: kib
MFC after: 1 week
| |
coverage also in the XEN case.
Reviewed by: kib
MFC after: 1 week
| |
there is no need to check if Giant is acquired after it.
Reviewed by: kib
MFC after: 1 week
| |
so that setsockopt() and getsockopt() work on them.
This makes 'tools/regression/sockets/unix_cmsg -t dgram'
more successful.
| |
| |
Suggested by: mdf
Approved by: adrian (mentor)
| |
Approved by: bschmidt (while mentor offline)
Pointed by: gcooper
Pointy hat to: ray
| |
Pointed out by: avg
Approved by: kib (mentor)
MFC after: 1 week
| |
0 - loader hints in environment only;
1 - static hints only;
2 - fallback mode (dynamic KENV with fallback to kernel environment).
Add a kern.hintmode write handler that accepts only the value 2. That will
switch the static KENV to dynamic, so it will be possible to change device hints.
Approved by: adrian (mentor)
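
The write-handler rule above (only the value 2 is accepted) can be sketched in userland C. This is a minimal model, not the kernel handler: the function name `hintmode_write`, the enum names, and the choice of `EINVAL` as the rejection code are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hint modes as listed in the commit message (enum names are hypothetical). */
enum hintmode {
	HINTMODE_LOADER   = 0,	/* loader hints in environment only */
	HINTMODE_STATIC   = 1,	/* static hints only */
	HINTMODE_FALLBACK = 2	/* dynamic KENV with fallback to kernel environment */
};

/*
 * Hypothetical stand-in for the kern.hintmode write handler: a request
 * for mode 2 is accepted, anything else is rejected and the current
 * mode is left untouched.  EINVAL is an assumed error code.
 */
static int
hintmode_write(int *cur_mode, int new_mode)
{
	if (new_mode != HINTMODE_FALLBACK)
		return (EINVAL);
	*cur_mode = new_mode;
	return (0);
}
```

A caller would pass the current mode by pointer and check the return value; a rejected write leaves the mode as it was.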
| |
and kern.sgrowsiz sysctls writable.
Approved by: kib (mentor)
| |
MSG_WAITALL is set, and it is possible to do the entire receive
operation at once if we block (resid <= hiwat). Actually it might make
the recv(2) with MSG_WAITALL flag get stuck when there is enough space
in the receiver buffer to satisfy the request but not enough to open
the window closed previously due to the buffer being full.
The issue can be reproduced using the following scenario:
On the sender side do 2 send(2) requests:
1) data of size much smaller than SOBUF_SIZE (e.g. SOBUF_SIZE / 10);
2) data of size equal to SOBUF_SIZE.
On the receiver side do 2 recv(2) requests with MSG_WAITALL flag set:
1) recv() data of SOBUF_SIZE / 10 size;
2) recv() data of SOBUF_SIZE size;
We totally fill the receiver buffer with one SOBUF_SIZE/10 size request
and partial SOBUF_SIZE request. When the first request is processed we
get SOBUF_SIZE/10 free space. It is just enough to receive the rest of
bytes for the second request, and soreceive_generic() blocks in the
part that is a subject of this change waiting for the rest. But the
window was closed when the buffer was filled and to avoid silly window
syndrome it opens only when available space is larger than sb_hiwat/4
or maxseg. So it is stuck and pending data is only sent via TCP window
probes.
Discussed with: kib (long ago)
MFC after: 2 weeks
| |
check it for MT_CONTROL type too, otherwise the assertion
"m->m_type == MT_DATA" below may be triggered by the following scenario:
- the sender sends some data (MT_DATA) and then a file descriptor
(MT_CONTROL);
- the receiver calls recv(2) with a MSG_WAITALL asking for data larger
than the receive buffer (uio_resid > hiwat).
MFC after: 2 weeks
| |
1. Process A pdfork(2)s process B.
2. Process A passes process descriptor of B to unrelated process C.
3. Hit CTRL+C to terminate process A. Process B is also terminated
with SIGINT.
4. init(8) collects status of process B.
5. Process C closes process descriptor associated with process B.
When we have such order of events, init(8), by collecting status of
process B, will call procdesc_reap(). This function sets pd_proc to NULL.
Now when process C calls close on this process descriptor,
procdesc_close() is called. Unfortunately procdesc_close() assumes that
pd_proc points at a valid proc structure, but it was set to NULL earlier,
so the kernel panics.
The patch also adds setting 'p->p_procdesc' to NULL in procdesc_reap(),
which I think should be done.
MFC after: 1 week
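
The event order described above can be modelled in a few lines of userland C. This is a sketch under stated assumptions: the `*_model` structures and functions are hypothetical simplifications that keep only the `pd_proc` and `p_procdesc` pointers named in the commit message, and they are not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct procdesc_model;

/* Simplified process: only the fields the scenario needs. */
struct proc_model {
	int			 p_pid;
	struct procdesc_model	*p_procdesc;
	int			 killed;	/* set when close terminates it */
};

/* Simplified process descriptor. */
struct procdesc_model {
	struct proc_model	*pd_proc;
};

/* Status collected (step 4): break the link in both directions. */
static void
procdesc_reap_model(struct proc_model *p)
{
	struct procdesc_model *pd = p->p_procdesc;

	if (pd != NULL) {
		pd->pd_proc = NULL;
		p->p_procdesc = NULL;
	}
}

/*
 * Descriptor closed (step 5).  Returns 1 if a live process had to be
 * terminated, 0 if it was already reaped.  The NULL check is the point
 * of the fix: pd_proc may have been cleared by an earlier reap.
 */
static int
procdesc_close_model(struct procdesc_model *pd)
{
	struct proc_model *p = pd->pd_proc;

	if (p == NULL)
		return (0);
	p->killed = 1;
	p->p_procdesc = NULL;
	pd->pd_proc = NULL;
	return (1);
}
```

Calling the reap function first and the close function second reproduces the problematic order; without the NULL check the close path would dereference a cleared pointer.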
| |
handler and not more statically.
Unfortunately, it seems that this is not ideal for new platform bringup
and low-level boot development (which needs ktr_cpumask to be effective
before tunables can be set up).
Because of this, add a way to statically initialize cpusets by passing
a comma-separated list of initializers. Also, provide a way to enforce
an all-set mask for the above-mentioned initializers.
This imposes some differences on how KTR_CPUMASK is set up now as a
kernel option, and in particular this makes the word specifications
backward with respect to what is currently in -CURRENT. In order to avoid
mismatches between the KTR_CPUMASK definition and the other ways to set up
the mask (tunable, sysctl) and to print it, change the order in which
cpusetobj_print() and cpusetobj_scan() acquire the words belonging
to the set.
Please take a look at sys/conf/NOTES in order to understand how the
new format is supposed to work.
Also, the ktr manpages will be updated shortly by gjb, who volunteered
for this.
This patch won't be merged because it changes a POLA (at least
from the theoretical standpoint) and because it proves to be
effective only in development environments.
Requested by: rpaulo
Reviewed by: jeff, rpaulo
| |
other CPUs doesn't require locking so get rid of it. As the latter is used
for the timecounter on certain machine models, using a spin lock in this
case can lead to a deadlock with the upcoming callout(9) rework.
- Merge r134227/r167250 from x86:
Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only
for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB
demapping IPIs.
- Mark some unused function arguments as such.
MFC after: 1 week
| |
The SI_DEVOPEN, SI_CONSOPEN and SI_CANDELETE flags are not used by any
piece of code in the tree.
| |
it fits.
Reported by: lev
MFC after: 1 week
| |
tunables.
MFC after: 1 month
| |
for getvfsbyname(3) operation when called from 32bit process, and
getvfsbyname(3) is used by recent bsdtar import.
Reported by: many
Tested by: David Naylor <naylor.b.david@gmail.com>
MFC after: 5 days
| |
userland.
| |
| |
thread never blocks on a turnstile.
| |
| |
any bus-specific state (such as ivars) when a child device is deleted.
Requested by: kan
MFC after: 1 month
| |
MFC after: 1 week
| |
| |
for process, thread or others we want to support.
Use the syscall to implement the POSIX APIs clock_getcpuclockid and
pthread_getcpuclockid.
PR: 168417
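
From userland, the POSIX interface mentioned above is typically used as follows. This is a minimal sketch: the helper name `process_cpu_time` is hypothetical, and only the standard `clock_getcpuclockid()` and `clock_gettime()` calls are relied on. `pthread_getcpuclockid()` would be used the same way for a single thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: store the CPU time consumed so far by the
 * calling process in *ts.  Returns 0 on success, an error number from
 * clock_getcpuclockid(), or -1 if clock_gettime() fails.
 */
static int
process_cpu_time(struct timespec *ts)
{
	clockid_t cid;
	int error;

	/* A pid of 0 selects the CPU-time clock of the calling process. */
	error = clock_getcpuclockid(0, &cid);
	if (error != 0)
		return (error);
	if (clock_gettime(cid, ts) != 0)
		return (-1);
	return (0);
}
```

Note that `clock_getcpuclockid()` reports failure through its return value rather than through `errno`, which is why the sketch returns it directly.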
| |
does not need Giant.
MFC after: 1 month
| |
Requested by: Peter Jeremy <peter@rulingia.com>
MFC after: 1 week
| |
Submitted by: jh
MFC after: 1 week
| |
longer).
PR: 156481
Submitted by: Ian Lepore
| |
Submitted by: bde
| |
allowed to allocate, and corresponding tunable with the same
name. Note that existing processes with higher pids are left intact.
MFC after: 1 week
| |
"device_free_softc()" and "device_claim_softc()",
to allow USB serial drivers to refcount the softc.
These functions are used to grab the softc from
auto-free and to free the softc back to the correct
malloc type, respectively.
Discussed with: jhb
MFC after: 2 weeks
| |
| |
environment variables. KENV_MNAMELEN and KENV_MVALLEN don't include
space for the terminating NUL.
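
The sizing rule above means a buffer that holds a name of maximum length needs one extra byte. A minimal userland sketch follows; the helper `kenv_copy_name` is hypothetical, and the value 128 for both limits is an assumption made so the example stands alone (real code would take the constants from the system header).

```c
#include <assert.h>
#include <string.h>

/* Assumed values for illustration only. */
#define KENV_MNAMELEN	128	/* maximum name length, NUL not included */
#define KENV_MVALLEN	128	/* maximum value length, NUL not included */

/*
 * Hypothetical helper: copy a kenv variable name into a buffer sized
 * for the longest legal name plus its terminating NUL.  Returns 0 on
 * success, -1 if the name is longer than KENV_MNAMELEN.
 */
static int
kenv_copy_name(char dst[KENV_MNAMELEN + 1], const char *src)
{
	size_t len = strlen(src);

	if (len > KENV_MNAMELEN)
		return (-1);
	memcpy(dst, src, len + 1);	/* len bytes plus the NUL */
	return (0);
}
```

A buffer declared as `char name[KENV_MNAMELEN]` would be one byte too short for a name of exactly the maximum length.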
| |
| |
| |
8 or more cores to improve utilization. None of my tests on a 2xXeon (2x6x2)
system showed any slowdown from the mentioned "excess thrashing". At the same
time, in a pbzip2 test with more threads than CPUs I see up to a 10%
speedup with SMT disabled and up to 5% with SMT enabled. Thinking about
thrashing, I tried to limit that stealing to within the same last level cache,
but got only worse results. The present code prefers to steal threads
from topologically closer cores anyway.
Sponsored by: iXsystems, Inc.
| |
to it, avoid this problem by detecting timeout earlier.
Reported by: pho
| |
- remove extra dynamic variable initializations;
- restore (4BSD) and implement (ULE) hogticks variable setting;
- make sched_rr_interval() more tolerant to options;
- restore (4BSD) and implement (ULE) kern.sched.quantum sysctl, a more
user-friendly wrapper for sched_slice;
- tune some sysctl descriptions;
- make some style fixes.
| |
always it was used as rate. Fix use side units to period in hz ticks.
| |
allocated softc structure which is returned by
device_get_softc(). This method can be used to
easily implement softc refcounting. This can be
desirable when the softc has memory references
which are controlled by userspace handles for
example.
This solves the problem of blocking the caller
of device_detach() for a non-deterministic time.
Discussed with: kib, ed
MFC after: 2 weeks
| |
the wrong direction. Before it, if preemption and the end of the time slice
happened at the same time, the thread was put at the head of the queue, as for
preemption alone. That could cause a single thread to run for an indefinitely
long time. r220198 handles it by not clearing TDF_NEEDRESCHED in case of
preemption. But that causes a delayed context switch every time preemption
happens, even when not needed.
Solve the problem by introducing a scheduler-specific thread flag, TDF_SLICEEND,
set when the thread's time slice is over and it should be put at the tail of the
queue. Using the SW_PREEMPT flag for that purpose, as was done before, is just
not informative enough to work correctly.
In my tests this reduces run time deviation by 2-3 times (improves fairness)
in cases when several threads share one CPU.
Reviewed by: fabient
MFC after: 2 months
Sponsored by: iXsystems, Inc.
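
The queueing decision described above can be illustrated with a small model. This is a sketch, not the scheduler code: the `MODEL_*` flag values, the `rq_pos` enum, and `requeue_position` are hypothetical names, and the only behaviour taken from the commit message is that an expired time slice sends the thread to the tail even when it was also preempted.

```c
#include <assert.h>

/* Hypothetical flag values standing in for the kernel flags. */
#define MODEL_TDF_SLICEEND	0x1	/* thread's time slice is over */
#define MODEL_SW_PREEMPT	0x2	/* switch was caused by preemption */

enum rq_pos { RQ_HEAD, RQ_TAIL };

/*
 * Decide where a thread goes back on its run queue.  The slice-end
 * flag is checked first, so simultaneous preemption and slice end can
 * no longer keep a thread at the head indefinitely.
 */
static enum rq_pos
requeue_position(int td_flags, int sw_flags)
{
	if (td_flags & MODEL_TDF_SLICEEND)
		return (RQ_TAIL);
	if (sw_flags & MODEL_SW_PREEMPT)
		return (RQ_HEAD);
	return (RQ_TAIL);
}
```

The preemption flag alone cannot distinguish the two cases, which is why a separate per-thread flag is needed.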
| |
With the switchticks variable being reset each time a thread is preempted (which
is done regularly by interrupt threads), the scheduling quantum may never expire.
It was not noticed in time because several other factors still regularly
trigger context switches.
Handle the problem by replacing that mechanism with its equivalent from
SCHED_ULE, called the time slice. It is effectively the same, just measured in
the context of stathz instead of hz. Some unification is probably not bad.
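
The failure mode above can be reproduced with a toy simulation. Both functions are hypothetical models written for this illustration: the first resets a `switchticks` marker on every preemption, the second counts down a slice that preemption does not touch. The difference between hz and stathz is ignored here.

```c
#include <assert.h>

/*
 * Old scheme (model): the quantum expires when `quantum` ticks have
 * passed since switchticks, but switchticks is reset on every
 * preemption.  preempt_every == 0 means the thread is never preempted.
 * Returns 1 if the quantum ever expires within nticks, else 0.
 */
static int
old_quantum_expires(int nticks, int quantum, int preempt_every)
{
	int switchticks = 0;

	for (int t = 1; t <= nticks; t++) {
		if (preempt_every > 0 && t % preempt_every == 0)
			switchticks = t;	/* preempted and resumed */
		if (t - switchticks >= quantum)
			return (1);
	}
	return (0);
}

/*
 * Time-slice scheme (model): a per-thread counter is decremented on
 * every tick the thread runs and is not reset by preemption.
 */
static int
slice_expires(int nticks, int slice)
{
	int left = slice;

	for (int t = 1; t <= nticks; t++) {
		if (--left <= 0)
			return (1);
	}
	return (0);
}
```

With a quantum of 10 ticks and a preemption every 5 ticks, the old model never reports expiry, while the slice counter runs out after 10 ticks.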
| |
Submitted by: Andrey Zonov <andrey@zonov.org>
MFC after: 3 days
| |
In the rare event when fast and ithread interrupts share the same vector
and the fast handler was registered first, we can end up trying to
schedule an ithread that has not been created yet. A kernel built with
INVARIANTS then triggers an assertion.
Change the order to create the ithread first and only then add the
handler that needs it to the interrupt event handlers list.
Reviewed by: jhb
| |
to pull vm_param.h was removed. The other big dependency of vm_page.h on
vm_param.h is the PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.
Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitly for the kernel code which needs it.
Suggested and reviewed by: alc
MFC after: 2 weeks