| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Decrement p_boundary_count in the single-threading thread, during making
other thread runnable. This guarantees that upon return from the
thread_single_end(), p_boundary_count is zero.
|
|
|
|
|
| |
Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger
attachment to the process.
|
|
|
|
| |
Revert r263475: TDP_DEVMEMIO no longer needed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MFC r270443 (by mjg):
Properly reparent traced processes when the tracer dies.
MFC r273452 (by mjg):
Plug unnecessary PRS_NEW check in kern_procctl.
MFC 275800:
Add a facility for non-init process to declare itself the reaper of
the orphaned descendants.
MFC r275821:
Add missed break.
MFC r275846 (by mckusick):
Add some additional clarification and fix a few gammer nits.
MFC r275847 (by bdrewery):
Bump Dd for r275846.
|
|
|
|
|
|
|
|
|
|
| |
Add facility to stop all userspace processes.
MFC r275753:
Fix gcc build.
MFC r275820:
Add missed break.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Thread waiting for the vfork(2)-ed child to exec or exit, must allow
for the suspension.
MFC r275683 (by andreast):
Fix build for powerpc(32|64) kernels.
MFC r275686 (by andreast):
Fix kernel build for booke.
r275639 (by andrew) is not merged, since arm/arm/syscall.c is not
present on the stable/10 branch, and arm/arm/trap.c already includes
sys/kernel.h.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Delay the return from thread_single(SINGLE_EXIT) until all threads are
really destroyed by thread_stash() after the last switch out.
MFC r271007:
Retire thread_unthread().
MFC r271008:
Style.
Approved by: re (marius)
|
|
|
|
|
|
|
|
|
|
| |
Implement and use proc_realparent(9).
MFC r270024 (by markj):
Correct the order of arguments passed to LIST_INSERT_AFTER().
For merge, the p_treeflag member of struct proc was moved to the end
of the structure, to keep KBI intact.
|
|
|
|
|
| |
In execve(2), postpone the free of old vmspace until the threads are resumed
and exited.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix two issues with /dev/mem access on amd64, both causing kernel page
faults.
First, for accesses to direct map region should check for the limit by
which direct map is instantiated.
Second, for accesses to the kernel map, use a new thread private flag
TDP_DEVMEMIO, which instructs vm_fault() to return error when fault
happens on the MAP_ENTRY_NOFAULT entry, instead of panicing.
MFC r263498:
Add change forgotten in r263475. Make dmaplimit accessible outside
amd64/pmap.c.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r256603:
Introduce new function devstat_end_transaction_bio_bt(), adding new argument
to specify present time. Use this function to move binuptime() out of lock,
substantially reducing lock congestion when slow timecounter is used.
r256606:
Move g_io_deliver() out of the lock, as required for direct dispatch.
Move g_destroy_bio() out too to reduce lock scope even more.
r256607:
Fix passing uninitialized bio_resid argument to g_trace().
r256610:
Add unmapped I/O support to GEOM RAID.
r256830:
Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodonne() when unmapping
temporary mapped buffer. That fixes double unmap if biodone() called twice
for the same BIO (but with different done methods).
r256880:
Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests
to GEOM g_up/g_down thread, executing them directly in the caller context.
That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid
several context switches per I/O.
r259247:
Fix bug introduced at r256607. We have to recalculate bp_resid here since
sizes of original and completed requests may differ due to end of media.
Testing of the stable/10 merge was done by Netflix, but all of the credit
goes to Alexander and iX Systems.
Submitted by: mav
Sponsored by: iX Systems
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
from arbitrary processes. Similar to ktrace it can apply a change to all
existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
control operations on processes (as opposed to the debugger-specific
operations provided by ptrace(2)). procctl(2) uses a combination of
idtype_t and an id to identify the set of processes on which to operate
similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
of a set of processes. MADV_PROTECT still works for backwards
compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
the first bit of which is used to track if P_PROTECT should be inherited
by new child processes.
Reviewed by: kib, jilles (earlier version)
Approved by: re (delphij)
MFC after: 1 month
|
|
|
|
|
| |
vmem needs the sleepq locks to be initialized when free'ing kva, so we want it
called as early as possible.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
freelist.
o Split the pool of free pages queues really by domain and not rely on
definition of VM_RAW_NFREELIST.
o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific
function that is called when calculating the allocation domain.
The RR counter is kept, currently, per-thread.
In the future it is expected that such function evolves in a real
policy decision referee, based on specific informations retrieved by
per-thread and per-vm_object attributes.
o Add the concept of "probed domains" under the form of vm_ndomains.
It is responsibility for every architecture willing to support multiple
memory domains to correctly probe vm_ndomains along with mem_affinity
segments attributes. Those two values are supposed to remain always
consistent.
Please also note that vm_ndomains and td_dom_rr_idx are both int
because segments already store domains as int. Ideally u_int would
have much more sense. Probabilly this should be cleaned up in the
future.
o Apply RR domain selection also to vm_phys_zero_pages_idle().
Sponsored by: EMC / Isilon storage division
Partly obtained from: jeff
Reviewed by: alc
Tested by: jeff
|
|
|
|
|
|
| |
to be able to reuse the code.
MFC after: 3 weeks
|
|
|
|
| |
functions are declared.
|
|
|
|
|
|
| |
THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest.
Reviewed by: attilio
|
|
|
|
|
|
|
|
|
|
|
| |
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.
The commmit is not targeted for MFC.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
until child performs exec(). The behaviour is reasonable when a
debugger is the real parent, because the parent is stopped until
exec(), and sending a debugging event to the debugger would deadlock
both parent and child.
On the other hand, when debugger is not the parent of the vforked
child, not sending debugging signals makes it impossible to debug
across vfork.
Fix the issue by declining generating debug signals only when vfork()
was done and child called ptrace(PT_TRACEME). Set a new process flag
P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is
set, which indicates that the process was created with vfork() and
still did not execed. Check P_PPTRACE from issignal(), instead of
refusing the trace outright for the P_PPWAIT case. The scope of
P_PPTRACE is exactly contained in the scope of P_PPWAIT.
Found and tested by: zont
Reviewed by: pluknet
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
| |
zombie list for the pid. This allows several kern.proc sysctls to
report useful information for zombies.
Hold the allproc_lock around all searches instead of relocking it.
Remove private pfind_locked() from the new nfs client code.
Requested and reviewed by: pjd
Tested by: pho
MFC after: 3 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
designator to select a process which is waited for. The system call
optionally returns siginfo_t which would be otherwise provided to
SIGCHLD handler, as well as extended structure accounting for child
and cumulative grandchild resource usage.
Allow to get the current rusage information for non-exited processes
as well, similar to Solaris.
The explicit WEXITED flag is required to wait for exited processes,
allowing for more fine-grained control of the events the waiter is
interested in.
Fix the handling of siginfo for WNOWAIT option for all wait*(2)
family, by not removing the queued signal state.
PR: standards/170346
Submitted by: "Jukka A. Ukkonen" <jau@iki.fi>
MFC after: 1 month
|
|
|
|
| |
It was implemented by Rudolf Tomori during Google Summer of Code 2012.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
counter, without actually allocating the vnodes. The supposed use of
the getnewvnode_reserve(9) is to reclaim enough free vnodes while the
code still does not hold any resources that might be needed during the
reclamation, and to consume the slack later for getnewvnode() calls
made from the innards. After the critical block is finished, the
caller shall free any reserve left, by getnewvnode_drop_reserve(9).
Reviewed by: avg
Tested by: pho
MFC after: 1 week
|
|
|
|
|
|
|
| |
allowed to allocate, and corresponding tunable with the same
name. Note that existing processes with higher pids are left intact.
MFC after: 1 week
|
|
|
|
| |
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
perform copyin/copyout of the file data into the usermode
buffer. Typical filesystem hold vnode lock and some buffer locks over
the VOP_READ() and VOP_WRITE() operations, and since page fault
handler may need to recurse into VFS to get the page content, a
deadlock is possible.
The facility works by disabling page faults handling for the current
thread and attempting to execute i/o while allowing uiomove() to
access the usermode mapping of the i/o buffer. If all buffer pages are
resident, uiomove() is successfull and request is finished. If EFAULT
is returned from uiomove(), the pages backing i/o buffer are faulted
in and held, and the copyin/out is performed using uiomove_fromphys()
over the held pages for the second attempt of VOP call.
Since pages are hold in chunks to prevent large i/o requests from
starving free pages pool, and since vnode lock is only taken for
i/o over the current chunk, the vnode lock no longer protect atomicity
of the whole i/o request. Use newly added rangelocks to provide the
required atomicity of i/o regardind other i/o and truncations.
Filesystems need to explicitely opt-in into the scheme, by setting the
MNTK_NO_IOPF struct mount flag, and optionally by using
vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or
converting uio into request for uiomove_fromphys().
Reviewed by: bf (comments), mdf, pjd (previous version)
Tested by: pho
Tested by: flo, Gustau P?rez <gperez entel upc edu> (previous version)
MFC after: 2 months
|
|
|
|
|
|
|
|
|
| |
the i/o regions of the vnode data space. The implementation is quite
simple-minded, it uses the list of the lock requests, ordered by
arrival time. Each request may be for read or for write. The
implementation is fair FIFO.
MFC after: 2 month
|
|
|
|
|
|
|
|
|
| |
creation. Move it into the copied region of the struct thread.
Update some comments.
Requested by: bde
X-MFC after: never
|
|
|
|
|
|
|
| |
userspace using the obscure spare int field in struct kinfo_proc.
Submitted by: Andrey Zonov <andrey zonov org>
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to the process id. It follows the ptrace(2) interface and allows debugging
libraries to use thread ids directly, without slow and verbose conversion
of thread id into pid.
The PGET_NOTID flag is provided to allow a specific sysctl to disallow
this behaviour. All current callers of pget(9) have useful semantic to
operate on tid and do not need this flag.
Reviewed by: jhb, trocini
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
| |
in td_errno. Flag is supposed to be used by syscalls returning
EJUSTRETURN because errno was already placed into the usermode frame
by a call to set_syscall_retval(9). Both ktrace and dtrace get errno
value from td_errno if the flag is set.
Use the flag to fix sigsuspend(2) error return ktrace records.
Requested by: bde
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kernel.
When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB. In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB. This is exactly as
recommended in AMD's documentation. In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry. In short, this saves us from
having to perform potentially costly TLB flushes. In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry. Usually, such spurious page faults are handled
by vm_fault() without incident. However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault(). This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.
In collaboration with: kib
Tested by: gibbs (an earlier version)
I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
| |
not get syscall exit notification until the child performed exec of
exit. Swap the order of doing ptracestop() and waiting for P_PPWAIT
clearing, by postponing the wait into syscallret after ptracestop()
notification is done.
Reported, tested and reviewed by: Dmitry Mikulin <dmitrym juniper net>
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
| |
to the debugger. When reparenting for debugging, keep the child in
the new orphan list of old parent. When looping over the children in
kern_wait(), iterate over both children list and orphan list to search
for the process by pid.
Submitted by: Dmitry Mikulin <dmitrym juniper.net>
MFC after: 2 weeks
|
|
|
|
|
|
|
| |
lwpinfo flags, for PT_FOLLOWFORK auto-attachment.
In collaboration with: Dmitry Mikulin <dmitrym juniper net>
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return. Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.
Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.
Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.
Reviewed by: mckusick
Tested by: scottl
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
primitives by breaking stop_scheduler into a per-thread variable.
Also, store the new td_stopsched very close to td_*locks members as
they will be accessed mostly in the same codepaths as td_stopsched and
this results in avoiding a further cache-line pollution, possibly.
STOP_SCHEDULER() was pondered to use a new 'thread' argument, in order to
take advantage of already cached curthread, but in the end there should
not really be a performance benefit, while introducing a KPI breakage.
In collabouration with: flo
Reviewed by: avg
MFC after: 3 months (or never)
X-MFC: r228424
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to read strings completely to know the actual size.
As a side effect it fixes the issue with kern.proc.args and kern.proc.env
sysctls, which didn't return the size of available data when calling
sysctl(3) with the NULL argument for oldp.
Note, in get_ps_strings(), which does actual work for proc_getargv() and
proc_getenvv(), we still have a safety limit on the size of data read in
case of a corrupted procces stack.
Suggested by: kib
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
locate a process calling pfind() and do some additional checks like
p_candebug(). To reduce this code duplication a new function pget() is
introduced and used.
As the function may be useful not only in kern_proc.c it is in the
kernel name space.
Suggested by: kib
Reviewed by: kib
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
environment strings and ELF auxiliary vectors from a process stack.
Make sysctl_kern_proc_args to read not cached arguments from the
process stack.
Export proc_getargv() and proc_getenvv() so they can be reused by
procfs and linprocfs.
Suggested by: kib
Reviewed by: kib
Discussed with: kib, rwatson, jilles
Tested by: pho
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
| |
p->p_boundary_count. Race could cause the execve(2) from the threaded
process to hung since thread boundary counter was incorrect and
single-threading never finished.
Reported by: pluknet, pho
Tested by: pho
MFC after: 1 week
|
|
|
|
|
| |
Tested by: pho
MFC after: 1 week
|
|
|
|
|
|
|
|
| |
by the syscall entry speed microbenchmarks by ~10% on amd64.
Submitted by: jhb
Approved by: re (bz)
MFC after: 2 weeks
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.
New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.
When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.
This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.
Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
|
|
|
|
|
|
|
|
| |
uiomove generates EFAULT if any accessed address is not mapped, as
opposed to handling the fault.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc (previous version)
|
|
|
|
|
|
|
| |
curthread can be operated upon.
Requested by: attilio
MFC after: 1 week
|
|
|
|
|
|
|
|
|
| |
restore it to the previous state. Note that only setting a flag locally
is supported.
Sponsored by: The FreeBSD Foundation
Reviewed by: alc (previous version)
MFC after: 1 week
|
|
|
|
|
|
|
| |
"wait4(-1, ..., WNOHANG, ...)". Instead wait(2) should behave as if the
child does not wish to report status at this time.
Reviewed by: jhb
|
|
|
|
|
| |
- Sort forward declarations of structures.
- Prefer uint64_t to u_int64_t.
|
|
|
|
|
|
| |
This type is unused since embedding p_ru into struct proc.
MFC after: 1 week
|