FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message	Author	Age	Files	Lines
*	MFC r282944:	kib	2015-05-22	1	-1/+0
\| \| \| \| \| \|	Decrement p_boundary_count in the single-threading thread, during making other thread runnable. This guarantees that upon return from the thread_single_end(), p_boundary_count is zero.
*	MFC r277322:	kib	2015-01-25	1	-0/+2
\| \| \| \| \|	Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger attachment to the process.
*	MFC r277055:	kib	2015-01-19	1	-1/+1
\| \| \| \|	Revert r263475: TDP_DEVMEMIO no longer needed.
*	Merge reaper facility.	kib	2015-01-05	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MFC r270443 (by mjg): Properly reparent traced processes when the tracer dies. MFC r273452 (by mjg): Plug unnecessary PRS_NEW check in kern_procctl. MFC 275800: Add a facility for non-init process to declare itself the reaper of the orphaned descendants. MFC r275821: Add missed break. MFC r275846 (by mckusick): Add some additional clarification and fix a few grammar nits. MFC r275847 (by bdrewery): Bump Dd for r275846.
*	MFC r275745:	kib	2014-12-27	1	-6/+11
\| \| \| \| \| \| \| \| \| \|	Add facility to stop all userspace processes. MFC r275753: Fix gcc build. MFC r275820: Add missed break.
*	MFC r275616:	kib	2014-12-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Thread waiting for the vfork(2)-ed child to exec or exit, must allow for the suspension. MFC r275683 (by andreast): Fix build for powerpc(32\|64) kernels. MFC r275686 (by andreast): Fix kernel build for booke. r275639 (by andrew) is not merged, since arm/arm/syscall.c is not present on the stable/10 branch, and arm/arm/trap.c already includes sys/kernel.h.
*	MFC r271000:	kib	2014-09-10	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	Delay the return from thread_single(SINGLE_EXIT) until all threads are really destroyed by thread_stash() after the last switch out. MFC r271007: Retire thread_unthread(). MFC r271008: Style. Approved by: re (marius)
*	MFC r269656:	kib	2014-08-21	1	-1/+8
\| \| \| \| \| \| \| \| \| \|	Implement and use proc_realparent(9). MFC r270024 (by markj): Correct the order of arguments passed to LIST_INSERT_AFTER(). For merge, the p_treeflag member of struct proc was moved to the end of the structure, to keep KBI intact.
*	MFC r266464:	kib	2014-05-23	1	-0/+1
\| \| \| \| \|	In execve(2), postpone the free of old vmspace until the threads are resumed and exited.
*	MFC r263475:	kib	2014-03-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix two issues with /dev/mem access on amd64, both causing kernel page faults. First, accesses to the direct map region should check the limit up to which the direct map is instantiated. Second, for accesses to the kernel map, use a new thread private flag TDP_DEVMEMIO, which instructs vm_fault() to return an error when a fault happens on the MAP_ENTRY_NOFAULT entry, instead of panicking. MFC r263498: Add change forgotten in r263475. Make dmaplimit accessible outside amd64/pmap.c.
*	MFC Alexander Motin's GEOM direct dispatch work:	scottl	2014-01-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r256603: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used. r256606: Move g_io_deliver() out of the lock, as required for direct dispatch. Move g_destroy_bio() out too to reduce lock scope even more. r256607: Fix passing uninitialized bio_resid argument to g_trace(). r256610: Add unmapped I/O support to GEOM RAID. r256830: Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodone() when unmapping temporarily mapped buffer. That fixes double unmap if biodone() called twice for the same BIO (but with different done methods). r256880: Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. r259247: Fix bug introduced at r256607. We have to recalculate bp_resid here since sizes of original and completed requests may differ due to end of media. Testing of the stable/10 merge was done by Netflix, but all of the credit goes to Alexander and iX Systems. Submitted by: mav Sponsored by: iX Systems
*	Extend the support for exempting processes from being killed when swap is	jhb	2013-09-19	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	exhausted. - Add a new protect(1) command that can be used to set or revoke protection from arbitrary processes. Similar to ktrace it can apply a change to all existing descendants of a process as well as future descendants. - Add a new procctl(2) system call that provides a generic interface for control operations on processes (as opposed to the debugger-specific operations provided by ptrace(2)). procctl(2) uses a combination of idtype_t and an id to identify the set of processes on which to operate similar to wait6(). - Add a PROC_SPROTECT control operation to manage the protection status of a set of processes. MADV_PROTECT still works for backwards compatibility. - Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc) the first bit of which is used to track if P_PROTECT should be inherited by new child processes. Reviewed by: kib, jilles (earlier version) Approved by: re (delphij) MFC after: 1 month
*	Don't call sleepinit() from proc0_init(), make it a SYSINIT instead.	cognet	2013-08-09	1	-1/+0
\| \| \| \| \|	vmem needs the sleepq locks to be initialized when free'ing kva, so we want it called as early as possible.
*	o Add accessor functions to add and remove pages from a specific	attilio	2013-05-13	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	freelist. o Split the pool of free pages queues really by domain and not rely on definition of VM_RAW_NFREELIST. o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific function that is called when calculating the allocation domain. The RR counter is kept, currently, per-thread. In the future it is expected that such function evolves in a real policy decision referee, based on specific information retrieved from per-thread and per-vm_object attributes. o Add the concept of "probed domains" under the form of vm_ndomains. It is the responsibility of every architecture willing to support multiple memory domains to correctly probe vm_ndomains along with mem_affinity segments attributes. Those two values are supposed to remain always consistent. Please also note that vm_ndomains and td_dom_rr_idx are both int because segments already store domains as int. Ideally u_int would make much more sense. Probably this should be cleaned up in the future. o Apply RR domain selection also to vm_phys_zero_pages_idle(). Sponsored by: EMC / Isilon storage division Partly obtained from: jeff Reviewed by: alc Tested by: jeff
*	Similarly to proc_getargv() and proc_getenvv(), export proc_getauxv()	trociny	2013-04-14	1	-0/+1
\| \| \| \| \| \|	to be able to reuse the code. MFC after: 3 weeks
*	Move CRITICAL_ASSERT() macro to systm.h, where the critical(9)	glebius	2013-04-06	1	-3/+0
\| \| \| \|	functions are declared.
*	Replace the TDP_NOSLEEPING flag with a counter so that the	jhb	2013-03-01	1	-11/+4
\| \| \| \| \| \|	THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest. Reviewed by: attilio
*	MFcalloutng:	davide	2013-02-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to estimate further sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements that will be committed in the next days. The commit is not targeted for MFC.
*	When vforked child is traced, the debugging events are not generated	kib	2013-02-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	until child performs exec(). The behaviour is reasonable when a debugger is the real parent, because the parent is stopped until exec(), and sending a debugging event to the debugger would deadlock both parent and child. On the other hand, when debugger is not the parent of the vforked child, not sending debugging signals makes it impossible to debug across vfork. Fix the issue by declining to generate debug signals only when vfork() was done and child called ptrace(PT_TRACEME). Set a new process flag P_PPTRACE from the attach code for PT_TRACEME, if P_PPWAIT flag is set, which indicates that the process was created with vfork() and has not yet execed. Check P_PPTRACE from issignal(), instead of refusing the trace outright for the P_PPWAIT case. The scope of P_PPTRACE is exactly contained in the scope of P_PPWAIT. Found and tested by: zont Reviewed by: pluknet MFC after: 2 weeks
*	In pget(9), if PGET_NOTWEXIT flag is not specified, also search the	kib	2012-11-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	zombie list for the pid. This allows several kern.proc sysctls to report useful information for zombies. Hold the allproc_lock around all searches instead of relocking it. Remove private pfind_locked() from the new nfs client code. Requested and reviewed by: pjd Tested by: pho MFC after: 3 weeks
*	Add the wait6(2) system call. It takes POSIX waitid()-like process	kib	2012-11-13	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	designator to select a process which is waited for. The system call optionally returns siginfo_t which would be otherwise provided to SIGCHLD handler, as well as extended structure accounting for child and cumulative grandchild resource usage. Allow to get the current rusage information for non-exited processes as well, similar to Solaris. The explicit WEXITED flag is required to wait for exited processes, allowing for more fine-grained control of the events the waiter is interested in. Fix the handling of siginfo for WNOWAIT option for all wait*(2) family, by not removing the queued signal state. PR: standards/170346 Submitted by: "Jukka A. Ukkonen" <jau@iki.fi> MFC after: 1 month
*	Add CPU percentage limit enforcement to RCTL. The resource name is "pcpu".	trasz	2012-10-26	1	-0/+1
\| \| \| \|	It was implemented by Rudolf Tomori during Google Summer of Code 2012.
*	Add a KPI to allow to reserve some amount of space in the numvnodes	kib	2012-10-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	counter, without actually allocating the vnodes. The supposed use of the getnewvnode_reserve(9) is to reclaim enough free vnodes while the code still does not hold any resources that might be needed during the reclamation, and to consume the slack later for getnewvnode() calls made from the innards. After the critical block is finished, the caller shall free any reserve left, by getnewvnode_drop_reserve(9). Reviewed by: avg Tested by: pho MFC after: 1 week
*	Add a sysctl kern.pid_max, which limits the maximum pid the system is	kib	2012-08-15	1	-2/+3
\| \| \| \| \| \| \|	allowed to allocate, and corresponding tunable with the same name. Note that existing processes with higher pids are left intact. MFC after: 1 week
*	Remove stray blank line.	kib	2012-06-30	1	-1/+0
\| \| \| \|	MFC after: 3 days
*	vn_io_fault() is a facility to prevent page faults while filesystems	kib	2012-05-30	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	perform copyin/copyout of the file data into the usermode buffer. Typical filesystems hold vnode lock and some buffer locks over the VOP_READ() and VOP_WRITE() operations, and since page fault handler may need to recurse into VFS to get the page content, a deadlock is possible. The facility works by disabling page faults handling for the current thread and attempting to execute i/o while allowing uiomove() to access the usermode mapping of the i/o buffer. If all buffer pages are resident, uiomove() is successful and request is finished. If EFAULT is returned from uiomove(), the pages backing i/o buffer are faulted in and held, and the copyin/out is performed using uiomove_fromphys() over the held pages for the second attempt of VOP call. Since pages are held in chunks to prevent large i/o requests from starving free pages pool, and since vnode lock is only taken for i/o over the current chunk, the vnode lock no longer protects atomicity of the whole i/o request. Use newly added rangelocks to provide the required atomicity of i/o regarding other i/o and truncations. Filesystems need to explicitly opt-in into the scheme, by setting the MNTK_NO_IOPF struct mount flag, and optionally by using vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or converting uio into request for uiomove_fromphys(). Reviewed by: bf (comments), mdf, pjd (previous version) Tested by: pho Tested by: flo, Gustau Pérez <gperez entel upc edu> (previous version) MFC after: 2 months
*	Add a rangelock implementation, intended to be used to range-locking	kib	2012-05-30	1	-0/+1
\| \| \| \| \| \| \| \| \|	the i/o regions of the vnode data space. The implementation is quite simple-minded, it uses the list of the lock requests, ordered by arrival time. Each request may be for read or for write. The implementation is fair FIFO. MFC after: 2 months
*	Stop treating td_sigmask specially for the purposes of new thread	kib	2012-05-26	1	-4/+4
\| \| \| \| \| \| \| \| \|	creation. Move it into the copied region of the struct thread. Update some comments. Requested by: bde X-MFC after: never
*	Calculate the count of per-process cow faults. Export the count to	kib	2012-05-23	1	-0/+1
\| \| \| \| \| \| \|	userspace using the obscure spare int field in struct kinfo_proc. Submitted by: Andrey Zonov <andrey zonov org> MFC after: 1 week
*	Allow for the process information sysctls to accept a thread id in addition	kib	2012-04-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	to the process id. It follows the ptrace(2) interface and allows debugging libraries to use thread ids directly, without slow and verbose conversion of thread id into pid. The PGET_NOTID flag is provided to allow a specific sysctl to disallow this behaviour. All current callers of pget(9) have useful semantic to operate on tid and do not need this flag. Reviewed by: jhb, trocini MFC after: 1 week
*	Add thread-private flag to indicate that error value is already placed	kib	2012-04-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	in td_errno. Flag is supposed to be used by syscalls returning EJUSTRETURN because errno was already placed into the usermode frame by a call to set_syscall_retval(9). Both ktrace and dtrace get errno value from td_errno if the flag is set. Use the flag to fix sigsuspend(2) error return ktrace records. Requested by: bde MFC after: 1 week
*	Handle spurious page faults that may occur in no-fault sections of the	alc	2012-03-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kernel. When access restrictions are added to a page table entry, we flush the corresponding virtual address mapping from the TLB. In contrast, when access restrictions are removed from a page table entry, we do not flush the virtual address mapping from the TLB. This is exactly as recommended in AMD's documentation. In effect, when access restrictions are removed from a page table entry, AMD's MMUs will transparently refresh a stale TLB entry. In short, this saves us from having to perform potentially costly TLB flushes. In contrast, Intel's MMUs are allowed to generate a spurious page fault based upon the stale TLB entry. Usually, such spurious page faults are handled by vm_fault() without incident. However, when we are executing no-fault sections of the kernel, we are not allowed to execute vm_fault(). This change introduces special-case handling for spurious page faults that occur in no-fault sections of the kernel. In collaboration with: kib Tested by: gibbs (an earlier version) I would also like to acknowledge Hiroki Sato's assistance in diagnosing this problem. MFC after: 1 week
*	Currently, the debugger attached to the process executing vfork() does	kib	2012-02-27	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	not get syscall exit notification until the child performed exec or exit. Swap the order of doing ptracestop() and waiting for P_PPWAIT clearing, by postponing the wait into syscallret after ptracestop() notification is done. Reported, tested and reviewed by: Dmitry Mikulin <dmitrym juniper net> MFC after: 2 weeks
*	Allow the parent to gather the exit status of the children reparented	kib	2012-02-23	1	-2/+9
\| \| \| \| \| \| \| \| \| \|	to the debugger. When reparenting for debugging, keep the child in the new orphan list of old parent. When looping over the children in kern_wait(), iterate over both children list and orphan list to search for the process by pid. Submitted by: Dmitry Mikulin <dmitrym juniper.net> MFC after: 2 weeks
*	Mark the automatically attached child with PL_FLAG_CHILD in struct	kib	2012-02-10	1	-0/+1
\| \| \| \| \| \| \|	lwpinfo flags, for PT_FOLLOWFORK auto-attachment. In collaboration with: Dmitry Mikulin <dmitrym juniper net> MFC after: 1 week
*	Current implementations of sync(2) and syncer vnode fsync() VOP uses	kib	2012-02-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mnt_noasync counter to temporarily remove MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of MNTK_ASYNC option is harmful because not only i/o started from corresponding thread becomes synchronous, but all i/o is synchronous on the filesystem which is initiated during sync(2) or syncer activity. Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks
*	Avoid to check the same cache line/variable from all the locking	attilio	2012-01-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	primitives by breaking stop_scheduler into a per-thread variable. Also, store the new td_stopsched very close to td_*locks members as they will be accessed mostly in the same codepaths as td_stopsched and this results in avoiding a further cache-line pollution, possibly. STOP_SCHEDULER() was pondered to use a new 'thread' argument, in order to take advantage of already cached curthread, but in the end there should not really be a performance benefit, while introducing a KPI breakage. In collaboration with: flo Reviewed by: avg MFC after: 3 months (or never) X-MFC: r228424
*	Abrogate nchr argument in proc_getargv() and proc_getenvv(): we always want	trociny	2012-01-15	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to read strings completely to know the actual size. As a side effect it fixes the issue with kern.proc.args and kern.proc.env sysctls, which didn't return the size of available data when calling sysctl(3) with the NULL argument for oldp. Note, in get_ps_strings(), which does actual work for proc_getargv() and proc_getenvv(), we still have a safety limit on the size of data read in case of a corrupted process stack. Suggested by: kib MFC after: 3 days
*	On start most of sysctl_kern_proc functions use the same pattern:	trociny	2011-12-17	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	locate a process calling pfind() and do some additional checks like p_candebug(). To reduce this code duplication a new function pget() is introduced and used. As the function may be useful not only in kern_proc.c it is in the kernel name space. Suggested by: kib Reviewed by: kib MFC after: 2 weeks
*	Add new sysctls, KERN_PROC_ENV and KERN_PROC_AUXV, to return	trociny	2011-11-22	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	environment strings and ELF auxiliary vectors from a process stack. Make sysctl_kern_proc_args read non-cached arguments from the process stack. Export proc_getargv() and proc_getenvv() so they can be reused by procfs and linprocfs. Suggested by: kib Reviewed by: kib Discussed with: kib, rwatson, jilles Tested by: pho MFC after: 2 weeks
*	Consistently use process spin lock for protection of the	kib	2011-11-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	p->p_boundary_count. Race could cause the execve(2) from the threaded process to hang since thread boundary counter was incorrect and single-threading never finished. Reported by: pluknet, pho Tested by: pho MFC after: 1 week
*	Assert that _PRELE() is done for the held process.	kib	2011-11-09	1	-0/+1
\| \| \| \| \|	Tested by: pho MFC after: 1 week
*	Inline the syscallenter() and syscallret(). This reduces the time measured	kib	2011-09-11	1	-3/+0
\| \| \| \| \| \| \| \|	by the syscall entry speed microbenchmarks by ~10% on amd64. Submitted by: jhb Approved by: re (bz) MFC after: 2 weeks
*	Add experimental support for process descriptors	jonathan	2011-08-18	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A "process descriptor" file descriptor is used to manage processes without using the PID namespace. This is required for Capsicum's Capability Mode, where the PID namespace is unavailable. New system calls pdfork(2) and pdkill(2) offer the functional equivalents of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote process for debugging purposes. The currently-unimplemented pdwait(2) will, in the future, allow querying rusage/exit status. In the interim, poll(2) may be used to check (and wait for) process termination. When a process is referenced by a process descriptor, it does not issue SIGCHLD to the parent, making it suitable for use in libraries---a common scenario when using library compartmentalisation from within large applications (such as web browsers). Some observers may note a similarity to Mach task ports; process descriptors provide a subset of this behaviour, but in a UNIX style. This feature is enabled by "options PROCDESC", but as with several other Capsicum kernel features, is not enabled by default in GENERIC 9.0. Reviewed by: jhb, kib Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
*	Add a facility to disable processing page faults. When activated,	kib	2011-07-09	1	-1/+1
\| \| \| \| \| \| \| \|	uiomove generates EFAULT if any accessed address is not mapped, as opposed to handling the fault. Sponsored by: The FreeBSD Foundation Reviewed by: alc (previous version)
*	Use 'curthread_pflags' instead of 'thread_pflags' to signify that only	kib	2011-07-09	1	-2/+2
\| \| \| \| \| \| \|	curthread can be operated upon. Requested by: attilio MFC after: 1 week
*	Implement helper functions to locally set thread-private flag, and	kib	2011-07-09	1	-0/+19
\| \| \| \| \| \| \| \| \|	restore it to the previous state. Note that only setting a flag locally is supported. Sponsored by: The FreeBSD Foundation Reviewed by: alc (previous version) MFC after: 1 week
*	We should not return ECHILD when debugging a child and the parent does a	obrien	2011-06-14	1	-0/+2
\| \| \| \| \| \| \|	"wait4(-1, ..., WNOHANG, ...)". Instead wait(2) should behave as if the child does not wish to report status at this time. Reviewed by: jhb
*	Style fixes:	jhb	2011-05-19	1	-14/+14
\| \| \| \| \|	- Sort forward declarations of structures. - Prefer uint64_t to u_int64_t.
*	Remove stale M_ZOMBIE malloc type.	pluknet	2011-04-14	1	-1/+0
\| \| \| \| \| \|	This type is unused since embedding p_ru into struct proc. MFC after: 1 week