path: root/sys/kern
Commit message  [author, date, files changed, lines -/+]
* Explicitly wire the user buffer rather than doing it implicitly in  [mdf, 2011-01-27, 5 files, -4/+18]
    sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT drain
    for extremely large amounts of data where the caller knows that
    appropriate references are held, and sleeping is not an issue.
    Inspired by: rwatson
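    For context, a minimal sketch of a drain-backed handler after this change
    (handler name hypothetical; the explicit sysctl_wire_old_buffer() call is
    what the commit makes the caller's responsibility):

        static int
        sysctl_foo_list(SYSCTL_HANDLER_ARGS)
        {
                struct sbuf sb;
                int error;

                /* wire the user buffer up front; the drain may sleep */
                error = sysctl_wire_old_buffer(req, 0);
                if (error != 0)
                        return (error);
                sbuf_new_for_sysctl(&sb, NULL, 128, req);
                sbuf_printf(&sb, "arbitrarily large output\n");
                error = sbuf_finish(&sb);
                sbuf_delete(&sb);
                return (error);
        }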
* Remove the CTLFLAG_NOLOCK as it seems to be both unused and  [mdf, 2011-01-26, 1 file, -5/+3]
    non-functional. Wiring the user buffer has only been done explicitly
    since r101422. Mark the kern.disks sysctl as MPSAFE, since it is, and
    since it seems to have been misusing the NOLOCK flag. Partially break the
    KPI (but not the KBI) for the sysctl_req 'lock' field, since this member
    should be private and the "REQ_LOCKED" state seems meaningless now.
* Add a macro to test the sv_flags of any process. Change some places to test  [dchagin, 2011-01-26, 2 files, -3/+3]
    the flags instead of explicitly comparing against the addresses of known
    sysentvec structures.
    MFC after: 1 month
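    The idea reduces to a flag test on the process's sysentvec; a minimal
    sketch of such a macro and its use (the SV_PROC_FLAG name follows
    sys/sysent.h; the flag tested is illustrative):

        /* test an ABI flag instead of comparing p_sysent pointers */
        #define SV_PROC_FLAG(p, x)      ((p)->p_sysent->sv_flags & (x))

        if (SV_PROC_FLAG(td->td_proc, SV_ILP32)) {
                /* 32-bit ABI process */
        }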
* When vtruncbuf() iterates over the vnode buffer list, lock the buffer object  [kib, 2011-01-25, 1 file, -2/+5]
    before checking the validity of the next buffer pointer. Otherwise, the
    buffer might be reclaimed after the check, causing the iteration to run
    into the wrong buffer.
    Reported and tested by: pho
    MFC after: 1 week
* Allow the debugger to specify that children of the traced process should be  [kib, 2011-01-25, 5 files, -13/+85]
    automatically traced. Extend ptrace(PT_LWPINFO) to report that a child
    has just forked.
    Reviewed by: davidxu, jhb
    MFC after: 2 weeks
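    A rough sketch of how a debugger might use this (PT_FOLLOW_FORK and
    PL_FLAG_FORKED per this change; error handling omitted):

        /* ask the kernel to auto-attach to children of pid */
        ptrace(PT_FOLLOW_FORK, pid, (caddr_t)0, 1);

        /* later, on a stop, check whether the event was a fork */
        struct ptrace_lwpinfo pl;
        ptrace(PT_LWPINFO, pid, (caddr_t)&pl, sizeof(pl));
        if (pl.pl_flags & PL_FLAG_FORKED)
                child_pid = pl.pl_child_pid;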
* Replace spaces with tabs.  [jh, 2011-01-24, 1 file, -2/+2]
* Make the MSGBUF_SIZE kernel option a loader tunable, kern.msgbufsize.  [pluknet, 2011-01-21, 1 file, -0/+7]
    Submitted by: perryh pluto.rain.com (previous version)
    Reviewed by: jhb
    Approved by: kib (mentor)
    Tested by: universe
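    With this change the message buffer size no longer requires a kernel
    rebuild; it can be set from /boot/loader.conf (the value below is only an
    example):

        kern.msgbufsize="131072"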
* Introduce signed and unsigned versions of CTLTYPE_QUAD, renaming  [mdf, 2011-01-19, 2 files, -5/+8]
    existing uses. Rename sysctl_handle_quad() to sysctl_handle_64().
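    A minimal sketch of a 64-bit sysctl under the renamed handler (the names
    and the counter are hypothetical):

        static uint64_t foo_bytes;      /* hypothetical 64-bit counter */

        static int
        sysctl_foo_bytes(SYSCTL_HANDLER_ARGS)
        {
                uint64_t val = foo_bytes;

                /* formerly sysctl_handle_quad() */
                return (sysctl_handle_64(oidp, &val, 0, req));
        }
        SYSCTL_PROC(_kern, OID_AUTO, foo_bytes,
            CTLTYPE_U64 | CTLFLAG_RD, NULL, 0,
            sysctl_foo_bytes, "QU", "example 64-bit counter");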
* Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need  [mdf, 2011-01-18, 8 files, -22/+26]
    to rely on the format string.
* Rework realtime priority support:  [jhb, 2011-01-14, 3 files, -6/+15]
    - Move the realtime priority range up above kernel sleep priorities and
      just below interrupt thread priorities.
    - Contract the interrupt and kernel sleep priority ranges a bit so that
      the timesharing priority band can be increased. The new timeshare range
      is now slightly larger than the old realtime + timeshare ranges.
    - Change the ULE scheduler to no longer use realtime priorities for
      interactive threads. Instead, the larger timeshare range is now split
      into separate subranges for interactive and non-interactive ("batch")
      threads. The end result is that interactive threads and non-interactive
      threads still use the same priority ranges as before, but realtime
      threads now have a separate, dedicated priority range.
    - Do not modify the priority of non-timeshare threads in sched_sleep() or
      via cv_broadcastpri(). Realtime and idle priority threads will no
      longer have their priorities affected by sleeping in the kernel.
    Reviewed by: jeff
* One more sysctl(9) type-safety fix that I missed before.  [mdf, 2011-01-13, 1 file, -1/+1]
* Fix up a few more sysctl(9) mis-typings found in various LINT builds.  [mdf, 2011-01-13, 1 file, -1/+2]
* Introduce two new helper macros to define the priority ranges used for  [jhb, 2011-01-13, 1 file, -16/+25]
    interactive timeshare threads (PRI_*_INTERACTIVE) and non-interactive
    timeshare threads (PRI_*_BATCH), and use these instead of PRI_*_REALTIME
    and PRI_*_TIMESHARE. No functional change.
    Reviewed by: jeff
* sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.  [mdf, 2011-01-12, 7 files, -12/+12]
    Commit the kernel changes.
* - Retire some unused ithread priorities: PI_TTYHIGH, PI_TAPE, and  [jhb, 2011-01-11, 1 file, -14/+5]
      PI_DISKLOW. While here, rename PI_TTYLOW to PI_TTY.
    - Add a macro PI_SWI() that takes a SWI_* constant as an argument and
      returns the suitable thread priority.
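    A plausible shape for such a macro, with an example caller (the exact
    definition lives in sys/priority.h; the arithmetic below is an
    assumption):

        /* map a SWI_* software-interrupt level to an ithread priority */
        #define PI_SWI(x)       (PI_SOFT + (x) * RQ_PPQ)

        pri = PI_SWI(SWI_CLOCK);        /* priority for the clock SWI */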
* Always use PRI_BASE() when checking the base type of a thread's priority  [jhb, 2011-01-11, 1 file, -2/+2]
    class.
    MFC after: 2 weeks
* Remove unneeded includes of <sys/linker_set.h>. Other headers that use  [jhb, 2011-01-11, 4 files, -4/+0]
    it internally contain nested includes.
    Reviewed by: bde
* Fix hhook_head_is_virtualised() so that "ret" can't be used uninitialised.  [lstewart, 2011-01-11, 1 file, -4/+5]
    Sponsored by: FreeBSD Foundation
    Submitted by: pjd
    MFC after: 9 weeks
    X-MFC with: r216615
* Fix some minor style/readability nits in hhook.  [lstewart, 2011-01-11, 1 file, -6/+3]
    Sponsored by: FreeBSD Foundation
    Submitted by: pjd
    MFC after: 9 weeks
    X-MFC with: r216615
* Fix two harmless off-by-one errors.  [jhb, 2011-01-10, 1 file, -3/+3]
    Reviewed by: jeff
    MFC after: 2 weeks
* Improve style and wording of comments and sysctl descriptions [1].  [bz, 2011-01-09, 1 file, -12/+11]
    Move machdep.ct_debug to debug.clocktime, as there was no reason to
    actually put it under machdep in r216340.
    Submitted by: bde [1]
    MFC after: 3 days
* Make RB_CDROM work. This should probably check for a disc in cd1 and acd1  [nwhitehorn, 2011-01-08, 1 file, -2/+2]
    as well.
* Revert r216805.  [attilio, 2011-01-08, 1 file, -119/+23]
    That revision introduces a bug that is more visible than the problems it
    tries to fix. As my time is very limited at the moment, I will commit
    this patch back once it is fully fixed.
    Reported by: dim, Nicholas Esborn
* Use the same expression to report the stack protection mode for AT_STACKPROT  [kib, 2011-01-08, 1 file, -2/+3]
    as the expression used by exec_new_vmspace().
* In the elf image activator, read and apply the stack protection mode from  [kib, 2011-01-08, 1 file, -5/+17]
    the PT_GNU_STACK program header, if present and enabled. Two new sysctls
    are provided, kern.elf32.nxstack and kern.elf64.nxstack, that allow
    enabling PT_GNU_STACK for ABIs of the specified bitsize, if the ABI
    decided to support the shared page. Inform rtld about the access mode of
    the initial stack mapping with the AT_STACKPROT aux vector. At the
    moment, the default is disabled, waiting for the usermode support bits.
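    The core of the activator change amounts to scanning the program headers
    for PT_GNU_STACK and translating its p_flags into a vm_prot_t; a sketch
    (the helper name follows the related entry below; "nxstack_enabled" is a
    hypothetical stand-in for the sysctl check):

        /* apply PT_GNU_STACK if the ABI enabled it via the nxstack knob */
        for (i = 0; i < hdr->e_phnum; i++) {
                if (phdr[i].p_type == PT_GNU_STACK && nxstack_enabled)
                        stack_prot = __elfN(trans_prot)(phdr[i].p_flags);
        }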
* Create a shared (readonly) page. Each ABI may specify the use of the page  [kib, 2011-01-08, 1 file, -4/+85]
    by setting the SV_SHP flag and providing a pointer to the vm object and a
    mapping address. Provide a simple allocator to carve space in the page,
    tailored to place code with alignment restrictions. Enable shared page
    use for amd64, both for native and 32bit FreeBSD binaries. The page is
    mapped private at the top of the user address space, moving the start of
    the stack one page down. Move the signal trampoline code from the top of
    the stack to the shared page.
    Reviewed by: alc
* Collect code to translate between vm_prot_t and p_flags into helper  [kib, 2011-01-08, 1 file, -22/+35]
    functions.
    MFC after: 1 week
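    The translation itself is mechanical; a sketch of the ELF-flags-to-VM
    direction (the __elfN(trans_prot) name is an assumption based on this
    changeset):

        static vm_prot_t
        __elfN(trans_prot)(Elf_Word flags)
        {
                vm_prot_t prot = 0;

                if (flags & PF_X)
                        prot |= VM_PROT_EXECUTE;
                if (flags & PF_W)
                        prot |= VM_PROT_WRITE;
                if (flags & PF_R)
                        prot |= VM_PROT_READ;
                return (prot);
        }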
* - Properly initialize the base priority (td_base_pri) of thread0 to PVM  [jhb, 2011-01-06, 2 files, -5/+6]
      to match the desired priority in td_priority. Otherwise the first time
      thread0 used a borrowed priority it would drop down to PUSER instead
      of PVM.
    - Explicitly initialize the starting priority of new kprocs to PVM to
      avoid inheriting some random priority from thread0.
    MFC after: 2 weeks
* - Move sched_fork() later in fork() so that it runs after the various  [jhb, 2011-01-06, 3 files, -9/+13]
      sections of the new thread and proc have been copied and zeroed from
      the old thread and proc. Otherwise attempts to modify thread or process
      data in sched_fork() could be undone.
    - Don't copy td_{base,}_user_pri from the old thread to the new thread in
      sched_fork_thread() in ULE. This is already done courtesy of the
      bcopy() of the thread copy region.
    - Always initialize the real priority (td_priority) of new threads to the
      new thread's base priority (td_base_pri) to avoid bogusly inheriting a
      borrowed priority from the parent thread.
    MFC after: 2 weeks
* Only change the priority of timeshare threads to PRI_MAX_TIMESHARE  [jhb, 2011-01-06, 1 file, -1/+2]
    when yield() is called. Specifically, leave the priority of real time
    and idle threads unchanged.
    MFC after: 2 weeks
* - Restore dropping the priority of the syncer down to PPAUSE when it is  [jhb, 2011-01-06, 1 file, -0/+7]
      idle. This was lost when it was converted to using a condition variable
      instead of lbolt.
    - Drop the priority of flowtable down to PPAUSE when it is idle as well,
      since it is a similar background task.
    MFC after: 2 weeks
* Retire PCONFIG and leave the priority of thread0 alone when waiting for  [jhb, 2011-01-06, 1 file, -1/+1]
    interrupt config hooks to execute.
* Fix a page fault that occurred when trying to initialize a preloaded kernel  [trasz, 2011-01-05, 1 file, -3/+11]
    module whose dependency was preloaded but failed to initialize.
    Previously, the kernel dereferenced the NULL pointer returned by
    modlist_lookup2(); now, when this happens, we unload the dependent
    module. Since the depended_files list is sorted in dependency order, this
    properly propagates, unloading modules that depend on failed ones. From
    the user point of view, this prevents the kernel from panicking when
    trying to boot a kernel compiled without KDTRACE_HOOKS with
    dtraceall_load="YES" in /boot/loader.conf.
    Reviewed by: kib
* kproc_exit() is already marked __dead2, so a NOTREACHED comment here isn't  [jhb, 2011-01-04, 1 file, -1/+0]
    needed for lint.
    Submitted by: bde
* Finish r210923, r210926. Mark some devices as eternal.  [kib, 2011-01-04, 6 files, -10/+15]
    MFC after: 2 weeks
* Fix small whitespace nits and re-add a comment, lost earlier, explaining  [jhb, 2011-01-03, 1 file, -3/+6]
    why kthread_exit() can call kproc_exit().
* Finishing touches to fork1(): ANSIfy a missed function definition, style(9)  [trasz, 2011-01-02, 1 file, -27/+20]
    fixes, removal of a few comments that didn't really make sense, and the
    addition of fork_findpid() locking requirements.
* MFp4 CH177924:  [bz, 2010-12-31, 1 file, -1/+8]
    Add and export constants for the array sizes of jail parameters as
    compiled into the kernel. This is the least intrusive way to allow kvm
    to read the (sparse) arrays independent of the options the kernel was
    compiled with.
    Reviewed by: jhb (originally)
    MFC after: 1 week
    Sponsored by: The FreeBSD Foundation
    Sponsored by: CK Software GmbH
* Remove the OBJ_CLEANING flag. vfs_setdirty_locked_object() is the only  [kib, 2010-12-29, 1 file, -1/+1]
    consumer of the flag, and it used the flag because OBJ_MIGHTBEDIRTY was
    cleared early in vm_object_page_clean(), before the cleaning pass was
    done. This is no longer true after r216799. Moreover, since OBJ_CLEANING
    is a flag and not a counter, it could be reset prematurely when parallel
    vm_object_page_clean() calls are performed.
    Reviewed by: alc (as a part of the bigger patch)
    MFC after: 1 month (after r216799 is merged)
* Fix several callout migration races:  [attilio, 2010-12-29, 1 file, -23/+119]
    - Problem 1:
      Hypothesis: thread1 is doing a callout_reset_on() within its callout
      handler, willing to implicitly or explicitly migrate the callout.
      thread2 is draining the callout.
      Thesis:
      * thread1 calls callout_lock() and locks the old callout cpu.
      * thread1 performs the checks in the first path of callout_reset_on().
      * thread1 hits this piece of code:

            /*
             * If the lock must migrate we have to check the state again as
             * we can't hold both the new and old locks simultaneously.
             */
            if (c->c_cpu != cpu) {
                    c->c_cpu = cpu;
                    CC_UNLOCK(cc);
                    goto retry;
            }

        which means it will drop the lock and 'retry'.
      * thread2 calls callout_lock() and locks the new callout cpu. thread1
        spins on the new lock and does not proceed for the moment.
      * thread2 checks that the callout is not pending (as the callout is
        currently running) and that it is not on cc->cc_curr (because cc now
        refers to the new callout cpu while the callout is running on the
        old one), thus it thinks it is done and returns.
      * thread1 now acquires the lock and adds the callout to the new
        callout cpu queue.
      This is an obvious race: callout_stop() falsely reports the callout
      stopped or, worse, callout_drain() falsely returns while the callout
      is still in use.
    - Solution 1:
      Fixing this problem requires, in general, locking both callout cpus at
      once while switching the c_cpu field, and avoiding cyclic deadlocks
      between the callout cpu locks. The concept of CPUBLOCK is introduced
      (working more or less like the blocked_lock for the thread_lock()
      function), meaning: "in callout_lock(), spin until c->c_cpu is no
      longer CPUBLOCK". That way the "original" callout cpu, referred to in
      the code snippet above, will remain blocked until the lock handover is
      over, and the critical path remains covered.
    - Problem 2:
      Having the callout currently executing on a specific callout cpu while
      simultaneously pending on another callout cpu (as can happen with the
      current code) breaks, at least, the assumption that callout_drain()
      returns only once the callout can no longer be referenced.
    - Solution 2:
      Callout migration is deferred if the current callout is already under
      execution. The best place to do that is in softclock(), and new
      members are added to the callout cpu structure in order to record that
      a pending migration has been requested. That is necessary because the
      callout cannot be trusted (to not be freed) 100% of the time after the
      execution of the callout handler. CPUBLOCK prevents, in the "deferred
      migration" case, the callout from being freed, stopping any
      callout_stop() and callout_drain() activity until the migration is
      actually performed.
    - Problem 3:
      There is a further race in callout_drain(). In order to avoid a race
      between the sleepqueue lock and the callout cpu spinlock, in
      _callout_stop_safe() the callout cpu lock is dropped, the sleepqueue
      lock is acquired, and a new callout cpu lookup is performed. Note that
      the channel used for locking the sleepqueue is obtained from the
      "current" callout cpu (&cc->cc_waiting). If the callout migrated in
      the meanwhile, callout_drain() ends up using the wrong wchan for the
      sleepqueue (the locked one is the older, while the new one is not
      really locked), leading to a lock leak and racy access to the
      sleepqueue.
    - Solution 3:
      It is enough to check whether a migration happened between acquiring
      the sleepqueue lock and the new callout cpu lock and, if so, unwind
      all of those and try again.
    These problems can lead to deadly races even on moderate (4-way) SMP
    environments, producing easy panics or deadlocks. The reporter's 24-way
    machine could easily panic, with a completely normal workload, almost
    daily. gianni@ kindly wrote the following proof-of-concept, which can
    panic a FreeBSD machine in less than one hour on smaller SMP:
    http://www.freebsd.org/~attilio/callout/test.c
    Reported by: Nicholas Esborn <nick at desert dot net>, DesertNet
    In collaboration with: gianni, pho, Nicholas Esborn
    Reviewed by: jhb
    MFC after: 1 week (*)
    (*) Usually I would aim for a larger MFC timeout, but I really want this
    in before 8.2-RELEASE, thus re@ accepted a shorter timeout as a special
    case for this patch.
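    A sketch of the CPUBLOCK handover described in Solution 1 (shape inferred
    from the commit message; the real code lives in kern/kern_timeout.c):

        /* spin while the callout's cpu field is in mid-handover */
        retry:
                cpu = c->c_cpu;
                if (cpu == CPUBLOCK) {
                        while (c->c_cpu == CPUBLOCK)
                                cpu_spinwait();
                        goto retry;
                }
                cc = CC_CPU(cpu);
                CC_LOCK(cc);
                if (cpu != c->c_cpu) {
                        /* migrated while we slept on the lock; redo */
                        CC_UNLOCK(cc);
                        goto retry;
                }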
* - Following r216313, sched_unlend_user_prio() is no longer needed; always  [davidxu, 2010-12-29, 4 files, -77/+30]
      use sched_lend_user_prio() to set the lent priority.
    - Improve the pthread priority-inherit mutex: when a contender's priority
      is lowered, repropagate priorities. This may cause the mutex owner's
      priority to be lowered; in the old code, the owner's priority was
      raise-only.
* Teach ddb "show mount" about MNTK_SUJ flag.kib2010-12-271-0/+1
|
* Correct the order of the arguments to vm_fault_quick_hold_pages().  [alc, 2010-12-26, 1 file, -1/+1]
* Introduce and use a new VM interface for temporarily pinning pages. This  [alc, 2010-12-25, 3 files, -62/+14]
    new interface replaces the combined use of vm_fault_quick() and
    pmap_extract_and_hold() throughout the kernel.
    In collaboration with: kib@
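    The interface in question is vm_fault_quick_hold_pages(), named in the
    entry above; a sketch of the usage pattern (error handling abbreviated;
    "uaddr" and "len" are hypothetical caller state):

        vm_page_t ma[4];
        int count;

        /* fault in and hold the pages backing a user buffer */
        count = vm_fault_quick_hold_pages(&curproc->p_vmspace->vm_map,
            (vm_offset_t)uaddr, len, VM_PROT_READ, ma, 4);
        if (count == -1)
                return (EFAULT);
        /* ... access the held pages ... */
        vm_page_unhold_pages(ma, count);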
* Enlarge the hash table for the new condition variable.  [davidxu, 2010-12-23, 1 file, -2/+2]
* MFp4:  [davidxu, 2010-12-22, 1 file, -15/+105]
    - Add flags CVWAIT_ABSTIME and CVWAIT_CLOCKID for the umtx kernel-based
      condition variable; this should eliminate an extra system call to get
      the current time.
    - Add sub-function UMTX_OP_NWAKE_PRIVATE to wake up N channels in a
      single system call. Create a userland sleep queue for the condition
      variable; in most cases a thread will wait in the queue, and
      pthread_cond_signal will defer the thread wakeup until the mutex is
      unlocked. It tries to avoid an extra system call and an extra context
      switch in the time window between pthread_cond_signal and
      pthread_mutex_unlock.
    The changes are part of the process-shared mutex project.
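    Roughly, the new flags let a thread library wait with an absolute
    deadline against a chosen clock in one syscall; a hedged sketch of the
    call (argument layout and the c_clockid field follow libthr's usage as I
    understand it; see _umtx_op(2) for the authoritative form):

        /* wait on a umtx condvar with an absolute deadline */
        cv->c_clockid = CLOCK_MONOTONIC;
        error = _umtx_op(cv, UMTX_OP_CV_WAIT,
            CVWAIT_ABSTIME | CVWAIT_CLOCKID, mtx, &abstime);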
* Initialize fp_location for explicitly managed fail points, and push  [mdf, 2010-12-21, 1 file, -2/+3]
    the parentheses around the location for simple fail points into the
    location string. This makes the print on fail point set more consistent
    between the two versions. Also fix up fail.h a little for style(9): only
    use one of sys/param.h and sys/types.h, and use the existing __XSTRING()
    macro instead of rolling our own. Also fix up a few tabs on changed and
    nearby lines. Lastly, since KFAIL_POINT_{BEGIN,END} are not meant for
    use outside this file, just eliminate the macros entirely.
    MFC after: 1 week
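    For context, a fail point is declared at a code site and armed from
    userland via sysctl; a minimal sketch (the "foo_io" site is
    hypothetical):

        /* allow injecting an error return into a hypothetical I/O path;
         * armed with, e.g.: sysctl debug.fail_point.foo_io='return(5)' */
        KFAIL_POINT_CODE(DEBUG_FP, foo_io, return (RETURN_VALUE));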
* Move the fail_point_entry definition from fail.h to kern_fail.c, which  [mdf, 2010-12-21, 1 file, -9/+37]
    allows putting the enumeration constants of fail point types with the
    text strings that match them.
    MFC after: 1 week
* - Introduce the Hhook (Helper Hook) KPI. The KPI is closely modelled on  [lstewart, 2010-12-21, 2 files, -0/+928]
      pfil(9), and in many respects can be thought of as a more generic
      superset of pfil. Hhook provides a way for kernel subsystems to export
      hook points that Khelp modules can hook to provide enhanced or new
      functionality to the kernel. The KPI has been designed to ensure hook
      points pose no noticeable overhead when no hook functions are
      registered.
    - Introduce the Khelp (Kernel Helpers) KPI. Khelp provides a framework
      for managing Khelp modules, which indirectly use the Hhook KPI to
      register their hook functions with hook points of interest within the
      kernel. Khelp modules aim to provide a structured way to dynamically
      extend the kernel at runtime in an ABI-preserving manner. Depending on
      the subsystem providing hook points, a Khelp module may be able to
      associate per-object data for maintaining relevant state between hook
      calls.
    - pjd's Object Specific Data (OSD) KPI is used to manage the per-object
      data allocated to Khelp modules. Create a new "OSD_KHELP" OSD type for
      use by the Khelp framework.
    - Bump __FreeBSD_version to 900028 to mark the introduction of the new
      KPIs.
    In collaboration with: David Hayes <dahayes at swin edu au> and
    Grenville Armitage <garmitage at swin edu au>
    Sponsored by: FreeBSD Foundation
    Reviewed by: bz, others along the way
    MFC after: 3 months
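    A rough sketch of the two sides of the KPI, registration of a hook point
    by a subsystem and attachment of a hook function by a module (the
    type/id constants and names here are illustrative; see sys/hhook.h for
    the real signatures):

        static struct hhook_head *example_hhh;  /* hypothetical hook point */

        /* subsystem side: export the hook point at init time */
        hhook_head_register(HHOOK_TYPE_EXAMPLE, HHOOK_EXAMPLE_ID,
            &example_hhh, HHOOK_NOWAIT);

        /* module side: attach a hook function to the hook point */
        struct hookinfo hki = {
                .hook_func = example_hook_func,
                .hook_udata = NULL,
        };
        hhook_add_hook(example_hhh, &hki, HHOOK_NOWAIT);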
* Introduce vm_fault_hold() and use it to (1) eliminate a long-standing race  [alc, 2010-12-20, 1 file, -63/+17]
    condition in proc_rwmem() and to (2) simplify the implementation of the
    cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem()
    the requested read or write could fail because the targeted page could
    be reclaimed between the calls to vm_fault() and vm_page_hold().
    In collaboration with: kib@
    MFC after: 6 weeks
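    The race it closes is the window between faulting a page in and holding
    it; a sketch of the combined operation (error handling abbreviated;
    "map" and "va" are hypothetical caller state):

        vm_page_t m;
        int rv;

        /* fault the page in and return it already held */
        rv = vm_fault_hold(map, trunc_page(va), VM_PROT_WRITE,
            VM_FAULT_NORMAL, &m);
        if (rv != KERN_SUCCESS)
                return (EFAULT);
        /* ... the page cannot be reclaimed while held ... */
        vm_page_lock(m);
        vm_page_unhold(m);
        vm_page_unlock(m);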