path: root/sys/kern/kern_timeout.c
Commit log, newest first. Each entry shows the subject (author, date, lines +added/-deleted in this file), followed by the commit message.
* MFC r301522 (by bz) (hiren, 2016-09-16, +37/-0)
    Implement a `show panic` command to DDB which will helpfully print
    the panic string again if set, in case it scrolled out of the
    active window. This avoids having to remember the symbol name.
    Also add a `show callout <addr>` command to DDB in order to inspect
    some struct callout fields in case of panics in the callout code.
    This may help to see if there was memory corruption or to further
    ease debugging problems.
    No objection by: bz
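    A minimal sketch of how such a command is wired up through ddb(4)'s
    DB_SHOW_COMMAND macro; this follows the commit's description, but
    the messages are illustrative rather than r301522's verbatim code:

        #include <sys/param.h>
        #include <sys/systm.h>
        #include <ddb/ddb.h>

        /* "show panic": print the saved panic string, if any. */
        DB_SHOW_COMMAND(panic, db_show_panic)
        {
            if (panicstr == NULL)
                db_printf("panic is not set\n");
            else
                db_printf("panic: %s\n", panicstr);
        }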
* MFC r303425 (kib, 2016-08-27, +53/-42)
    Add callout_when(9).
    MFC r303919: Fix indentation.
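    A hedged usage sketch of the new KPI: callout_when(9) resolves a
    relative timeout plus tolerance into the absolute expiration time
    and precision, which can then be fed back in with C_ABSOLUTE. The
    callout, handler, argument, and function names are hypothetical:

        #include <sys/param.h>
        #include <sys/systm.h>
        #include <sys/callout.h>

        static struct callout my_callout;       /* hypothetical */
        static void my_handler(void *);         /* hypothetical */

        static void
        arm_in_100ms(void *my_arg)
        {
            sbintime_t abs_sbt, abs_prec;

            /* Precompute when "100 ms from now, with C_PREL(2),
             * i.e. ~25% slack" fires and at what precision... */
            callout_when(SBT_1MS * 100, 0, C_PREL(2), &abs_sbt,
                &abs_prec);
            /* ...then arm the callout at that absolute time. */
            callout_reset_sbt(&my_callout, abs_sbt, abs_prec,
                my_handler, my_arg, C_ABSOLUTE);
        }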
* MFC r264388 (by davide) (kib, 2016-08-27, +5/-5)
    Define SBT_MAX.
    MFC r267896 (by davide): Improve r264388.
    MFC note: the SBT_MAX definition already existed on stable/10, but
    without the refinement from r267896. Also, consumers of SBT_MAX
    were not converted, since r264388 was not merged properly.
    Reviewed by: mav
* MFC r292384 (bdrewery, 2016-06-27, +4/-6)
    Fix style issues around existing SDT probes.
    ** Changes to sys/netinet/in_kdtrace.c and sys/netinet/in_kdtrace.h
    skipped.
* MFC r298819 (bdrewery, 2016-06-27, +1/-1)
    sys/kern: spelling fixes in comments.
* MFC r296320 (kib, 2016-03-15, +5/-5)
    Adjust the _callout_stop_safe() return value for the
    subr_sleepqueue.c needs when a migrating callout was blocked, but a
    running one was not.
    PR: 200992
* MFC r288336: save some bytes by using more concise SDT_PROBE<n>
  (avg, 2015-10-23, +2/-2)
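    The byte savings come from the numbered macros passing exactly as
    many arguments as the probe takes, instead of always filling five
    slots; schematically, with a hypothetical provider and probe:

        #include <sys/param.h>
        #include <sys/sdt.h>

        SDT_PROVIDER_DECLARE(example);          /* hypothetical */
        SDT_PROBE_DEFINE1(example, , , my__probe, "int");

        static void
        fire(int val)
        {
            /* Old form: five argument slots, unused ones zeroed. */
            SDT_PROBE(example, , , my__probe, val, 0, 0, 0, 0);
            /* Concise form: only the argument that exists. */
            SDT_PROBE1(example, , , my__probe, val);
        }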
* MFC r287354: callout_reset: fix a reversed check for cc_exec_cancel
  (avg, 2015-09-11, +1/-1)
    Relnotes: potential erratum
* MFC r280786 (bz, 2015-04-24, +2/-2)
    Try to unbreak !SMP kernels broken in r280785 (head), r281657 by
    using the proper macros to access cc_cpu.
    Requested by: jmallett
    Pointyhat to: rrs
* MFC of r280785, r280871, r280872, r281510, r218511 - callout fixes
  (rrs, 2015-04-17, +90/-47)
    Sponsored by: Netflix Inc.
* MFC of r278469, r278623 (rrs, 2015-02-15, +160/-86)
    r278469: This fixes two conditions that can occur when migration is
    being done in the callout code, and harmonizes the macro use:
    1) callout_active() will lie. Basically, if a migration is
       occurring and the callout is about to expire and the migration
       has been deferred, callout_active() will no longer return true
       until after the migration. This confuses and breaks callers that
       are doing callout_init(&c, 1); such as TCP.
    2) The migration code had a bug in it where, when migrating, if two
       calls to callout_reset came in and they both collided with the
       callout on the wheel about to run, then the second call to
       callout_reset would corrupt the list the callout wheel uses,
       putting the callout thread into an endless loop.
    3) Per imp, I have fixed all the macro occurrences in the code that
       were for the most part being ignored.
    r278623: This fixes a bug I inadvertently inserted when I updated
    the callout code in my last commit. The cc_exec_next is used to
    track the next callout when a direct call is being made from the
    callout code. It is *never* used in the indirect method. When
    macro-izing I made it so that it would separate out
    direct/vs/non-direct. This is incorrect and can cause panics, as
    Peter Holm has found for me (thanks so much, Peter, for all your
    help in this). What this change does is restore that behavior but
    also get rid of cc_next from the array and instead make it part of
    the base callout structure. This way no one else will get confused,
    since we will never use it for non-direct.
    Sponsored by: Netflix Inc.
* MFC 272315 272757 274091 274902, for real this time
  (sbruno, 2015-02-13, +10/-4)
    r272315: Explicitly return None for negative event indices. Prior
    to this, eventat(-1) would return the next-to-last event, causing
    the back button to cycle back to the end of an event source instead
    of stopping at the start.
    r272757: Add schedgraph traces for callout handlers. Specifically,
    a callwheel logs a running event each time it executes a callout
    function. The event includes the function pointer, argument, and
    whether or not it was run from hardware interrupt context. The
    callwheel is marked idle when each handler completes. This
    effectively logs the duration of each callout routine in the graph.
    r274091: Bind Ctrl-Q as a global hotkey to exit. Bind Ctrl-W as a
    hotkey to close dialogs.
    r274902: Add a new thread state "spinning" to schedgraph and add
    tracepoints at the start and stop of spinning waits in lock
    primitives.
    Reviewed by: jhb
* Revert r278650. Definite layer 8 bug. (sbruno, 2015-02-13, +4/-10)
    Submitted by: dhw and Thomas Mueller <tmueller@sysgo.com>
* MFC 272315 272757 274091 274902 (sbruno, 2015-02-13, +10/-4)
    Original merge of the four schedgraph/callout-trace revisions
    described identically in the "for real this time" entry above; this
    commit (r278650) was reverted and then re-applied.
    Reviewed by: jhb
* MFC r258622: dtrace sdt: remove the ugly sname parameter of
  SDT_PROBE_DEFINE (avg, 2014-01-17, +4/-4)
* Make the callout arithmetic more robust, adding checks for overflow.
  (davide, 2013-09-26, +6/-1)
    Without these, if the timeout value passed is "large enough", the
    sum of it and other factors (e.g. the current time as returned by
    sbinuptime(), or the 'precision' argument) might result in a
    negative number. This negative number is then passed to
    eventtimers(4), which causes the et_start() routine to load
    et_min_period into the eventtimer, leaving the CPU where the thread
    runs stuck forever in the timer interrupt handler routine. This is
    now avoided by rounding the timeout period to INT64_MAX in case of
    overflow.
    Reported by: kib, pho
    Discussed with: kib, mav
    Tested by: pho (stress2 suite, kevent7.sh scenario)
    Approved by: re (kib)
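    The guard amounts to a saturating addition on sbintime_t; a
    schematic version (function and variable names illustrative):

        #include <sys/param.h>
        #include <sys/time.h>

        /* Saturate at SBT_MAX rather than letting now + timeout wrap
         * to a negative value that eventtimers(4) would mishandle. */
        static sbintime_t
        expiry_saturated(sbintime_t now, sbintime_t timeout)
        {
            if (timeout > SBT_MAX - now)
                return (SBT_MAX);
            return (now + timeout);
        }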
* Fix callout_init_rm() in the shared case, allocating storage for
  'struct rm_priotracker' directly in the softclock thread.
  (davide, 2013-09-20, +11/-3)
    Now consumers can pass the CALLOUT_SHAREDLOCK flag to the callout
    initialization routine safely. The choice of the already existing
    flag instead of special-casing shared rmlocks is done to prevent
    consumer footshooting.
    Suggested by: jhb
    Reviewed by: jhb
    Approved by: re (delphij)
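    A consumer-side sketch of the now-safe pattern (the lock, callout,
    and function names are hypothetical):

        #include <sys/param.h>
        #include <sys/lock.h>
        #include <sys/rmlock.h>
        #include <sys/callout.h>

        static struct rmlock my_rm;     /* hypothetical consumer lock */
        static struct callout my_co;

        static void
        my_init(void)
        {
            rm_init(&my_rm, "my_rm");
            /* The callout code takes my_rm in read (shared) mode
             * around the handler; per the commit above, the
             * rm_priotracker storage lives in the softclock thread. */
            callout_init_rm(&my_co, &my_rm, CALLOUT_SHAREDLOCK);
        }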
* Specify SDT probe argument types in the probe definition itself
  rather than using SDT_PROBE_ARGTYPE(). (markj, 2013-08-15, +2/-4)
    This will make it easy to extend the SDT(9) API to allow probes
    with dynamically-translated types. There is no functional change.
    MFC after: 2 weeks
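    Schematically, against post-change headers and with a hypothetical
    provider/probe (the pre-change spelling appears only in the
    comment, since SDT_PROBE_ARGTYPE() was removed by this change):

        #include <sys/param.h>
        #include <sys/sdt.h>

        SDT_PROVIDER_DECLARE(example);          /* hypothetical */

        /* Previously the types were attached after the fact, e.g.
         *   SDT_PROBE_ARGTYPE(example, , , my__probe, 0, "int");
         * Now they are named right in the definition: */
        SDT_PROBE_DEFINE2(example, , , my__probe, "int",
            "struct callout *");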
* Cache the callout precision argument as part of the information
  required for migrating callouts to a new CPU. (davide, 2013-03-25, +9/-2)
    This value is passed to callout_cc_add() in order to properly
    update the precision field in case of rescheduling/migration.
    Reviewed by: mav
* Bring back the comment on the sizing of the callout array that got
  lost in r248031. (andre, 2013-03-10, +2/-0)
    Requested by: alc, alfred
* Fixup r248032 (davide, 2013-03-09, +1/-1)
    Change the size requested from malloc(9) now that callwheel buckets
    are callout_list and not callout_tailq anymore. This change was
    already there but it seems it got lost after the code churn in
    r248032.
    Reported by: alc, kib
* Move the callout subsystem initialization to its own SYSINIT()
  (andre, 2013-03-08, +25/-34)
    Previously it was indirectly called via
    cpu_startup()+vm_ksubmap_init(). The boot order position remains
    the same at SI_SUB_CPU.
    Allocation of the callout array is changed to standard kernel
    malloc from a slightly obscure direct kernel_map allocation.
    kern_timeout_callwheel_alloc() is renamed to
    callout_callwheel_init() to better describe its purpose.
    kern_timeout_callwheel_init() is removed, simplifying the per-cpu
    initialization.
    Reviewed by: davide
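    Registering an initializer at the SI_SUB_CPU stage looks roughly
    like this (a sketch; the order token used in the actual tree may
    differ):

        #include <sys/param.h>
        #include <sys/kernel.h>

        static void
        callout_callwheel_init(void *dummy)
        {
            /* allocate and initialize the per-CPU callwheels */
        }
        SYSINIT(callwheel_init, SI_SUB_CPU, SI_ORDER_ANY,
            callout_callwheel_init, NULL);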
* Move the auto-sizing of the callout array from init_param2() to
  kern_timeout_callwheel_alloc() where it is actually used.
  (andre, 2013-03-08, +14/-0)
    This is a mechanical move and no tuning parameters are changed.
    The pre-allocated callout array is only used for legacy timeout(9)
    calls and is only allocated and active on cpu0. Eventually all
    remaining users of timeout(9) should switch to the callout_* API.
    Reviewed by: davide
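    The migration the commit anticipates, sketched with the KPIs as
    they stood at the time (handler and argument names hypothetical):

        #include <sys/param.h>
        #include <sys/systm.h>
        #include <sys/kernel.h>
        #include <sys/callout.h>

        static void my_func(void *);    /* hypothetical handler */
        static struct callout my_co;

        static void
        legacy_vs_modern(void *my_arg)
        {
            struct callout_handle h;

            /* Legacy timeout(9): serviced from the pre-allocated
             * array on cpu0. */
            h = timeout(my_func, my_arg, hz);   /* ~1 s from now */
            untimeout(my_func, my_arg, h);

            /* callout(9) equivalent, with caller-owned storage. */
            callout_init(&my_co, 1);            /* 1 = MP-safe */
            callout_reset(&my_co, hz, my_func, my_arg);
            callout_stop(&my_co);
        }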
* Complete r247813 (davide, 2013-03-04, +8/-8)
    Use true/false instead of TRUE/FALSE.
    Reported by: attilio
    Requested by: jhb
* Use C99 'bool' rather than Mach-ish 'boolean_t'. (davide, 2013-03-04, +2/-2)
    Requested by: jhb
* Fix build with the DIAGNOSTIC/CALLOUT_PROFILING options turned on.
  (davide, 2013-03-04, +9/-9)
    Reported by: kib, David Wolfskill <david at catwhisker dot org>
    Pointy-hat to: davide
* Make callout(9) tickless, relying on eventtimers(4) as the backend
  for precise time event generation. (davide, 2013-03-04, +522/-238)
    This greatly improves the granularity of callouts, which are no
    longer constrained to wait for the next tick to be scheduled.
    - Extend the callout KPI, introducing a set of callout_reset_sbt*
      functions which take a sbintime_t as the timeout argument. The
      new KPI also offers a way for consumers to specify the precision
      tolerance they allow, so that the callout code can coalesce
      events and reduce the number of interrupts, as well as
      potentially avoid scheduling a SWI thread.
    - Introduce support for dispatching callouts directly from hardware
      interrupt context by specifying an additional flag. This feature
      should be used carefully, since interrupt context has some
      limitations (e.g. no sleeping locks can be held).
    - Enhance the mechanisms to gather information about the callwheel,
      introducing a new sysctl to obtain stats.
    This change breaks the KBI: struct callout fields have changed; in
    particular, 'int ticks' (4 bytes) has been replaced with a
    'sbintime_t' (8 bytes), and another 'sbintime_t' field was added
    for precision.
    Together with: mav
    Reviewed by: attilio, bde, luigi, phk
    Sponsored by: Google Summer of Code 2012, iXsystems inc.
    Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm),
    markj (amd64), mav, Fabian Keil
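    A usage sketch of the new KPI (callout, handler, and argument names
    hypothetical):

        #include <sys/param.h>
        #include <sys/systm.h>
        #include <sys/callout.h>

        static struct callout my_co;
        static void my_handler(void *); /* must not sleep if direct */

        static void
        arm(void *my_arg)
        {
            callout_init(&my_co, 1);    /* MP-safe */

            /* Fire ~50 ms from now; C_PREL(2) allows a tolerance of
             * interval >> 2 (~12.5 ms) so events can be coalesced. */
            callout_reset_sbt(&my_co, 50 * SBT_1MS, 0, my_handler,
                my_arg, C_PREL(2));

            /* Or dispatch straight from hardware interrupt context:
             * lower latency, but no sleepable locks in the handler. */
            callout_reset_sbt(&my_co, SBT_1MS, 0, my_handler,
                my_arg, C_DIRECT_EXEC);
        }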
* callwheelmask and callwheelsize are always greater than zero.
  (davide, 2013-03-03, +1/-1)
    Switch their type to u_int.
* Remove a couple of unused includes. (davide, 2013-03-03, +0/-1)
* MFcalloutng: Some whitespace fixes. (mav, 2013-03-03, +2/-2)
* MFcalloutng: Style fixes. (davide, 2013-02-28, +2/-2)
* Fixup r243901 (attilio, 2012-12-05, +12/-9)
    - As the comment reports, CALLOUT_LOCAL_ALLOC cannot be checked
      directly from the callout flags but might be checked via a cached
      value. Hence, do so before actually removing the callout, when
      needed, in softclock_call_cc().
    - In softclock_call_cc() also add a comment in the waiting and
      deferred migration case explaining that the dereference should be
      safe because of the migration dereference invariants.
    Additionally:
    - In softclock_call_cc(), for the deferred migration case, move all
      the accesses to the callout structure after the comment stating
      the callout must not be destroyed.
    - For consistency with this last tweak, use the cached c_flags for
      the KASSERT() in the deferred migration case. It is not strictly
      necessary, but this way all the callout accesses happen after the
      above mentioned comment, improving consistency.
    Pointy hat to: me
    Sponsored by: Isilon Systems / EMC Corporation
    Reviewed by: kib
    MFC after: 2 weeks
    X-MFC: 243901
* softclock_call_cc() executes with the callout already removed from
  the callwheel. (kib, 2012-12-05, +32/-29)
    Calculate cc->cc_next before removing the callout; otherwise the
    code followed invalid tailq links. After this, make
    softclock_call_cc() return void, since it always returned
    cc->cc_next, which is immediately available to softclock() anyway.
    This also allows eliminating a label under #ifdef SMP.
    Remove the assignment of cc->cc_next from callout_cc_del(), since
    the function is called with the callout already removed from the
    callwheel. If cancelling the migration, also clear the
    CALLOUT_DFRMIGRATION flag.
    Postpone the free of the timeout(9)-allocated callouts until after
    the migration checks are done.
    Add some more strict asserts about the state of the callout in
    callout_call_cc().
    Reviewed by: attilio
    Reported and tested by: pho (previous version)
    MFC after: 2 weeks
* Replace bit shifting loop with 1<<fls(n); improve comments.
  (alfred, 2012-12-04, +4/-6)
    Reviewed by: davide
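    fls(9) returns the 1-based index of the highest set bit, so a
    single expression replaces the doubling loop; schematically
    (function and variable names illustrative):

        #include <sys/param.h>
        #include <sys/libkern.h>

        static u_int
        round_up_pow2(u_int n)
        {
            /* Old idiom:
             *   for (size = 1; size < n; size <<= 1)
             *       ;
             * New: jump straight to a power of two above n (when n is
             * already a power of two this picks the next size up,
             * which is harmless when sizing a hash wheel). */
            return (1 << fls(n));
        }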
* Rework the known mutexes to benefit from staying on their own cache
  line by using struct mtx_padalign instead of manual frobbing.
  (attilio, 2012-10-31, +2/-2)
    The sole exceptions are the nvme and sfxge drivers, where the
    author redefined CACHE_LINE_SIZE manually, so they need to be
    analyzed and dealt with separately.
    Reviewed by: jimharris, alc
* Pad and align the callout_cpu mtx to its own cacheline to reduce
  false sharing, especially on the default CPU 0 callout_cpu structure.
  (jimharris, 2012-10-31, +1/-1)
    This will be followed up by attilio@ with a conversion to the new
    struct mtx_padalign, but doing this manual conversion first gives
    an easy MFC candidate, since mtx_padalign is a more extensive
    system change.
    Sponsored by: Intel
    Reviewed by: jeff, attilio
    MFC after: 1 week
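    A sketch of the manual padding this commit describes, next to the
    struct mtx_padalign form that followed it (struct and field names
    hypothetical):

        #include <sys/param.h>
        #include <sys/lock.h>
        #include <sys/mutex.h>

        /* Manual: align the mutex and pad out the rest of its line so
         * the hot field that follows lives on a different line. */
        struct percpu_a {
            struct mtx  pa_lock __aligned(CACHE_LINE_SIZE);
            char        pa_pad[CACHE_LINE_SIZE - sizeof(struct mtx)];
            u_int       pa_hot_counter;
        };

        /* With mtx_padalign, the padding/alignment is built in. */
        struct percpu_b {
            struct mtx_padalign pb_lock;
            u_int               pb_hot_counter;
        };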
* Move the code to call the callout callback into the helper function
  softclock_call_cc(). (kib, 2012-05-03, +181/-198)
    While there, move some common code to callout_cc_del().
    Requested by: avg, jhb
    Reviewed by: jhb
    MFC after: 1 week
* When callout_reset_on() cannot immediately migrate a callout because
  it is running on another cpu, the CALLOUT_PENDING flag is temporarily
  cleared. (kib, 2012-05-03, +36/-0)
    Then callout_stop() on this, in fact active, callout fails because
    CALLOUT_PENDING is not set, and callout_stop() returns 0. Now, in
    sleepq_check_timeout(), the failed callout_stop() causes the sleepq
    code to execute mi_switch() without even setting the wmesg, since
    the switch-out is supposed to be transient. In fact, the thread is
    put off the CPU for the full timeout interval, instead of being put
    on the runq immediately. Until the timeout fires, the process is
    unkillable for obvious reasons.
    Fix this by marking the migrating callouts with the
    CALLOUT_DFRMIGRATION flag. The flag is cleared by
    callout_stop_safe() when the function detects a migration, besides
    returning success. softclock() rechecks the flag for the migrating
    callout and cancels its execution if the flag was cleared in the
    meantime.
    PR: misc/166340
    Reported, debugging traces provided and tested by:
    Christian Esken <christian.esken trivago com>
    Reviewed by: avg, jhb
    MFC after: 1 week
* Mark MALLOC_DEFINEs static that have no corresponding
  MALLOC_DECLAREs. (ed, 2011-11-07, +1/-1)
    This means that their use is restricted to a single C file.
* callout_cpu_switch() allows preemption when dropping the outgoing
  callout cpu lock (and after having dropped it). (attilio, 2011-08-21, +7/-0)
    If the newly scheduled thread wants to acquire the old queue it
    will just spin forever. Fix this by disabling preemption and
    interrupts entirely (because fast interrupt handlers may run into
    the same problem too) while switching locks.
    Reported by: hrs, Mike Tancsa <mike AT sentex DOT net>,
    Chip Camden <sterling AT camdensoftware DOT com>
    Tested by: hrs, Mike Tancsa <mike AT sentex DOT net>,
    Chip Camden <sterling AT camdensoftware DOT com>,
    Nicholas Esborn <nick AT desert DOT net>
    Approved by: re (kib)
    MFC after: 10 days
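    The shape of the fix, roughly: interrupts and preemption are held
    off across the lock handover so nothing can wedge between dropping
    one queue lock and taking the other. A schematic fragment of the
    kind of switch the commit describes (CC_LOCK/CC_UNLOCK/CC_CPU are
    kern_timeout.c's private macros; the real callout_cpu_switch() also
    parks the callout on a blocked marker first):

        static struct callout_cpu *
        cc_switch(struct callout_cpu *old_cc, int new_cpu)
        {
            struct callout_cpu *new_cc;

            spinlock_enter();   /* masks interrupts and preemption */
            CC_UNLOCK(old_cc);
            new_cc = CC_CPU(new_cpu);
            CC_LOCK(new_cc);
            spinlock_exit();
            return (new_cc);
        }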
* Reintroduce the fix already discussed in r216805 (please check its
  history for a detailed explanation of the problems).
  (attilio, 2011-04-08, +198/-24)
    The only difference from the previous fix is in Solution2: CPUBLOCK
    is no longer set when exiting from the callout_reset_*() functions,
    which avoids the deadlock (leading to r217161). There is no need
    for CPUBLOCK there because the running-and-migrating assumption is
    strong enough to avoid problems.
    Furthermore, add better !SMP compliance (leading to shrunken code
    and structures) and facility macros/functions.
    Tested by: gianni, pho, dim
    MFC after: 3 weeks
* Revert r216805. (attilio, 2011-01-08, +23/-119)
    That revision introduces a bug which is more visible than the
    problems it is trying to fix. Since my time is very limited in this
    period, I am going to commit this patch back once it is fully
    fixed.
    Reported by: dim, Nicholas Esborn
* Fix several callout migration races (attilio, 2010-12-29, +119/-23)
    - Problem1:
      Hypothesis: thread1 is doing a callout_reset_on() within its
      callout handler, willing to implicitly or explicitly migrate the
      callout. thread2 is draining the callout.
      Thesis:
      * thread1 calls callout_lock() and locks the old callout cpu.
      * thread1 performs the checks in the first path of
        callout_reset_on().
      * thread1 hits this code piece:

            /*
             * If the lock must migrate we have to check the state
             * again as we can't hold both the new and old locks
             * simultaneously.
             */
            if (c->c_cpu != cpu) {
                    c->c_cpu = cpu;
                    CC_UNLOCK(cc);
                    goto retry;
            }

        which means it will drop the lock and 'retry'.
      * thread2 calls callout_lock() and locks the new callout cpu.
        thread1 spins on the new lock and will not keep going for the
        moment.
      * thread2 checks that the callout is not pending (as the callout
        is currently running) and that it is not on cc->cc_curr
        (because cc now refers to the new callout cpu and the callout
        is running on the old callout cpu), thus it thinks it is done
        and returns.
      * thread1 now acquires the lock and adds the callout to the new
        callout cpu queue.
      That is an obvious race, as callout_stop() falsely reports the
      callout stopped or, worse, callout_drain() falsely returns while
      the callout is still in use.
    - Solution1:
      Fixing this problem would require, in general, locking both
      callout cpus at once while switching the c_cpu field and avoiding
      cyclic deadlocks between callout cpu locks. The concept of
      CPUBLOCK is then introduced (working more or less like the
      blocked_lock for the thread_lock() function), meaning: "in
      callout_lock(), spin until c->c_cpu is no longer CPUBLOCK". That
      way the "original" callout cpu, referred to in the above
      mentioned code snippet, will remain blocked until the lock
      handover is over, and the critical path remains covered.
    - Problem2:
      Having the callout currently executing on a specific callout cpu
      while simultaneously pending on another callout cpu (as can
      happen with the current code) breaks, at least, the assumption
      that callout_drain() returns only once the callout can no longer
      be referenced.
    - Solution2:
      Callout migration is deferred if the current callout is already
      under execution. The best place to do that is in softclock(), and
      new members are added to the callout cpu structure in order to
      specify that a pending migration is requested. That is necessary
      because the callout cannot be trusted (not freed) 100% of the
      time after the execution of the callout handler. CPUBLOCK
      prevents, in the "deferred migration" case, the callout from
      being freed, stopping any possible callout_stop() and
      callout_drain() activity until the migration is actually
      performed.
    - Problem3:
      There is a further race in callout_drain(). In order to avoid a
      race between the sleepqueue lock and the callout cpu spinlock, in
      _callout_stop_safe() the callout cpu lock is dropped, the
      sleepqueue lock is acquired, and a new callout cpu lookup is
      performed. Note that the channel used for locking the sleepqueue
      is obtained from the "current" callout cpu (&cc->cc_waiting). If
      the callout migrated in the meanwhile, callout_drain() ends up
      using the wrong wchan for the sleepqueue (the locked one is the
      older one, while the new one is not really locked), leading to a
      lock leak and racy access to the sleepqueue.
    - Solution3:
      It is enough to check whether a migration happened between
      acquiring the sleepqueue lock and the new callout cpu lock, and
      in that case unwind all of those and try again.
    These problems can lead to deadly races on a moderate (4-way) SMP
    environment, leading to easy panics or deadlocks. The 24-way
    machine of the reporter could easily panic, with a completely
    normal workload, almost daily. gianni@ kindly wrote the following
    proof-of-concept, which can panic a FreeBSD machine in less than
    one hour on smaller SMP:
    http://www.freebsd.org/~attilio/callout/test.c
    Reported by: Nicholas Esborn <nick at desert dot net>, DesertNet
    In collaboration with: gianni, pho, Nicholas Esborn
    Reviewed by: jhb
    MFC after: 1 week (*)
    (*) Usually I would aim for a larger MFC timeout, but I really want
    this in before 8.2-RELEASE, thus re@ accepted a shorter timeout as
    a special case for this patch.
* Remove 'softclock_ih' as it is no longer used. (jhb, 2010-11-03, +1/-4)
* Fix callout_tickstofirst() behavior after signed integer ticks
  overflow. (mav, 2010-10-31, +1/-2)
    This should fix the callout precision drop to 1/4 s after 25 days
    of uptime with HZ = 1000.
    Submitted by: Taku YAMAMOTO <taku@tackymt.homeip.net>
* Fix a panic on NULL dereference possible after r212541.
  (mav, 2010-09-14, +2/-1)
* Make kern_tc.c provide the minimum frequency of tc_ticktock() calls
  required to handle current timecounter wraps. (mav, 2010-09-14, +2/-2)
    Make kern_clocksource.c honor that requirement, scheduling sleeps
    on the first CPU for no more than the specified period. Allow other
    CPUs to sleep up to 1/4 second (for any case).
* Refactor timer management code with priority to one-shot operation
  mode. (mav, 2010-09-13, +37/-2)
    The main goal of this is to generate timer interrupts only when
    there is some work to do. When the CPU is busy, interrupts are
    generated at the full rate of hz + stathz to fulfill scheduler and
    timekeeping requirements. But when the CPU is idle, only the
    minimum set of interrupts (down to 8 interrupts per second per CPU
    now) needed to handle scheduled callouts is executed. This allows a
    significant increase in idle CPU sleep time, increasing the effect
    of static power-saving technologies. It should also reduce host CPU
    load on virtualized systems when the guest system is idle.
    There is a set of tunables, also available as writable sysctls,
    allowing control over the desired event timer subsystem behavior:
    kern.eventtimer.timer - allows choosing the event timer hardware to
    use. On x86 there are up to 4 different kinds of timers. Depending
    on whether the chosen timer is per-CPU, the behavior of the other
    options differs slightly.
    kern.eventtimer.periodic - allows choosing between periodic and
    one-shot operation mode. In periodic mode, the current timer
    hardware is taken as the only source of time for time events. This
    mode is quite similar to the previous kernel behavior. One-shot
    mode instead uses the currently selected time counter hardware to
    schedule all needed events one by one and programs the timer to
    generate an interrupt at exactly the specified time. The default
    value depends on the chosen timer's capabilities, but one-shot mode
    is preferred, unless another is forced by the user or hardware.
    kern.eventtimer.singlemul - in periodic mode, specifies how many
    times higher the timer frequency should be, so as not to strictly
    alias hardclock() and statclock() events. Default values are 2 and
    4, but they could be reduced to 1 if extra interrupts are unwanted.
    kern.eventtimer.idletick - makes each CPU receive every timer
    interrupt independently of whether it is busy or not. By default
    this option is disabled. If the chosen timer is per-CPU and runs in
    periodic mode, this option has no effect - all interrupts are
    generated anyway.
    As this patch modifies cpu_idle() on some platforms, I have also
    refactored it on x86. Now it makes use of the MONITOR/MWAIT
    instructions (if supported) under a high sleep/wakeup rate, as a
    fast alternative to other methods. It allows the SMP scheduler to
    wake up sleeping CPUs much faster without using an IPI,
    significantly increasing performance on some highly task-switching
    loads.
    Tested by: many (on i386, amd64, sparc64 and powerpc)
    H/W donated by: Gheorghe Ardelean
    Sponsored by: iXsystems, Inc.
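    For example, a tuning via /etc/sysctl.conf with the knobs this
    commit introduces might look like this (the values shown are the
    defaults described above, listed purely for illustration):

        kern.eventtimer.periodic=0   # prefer one-shot mode
        kern.eventtimer.singlemul=2  # periodic-mode frequency multiplier
        kern.eventtimer.idletick=0   # idle CPUs may skip timer interrupts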
* Add an extra comment to the SDT probe definitions.
  (rpaulo, 2010-08-22, +2/-2)
    This allows us to use '-' in probe names, matching the probe names
    in Solaris. [1]
    Add userland SDT probe definitions to sys/sdt.h.
    Sponsored by: The FreeBSD Foundation
    Discussed with: rwatson [1]
* Update several places that iterate over CPUs to use CPU_FOREACH().
  (jhb, 2010-06-11, +1/-3)
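    The idiom, for reference (a trivial sketch; the function name is
    illustrative):

        #include <sys/param.h>
        #include <sys/systm.h>
        #include <sys/smp.h>

        static void
        count_cpus(void)
        {
            int cpu, n = 0;

            /* Visits each present CPU id and skips absent ones,
             * replacing open-coded loops over 0..mp_maxid. */
            CPU_FOREACH(cpu)
                n++;
            printf("%d CPUs present\n", n);
        }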