op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'sched/clock' into sched/devel	Ingo Molnar	2008-07-14	1	-19/+105
\|\
\| *	sched_clock: and multiplier for TSC to gtod drift	Steven Rostedt	2008-07-11	1	-3/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The sched_clock code currently tries to keep all CPU clocks of all CPUS somewhat in sync. At every clock tick it records the gtod clock and uses that and jiffies and the TSC to calculate a CPU clock that tries to stay in sync with all the other CPUs. ftrace depends heavily on this timer and it detects when this timer "jumps". One problem is that the TSC and the gtod also drift. When the TSC is 0.1% faster or slower than the gtod it is very noticeable in ftrace. To help compensate for this, I've added a multiplier that tries to keep the CPU clock updating at the same rate as the gtod. I've tried various ways to get it to be in sync and this ended up being the most reliable. At every scheduler tick we calculate the new multiplier: multi = delta_gtod / delta_TSC This means we perform a 64 bit divide at the tick (once a HZ). A shift is used to handle the accuracy. Other methods that failed due to dynamic HZ are: (not used) multi += (gtod - tsc) / delta_gtod (not used) multi += (gtod - (last_tsc + delta_tsc)) / delta_gtod as well as other variants. This code still allows for a slight drift between TSC and gtod, but it keeps the damage down to a minimum. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	sched_clock: record TSC after gtod	Steven Rostedt	2008-07-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To read the gtod we need to grab the xtime lock for read. Reading the gtod before the TSC can cause a bigger gab if the xtime lock is contended. This patch simply reverses the order to read the TSC after the gtod. The locking in the reading of the gtod handles any barriers one might think is needed. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	sched_clock: only update deltas with local reads.	Steven Rostedt	2008-07-11	1	-9/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reading the CPU clock should try to stay accurate within the CPU. By reading the CPU clock from another CPU and updating the deltas can cause unneeded jumps when reading from the local CPU. This patch changes the code to update the last read TSC only when read from the local CPU. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	sched_clock: fix calculation of other CPU	Steven Rostedt	2008-07-11	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The algorithm to calculate the 'now' of another CPU is not correct. At each scheduler tick, each CPU records the last sched_clock and gtod (tick_raw and tick_gtod respectively). If the TSC is somewhat the same in speed between two clocks the algorithm would be: tick_gtod1 + (now1 - tick_raw1) = tick_gtod2 + (now2 - tick_raw2) To calculate now2 we would have: now2 = (tick_gtod1 - tick_gtod2) + (tick_raw2 - tick_raw1) + now1 Currently the algorithm is: now2 = (tick_gtod1 - tick_gtod2) + (tick_raw1 - tick_raw2) + now1 This solves most of the rest of the issues I've had with timestamps in ftace. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	sched_clock: stop maximum check on NO HZ	Steven Rostedt	2008-07-11	1	-1/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Working with ftrace I would get large jumps of 11 millisecs or more with the clock tracer. This killed the latencing timings of ftrace and also caused the irqoff self tests to fail. What was happening is with NO_HZ the idle would stop the jiffy counter and before the jiffy counter was updated the sched_clock would have a bad delta jiffies to compare with the gtod with the maximum. The jiffies would stop and the last sched_tick would record the last gtod. On wakeup, the sched clock update would compare the gtod + delta jiffies (which would be zero) and compare it to the TSC. The TSC would have correctly (with a stable TSC) moved forward several jiffies. But because the jiffies has not been updated yet the clock would be prevented from moving forward because it would appear that the TSC jumped too far ahead. The clock would then virtually stop, until the jiffies are updated. Then the next sched clock update would see that the clock was very much behind since the delta jiffies is now correct. This would then jump the clock forward by several jiffies. This caused ftrace to report several milliseconds of interrupts off latency at every resume from NO_HZ idle. This patch adds hooks into the nohz code to disable the checking of the maximum clock update when nohz is in effect. It resumes the max check when nohz has updated the jiffies again. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	sched_clock: widen the max and min time	Steven Rostedt	2008-07-11	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With keeping the max and min sched time within one jiffy of the gtod clock was too tight. Just before a schedule tick the max could easily be hit, as well as just after a schedule_tick the min could be hit. This caused the clock to jump around by a jiffy. This patch widens the minimum to last gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSECS and the maximum to last gtod + (2 + delta_jiffies) * TICK_NSECS This keeps the minum to gtod or if one jiffy less than delta jiffies and the maxim 2 jiffies ahead of gtod. This may cause unstable TSCs to be a bit more sporadic, but it helps keep a clock with a stable TSC working well. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| *	sched_clock: record from last tick	Steven Rostedt	2008-07-11	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The sched_clock code tries to keep within the gtod time by one tick (jiffy). The current code mistakenly keeps track of the delta jiffies between updates of the clock, where the the delta is used to compare with the number of jiffies that have past since an update of the gtod. The gtod is updated at each schedule tick not each sched_clock update. After one jiffy passes the clock is updated fine. But the delta is taken from the last update so if the next update happens before the next tick the delta jiffies used will be incorrect. This patch changes the code to check the delta of jiffies between ticks and not updates to match the comparison of the updates with the gtod. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* \|	sched: sched_clock_cpu() based cpu_clock(), lockdep fix	Ingo Molnar	2008-06-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Vegard Nossum reported: > WARNING: at kernel/lockdep.c:2738 check_flags+0x142/0x160() which happens due to: unsigned long long cpu_clock(int cpu) { unsigned long long clock; unsigned long flags; raw_local_irq_save(flags); as lower level functions can take locks, we must not do that, use proper lockdep-annotated irq save/restore. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* \|	sched: export cpu_clock	Ingo Molnar	2008-06-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	the rcutorture module relies on cpu_clock. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* \|	sched: sched_clock_cpu() based cpu_clock()	Peter Zijlstra	2008-06-27	1	-0/+12
\|/ \| \| \| \| \| \| \| \| \|	with sched_clock_cpu() being reasonably in sync between cpus (max 1 jiffy difference) use this to provide cpu_clock(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fix sched_clock_cpu()	Peter Zijlstra	2008-05-29	1	-4/+14
\| \| \| \| \| \| \| \| \|	Make sched_clock_cpu() return 0 before it has been initialized and avoid corrupting its state due to doing so. This fixes the weird printk timestamp jump reported. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
*	sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK	Peter Zijlstra	2008-05-05	1	-0/+236
	this replaces the rq->clock stuff (and possibly cpu_clock()). - architectures that have an 'imperfect' hardware clock can set CONFIG_HAVE_UNSTABLE_SCHED_CLOCK - the 'jiffie' window might be superfulous when we update tick_gtod before the __update_sched_clock() call in sched_clock_tick() - cpu_clock() might be implemented as: sched_clock_cpu(smp_processor_id()) if the accuracy proves good enough - how far can TSC drift in a single jiffie when considering the filtering and idle hooks? [ mingo@elte.hu: various fixes and cleanups ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>