From 47001d603375f857a7fab0e9c095d964a1ea0039 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 1 Apr 2008 19:45:18 +0200 Subject: x86: tsc prevent time going backwards We already catch most of the TSC problems by sanity checks, but there is a subtle bug which has been in the code for ever. This can cause time jumps in the range of hours. This was reported in: http://lkml.org/lkml/2007/8/23/96 and http://lkml.org/lkml/2008/3/31/23 I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have sychronized TSCs. The TSCs seems not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the sync check. Still there exists an extremly small window where this delta can be observed with a real big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well. CPU 0 updates the clock source variables under xtime/vyscall lock and CPU1, where the TSC is slighty behind CPU0, is reading the time right after the seqlock was unlocked. The clocksource reference data was updated with the TSC from CPU0 and the value which is read from TSC on CPU1 is less than the reference data. This results in a huge delta value due to the unsigned subtraction of the TSC value and the reference value. This algorithm can not be changed due to the support of wrapping clock sources like pm timer. The huge delta is converted to nanoseconds and added to xtime, which is then observable by the caller. The next gettimeofday call on CPU1 will show the correct time again as now the TSC has advanced above the reference value. To prevent this TSC specific wreckage we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value. I pondered to mark the TSC unstable when the readout is smaller than the reference value, but this would render an otherwise good and fast clocksource unusable without a real good reason. Signed-off-by: Thomas Gleixner Signed-off-by: Ingo Molnar --- arch/x86/kernel/tsc_32.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'arch/x86/kernel/tsc_32.c') diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c index f14cfd9..d7498b3 100644 --- a/arch/x86/kernel/tsc_32.c +++ b/arch/x86/kernel/tsc_32.c @@ -287,14 +287,27 @@ core_initcall(cpufreq_tsc); /* clock source code */ static unsigned long current_tsc_khz = 0; +static struct clocksource clocksource_tsc; +/* + * We compare the TSC to the cycle_last value in the clocksource + * structure to avoid a nasty time-warp issue. This can be observed in + * a very small window right after one CPU updated cycle_last under + * xtime lock and the other CPU reads a TSC value which is smaller + * than the cycle_last reference value due to a TSC which is slighty + * behind. This delta is nowhere else observable, but in that case it + * results in a forward time jump in the range of hours due to the + * unsigned delta calculation of the time keeping core code, which is + * necessary to support wrapping clocksources like pm timer. + */ static cycle_t read_tsc(void) { cycle_t ret; rdtscll(ret); - return ret; + return ret >= clocksource_tsc.cycle_last ? + ret : clocksource_tsc.cycle_last; } static struct clocksource clocksource_tsc = { -- cgit v1.1 From 5b13d863573e746739ccfc24ac1a9473cfee8df1 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Mon, 7 Apr 2008 20:58:08 +0200 Subject: revert "x86: tsc prevent time going backwards" revert: | commit 47001d603375f857a7fab0e9c095d964a1ea0039 | Author: Thomas Gleixner | Date: Tue Apr 1 19:45:18 2008 +0200 | | x86: tsc prevent time going backwards it has been identified to cause suspend regression - and the commit fixes a longstanding bug that existed before 2.6.25 was opened - so it can wait some more until the effects are better understood. Signed-off-by: Ingo Molnar --- arch/x86/kernel/tsc_32.c | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) (limited to 'arch/x86/kernel/tsc_32.c') diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c index d7498b3..f14cfd9 100644 --- a/arch/x86/kernel/tsc_32.c +++ b/arch/x86/kernel/tsc_32.c @@ -287,27 +287,14 @@ core_initcall(cpufreq_tsc); /* clock source code */ static unsigned long current_tsc_khz = 0; -static struct clocksource clocksource_tsc; -/* - * We compare the TSC to the cycle_last value in the clocksource - * structure to avoid a nasty time-warp issue. This can be observed in - * a very small window right after one CPU updated cycle_last under - * xtime lock and the other CPU reads a TSC value which is smaller - * than the cycle_last reference value due to a TSC which is slighty - * behind. This delta is nowhere else observable, but in that case it - * results in a forward time jump in the range of hours due to the - * unsigned delta calculation of the time keeping core code, which is - * necessary to support wrapping clocksources like pm timer. - */ static cycle_t read_tsc(void) { cycle_t ret; rdtscll(ret); - return ret >= clocksource_tsc.cycle_last ? - ret : clocksource_tsc.cycle_last; + return ret; } static struct clocksource clocksource_tsc = { -- cgit v1.1 From 4f41c94d5c24e3b3453e9df03c0a80ca1acf00d2 Mon Sep 17 00:00:00 2001 From: Karsten Wiese Date: Mon, 7 Apr 2008 12:14:45 +0200 Subject: x86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier() In time_cpufreq_notifier() the cpu id to act upon is held in freq->cpu. Use it instead of smp_processor_id() in the call to set_cyc2ns_scale(). This makes the preempt_*able() unnecessary and lets set_cyc2ns_scale() update the intended cpu's cyc2ns. Related mail/thread: http://lkml.org/lkml/2007/12/7/130 Signed-off-by: Karsten Wiese Signed-off-by: Ingo Molnar --- arch/x86/kernel/tsc_32.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'arch/x86/kernel/tsc_32.c') diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c index f14cfd9..c2241e0 100644 --- a/arch/x86/kernel/tsc_32.c +++ b/arch/x86/kernel/tsc_32.c @@ -256,9 +256,7 @@ time_cpufreq_notifier(struct notifier_block *nb, unsigned long val, void *data) ref_freq, freq->new); if (!(freq->flags & CPUFREQ_CONST_LOOPS)) { tsc_khz = cpu_khz; - preempt_disable(); - set_cyc2ns_scale(cpu_khz, smp_processor_id()); - preempt_enable(); + set_cyc2ns_scale(cpu_khz, freq->cpu); /* * TSC based sched_clock turns * to junk w/ cpufreq -- cgit v1.1