From fbc78b07ba53ace155f27491c81a009e541a93ad Mon Sep 17 00:00:00 2001 From: Philippe Gerum Date: Thu, 12 Feb 2009 12:18:46 +0000 Subject: powerpc/mm: Fix _PAGE_CHG_MASK to protect _PAGE_SPECIAL Fix _PAGE_CHG_MASK so that pte_modify() does not affect the _PAGE_SPECIAL bit. Signed-off-by: Philippe Gerum Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/include/asm/pgtable-4k.h | 2 +- arch/powerpc/include/asm/pgtable-64k.h | 2 +- arch/powerpc/include/asm/pgtable-ppc32.h | 3 ++- 3 files changed, 4 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/include/asm/pgtable-4k.h b/arch/powerpc/include/asm/pgtable-4k.h index 6b18ba9..1dbca4e7 100644 --- a/arch/powerpc/include/asm/pgtable-4k.h +++ b/arch/powerpc/include/asm/pgtable-4k.h @@ -60,7 +60,7 @@ /* It should be preserving the high 48 bits and then specifically */ /* preserving _PAGE_SECONDARY | _PAGE_GROUP_IX */ #define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | \ - _PAGE_HPTEFLAGS) + _PAGE_HPTEFLAGS | _PAGE_SPECIAL) /* Bits to mask out from a PMD to get to the PTE page */ #define PMD_MASKED_BITS 0 diff --git a/arch/powerpc/include/asm/pgtable-64k.h b/arch/powerpc/include/asm/pgtable-64k.h index 07b0d8f..7389003 100644 --- a/arch/powerpc/include/asm/pgtable-64k.h +++ b/arch/powerpc/include/asm/pgtable-64k.h @@ -114,7 +114,7 @@ static inline struct subpage_prot_table *pgd_subpage_prot(pgd_t *pgd) * pgprot changes */ #define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \ - _PAGE_ACCESSED) + _PAGE_ACCESSED | _PAGE_SPECIAL) /* Bits to mask out from a PMD to get to the PTE page */ #define PMD_MASKED_BITS 0x1ff diff --git a/arch/powerpc/include/asm/pgtable-ppc32.h b/arch/powerpc/include/asm/pgtable-ppc32.h index f69a4d9..820b5f0 100644 --- a/arch/powerpc/include/asm/pgtable-ppc32.h +++ b/arch/powerpc/include/asm/pgtable-ppc32.h @@ -429,7 +429,8 @@ extern int icache_44x_need_flush; #define PMD_PAGE_SIZE(pmd) bad_call_to_PMD_PAGE_SIZE() #endif -#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY) +#define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_ACCESSED | _PAGE_DIRTY | \ + _PAGE_SPECIAL) #define PAGE_PROT_BITS (_PAGE_GUARDED | _PAGE_COHERENT | _PAGE_NO_CACHE | \ -- cgit v1.1 From 06eccea6c34e00c8fedce49229ed35fa84619fb7 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Thu, 12 Feb 2009 12:36:04 +0000 Subject: powerpc/mm: Fix numa reserve bootmem page selection Fix the powerpc NUMA reserve bootmem page selection logic. commit 8f64e1f2d1e09267ac926e15090fd505c1c0cbcb (powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes) changed the logic for how the powerpc LMB reserved regions were converted to bootmen reserved regions. As the folowing discussion reports, the new logic was not correct. mark_reserved_regions_for_nid() goes through each LMB on the system that specifies a reserved area. It searches for active regions that intersect with that LMB and are on the specified node. It attempts to bootmem-reserve only the area where the active region and the reserved LMB intersect. We can not reserve things on other nodes as they may not have bootmem structures allocated, yet. We base the size of the bootmem reservation on two possible things. Normally, we just make the reservation start and stop exactly at the start and end of the LMB. However, the LMB reservations are not aware of NUMA nodes and on occasion a single LMB may cross into several adjacent active regions. Those may even be on different NUMA nodes and will require separate calls to the bootmem reserve functions. So, the bootmem reservation must be trimmed to fit inside the current active region. That's all fine and dandy, but we trim the reservation in a page-aligned fashion. That's bad because we start the reservation at a non-page-aligned address: physbase. The reservation may only span 2 bytes, but that those bytes may span two pfns and cause a reserve_size of 2*PAGE_SIZE. Take the case where you reserve 0x2 bytes at 0x0fff and where the active region ends at 0x1000. You'll jump into that if() statment, but node_ar.end_pfn=0x1 and start_pfn=0x0. You'll end up with a reserve_size=0x1000, and then call reserve_bootmem_node(node, physbase=0xfff, size=0x1000); 0x1000 may not be on the same node as 0xfff. Oops. In almost all the vm code, end_ is not inclusive. If you have an end_pfn of 0x1234, page 0x1234 is not included in the range. Using PFN_UP instead of the (>> >> PAGE_SHIFT) will make this consistent with the other VM code. We also need to do math for the reserved size with physbase instead of start_pfn. node_ar.end_pfn << PAGE_SHIFT is *precisely* the end of the node. However, (start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning of the reserved area. That is, of course, physbase. If we don't use physbase here, the reserve_size can be made too large. From: Dave Hansen Tested-by: Geoff Levand Tested on PS3. Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/mm/numa.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 7393bd7..5ac08b8 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -882,7 +883,7 @@ static void mark_reserved_regions_for_nid(int nid) unsigned long physbase = lmb.reserved.region[i].base; unsigned long size = lmb.reserved.region[i].size; unsigned long start_pfn = physbase >> PAGE_SHIFT; - unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT); + unsigned long end_pfn = PFN_UP(physbase + size); struct node_active_region node_ar; unsigned long node_end_pfn = node->node_start_pfn + node->node_spanned_pages; @@ -908,7 +909,7 @@ static void mark_reserved_regions_for_nid(int nid) */ if (end_pfn > node_ar.end_pfn) reserve_size = (node_ar.end_pfn << PAGE_SHIFT) - - (start_pfn << PAGE_SHIFT); + - physbase; /* * Only worry about *this* node, others may not * yet have valid NODE_DATA(). -- cgit v1.1 From 0047656e2a97d4452dd7df9e52591a7afe21a263 Mon Sep 17 00:00:00 2001 From: Geoff Levand Date: Thu, 12 Feb 2009 12:36:16 +0000 Subject: powerpc/ps3: Move ps3_mm_add_memory to device_initcall Change the PS3 hotplug memory routine ps3_mm_add_memory() from a core_initcall to a device_initcall. core_initcall routines run before the powerpc topology_init() startup routine, which is a subsys_initcall, resulting in failure of ps3_mm_add_memory() when CONFIG_NUMA=y. When ps3_mm_add_memory() fails the system will boot with just the 128 MiB of boot memory Signed-off-by: Geoff Levand Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/platforms/ps3/mm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/platforms/ps3/mm.c b/arch/powerpc/platforms/ps3/mm.c index 67de6bf3..d281cc0 100644 --- a/arch/powerpc/platforms/ps3/mm.c +++ b/arch/powerpc/platforms/ps3/mm.c @@ -328,7 +328,7 @@ static int __init ps3_mm_add_memory(void) return result; } -core_initcall(ps3_mm_add_memory); +device_initcall(ps3_mm_add_memory); /*============================================================================*/ /* dma routines */ -- cgit v1.1 From 26456dcfb8d8e43b1b64b2a14710694cf7a72f05 Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Thu, 12 Feb 2009 19:08:58 +0000 Subject: powerpc/vsx: Fix VSX alignment handler for regs 32-63 Fix the VSX alignment handler for VSX registers > 32. 32-63 are stored in the VMX part of the thread_struct not the FPR part. Signed-off-by: Michael Neuling CC: stable@kernel.org (2.6.27 & .28 please) Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/kernel/align.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index 5af4e9b..ada0692 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -646,11 +646,16 @@ static int emulate_vsx(unsigned char __user *addr, unsigned int reg, unsigned int areg, struct pt_regs *regs, unsigned int flags, unsigned int length) { - char *ptr = (char *) ¤t->thread.TS_FPR(reg); + char *ptr; int ret = 0; flush_vsx_to_thread(current); + if (reg < 32) + ptr = (char *) ¤t->thread.TS_FPR(reg); + else + ptr = (char *) ¤t->thread.vr[reg - 32]; + if (flags & ST) ret = __copy_to_user(addr, ptr, length); else { -- cgit v1.1 From 9132f1b453924e7595ce2dc1853704b2a31f42de Mon Sep 17 00:00:00 2001 From: Russell King Date: Sat, 14 Feb 2009 13:24:10 +0000 Subject: [ARM] omap: fix omap2_divisor_to_clksel() error return value The error checks for omap2_divisor_to_clksel() and comment disagree with the actual value returned on error. Fix this to return the correct error value. Signed-off-by: Russell King --- arch/arm/mach-omap2/clock.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-omap2/clock.c b/arch/arm/mach-omap2/clock.c index ad721e0..2899cba 100644 --- a/arch/arm/mach-omap2/clock.c +++ b/arch/arm/mach-omap2/clock.c @@ -565,7 +565,7 @@ u32 omap2_clksel_to_divisor(struct clk *clk, u32 field_val) * * Given a struct clk of a rate-selectable clksel clock, and a clock divisor, * find the corresponding register field value. The return register value is - * the value before left-shifting. Returns 0xffffffff on error + * the value before left-shifting. Returns ~0 on error */ u32 omap2_divisor_to_clksel(struct clk *clk, u32 div) { @@ -577,7 +577,7 @@ u32 omap2_divisor_to_clksel(struct clk *clk, u32 div) clks = omap2_get_clksel_by_parent(clk, clk->parent); if (clks == NULL) - return 0; + return ~0; for (clkr = clks->rates; clkr->div; clkr++) { if ((clkr->flags & cpu_mask) && (clkr->div == div)) @@ -588,7 +588,7 @@ u32 omap2_divisor_to_clksel(struct clk *clk, u32 div) printk(KERN_ERR "clock: Could not find divisor %d for " "clock %s parent %s\n", div, clk->name, clk->parent->name); - return 0; + return ~0; } return clkr->val; -- cgit v1.1 From abf239657b88fa7e75d5b44a65a4177e7bb8acce Mon Sep 17 00:00:00 2001 From: Russell King Date: Sat, 14 Feb 2009 13:25:38 +0000 Subject: [ARM] omap: fix _omap2_clksel_get_src_field() _omap2_clksel_get_src_field() was returning the first entry which was either the default _or_ applicable to the SoC. This is wrong - we should be returning the first default which is applicable to the SoC. Signed-off-by: Russell King --- arch/arm/mach-omap2/clock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mach-omap2/clock.c b/arch/arm/mach-omap2/clock.c index 2899cba..e64549e 100644 --- a/arch/arm/mach-omap2/clock.c +++ b/arch/arm/mach-omap2/clock.c @@ -708,7 +708,7 @@ static u32 omap2_clksel_get_src_field(void __iomem **src_addr, return 0; for (clkr = clks->rates; clkr->div; clkr++) { - if (clkr->flags & (cpu_mask | DEFAULT_RATE)) + if (clkr->flags & cpu_mask && clkr->flags & DEFAULT_RATE) break; /* Found the default rate for this platform */ } -- cgit v1.1 From 2af29b78618ac8b3a8746337002f108f8fdf56ad Mon Sep 17 00:00:00 2001 From: Andrew Victor Date: Wed, 11 Feb 2009 21:23:10 +0100 Subject: [ARM] 5390/1: AT91: Watchdog fixes The recently merged AT91SAM9 watchdog driver uses the AT91SAM9X_WATCHDOG config variable, whereas the original version of the driver (and the platform support code) used AT91SAM9_WATCHDOG. This causes the watchdog platform_device to never be registered, and therefore the driver not to be initialized. This patch: - updates the platform support code to use AT91SAM9X_WATCHDOG. - includes to fix compile error (same fix as was applied to at91rm9200_wdt.c) - fixes comment regarding watchdog clock-rates in at91rm9200. Signed-off-by: Andrew Victor Signed-off-by: Russell King --- arch/arm/configs/at91sam9260ek_defconfig | 2 +- arch/arm/configs/at91sam9261ek_defconfig | 2 +- arch/arm/configs/at91sam9263ek_defconfig | 2 +- arch/arm/configs/at91sam9rlek_defconfig | 2 +- arch/arm/configs/qil-a9260_defconfig | 2 +- arch/arm/mach-at91/at91cap9_devices.c | 2 +- arch/arm/mach-at91/at91sam9260_devices.c | 2 +- arch/arm/mach-at91/at91sam9261_devices.c | 2 +- arch/arm/mach-at91/at91sam9263_devices.c | 2 +- arch/arm/mach-at91/at91sam9rl_devices.c | 2 +- 10 files changed, 10 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/arm/configs/at91sam9260ek_defconfig b/arch/arm/configs/at91sam9260ek_defconfig index e0ee706..98e2f3d 100644 --- a/arch/arm/configs/at91sam9260ek_defconfig +++ b/arch/arm/configs/at91sam9260ek_defconfig @@ -608,7 +608,7 @@ CONFIG_WATCHDOG_NOWAYOUT=y # Watchdog Device Drivers # # CONFIG_SOFT_WATCHDOG is not set -CONFIG_AT91SAM9_WATCHDOG=y +CONFIG_AT91SAM9X_WATCHDOG=y # # USB-based Watchdog Cards diff --git a/arch/arm/configs/at91sam9261ek_defconfig b/arch/arm/configs/at91sam9261ek_defconfig index 01d1ef9..1494561 100644 --- a/arch/arm/configs/at91sam9261ek_defconfig +++ b/arch/arm/configs/at91sam9261ek_defconfig @@ -700,7 +700,7 @@ CONFIG_WATCHDOG_NOWAYOUT=y # Watchdog Device Drivers # # CONFIG_SOFT_WATCHDOG is not set -CONFIG_AT91SAM9_WATCHDOG=y +CONFIG_AT91SAM9X_WATCHDOG=y # # USB-based Watchdog Cards diff --git a/arch/arm/configs/at91sam9263ek_defconfig b/arch/arm/configs/at91sam9263ek_defconfig index 036a126..21599f3 100644 --- a/arch/arm/configs/at91sam9263ek_defconfig +++ b/arch/arm/configs/at91sam9263ek_defconfig @@ -710,7 +710,7 @@ CONFIG_WATCHDOG_NOWAYOUT=y # Watchdog Device Drivers # # CONFIG_SOFT_WATCHDOG is not set -CONFIG_AT91SAM9_WATCHDOG=y +CONFIG_AT91SAM9X_WATCHDOG=y # # USB-based Watchdog Cards diff --git a/arch/arm/configs/at91sam9rlek_defconfig b/arch/arm/configs/at91sam9rlek_defconfig index 237a2a6..e2df81a 100644 --- a/arch/arm/configs/at91sam9rlek_defconfig +++ b/arch/arm/configs/at91sam9rlek_defconfig @@ -606,7 +606,7 @@ CONFIG_WATCHDOG_NOWAYOUT=y # Watchdog Device Drivers # # CONFIG_SOFT_WATCHDOG is not set -CONFIG_AT91SAM9_WATCHDOG=y +CONFIG_AT91SAM9X_WATCHDOG=y # # Sonics Silicon Backplane diff --git a/arch/arm/configs/qil-a9260_defconfig b/arch/arm/configs/qil-a9260_defconfig index cd1d717..9b32d0e 100644 --- a/arch/arm/configs/qil-a9260_defconfig +++ b/arch/arm/configs/qil-a9260_defconfig @@ -727,7 +727,7 @@ CONFIG_WATCHDOG_NOWAYOUT=y # Watchdog Device Drivers # # CONFIG_SOFT_WATCHDOG is not set -# CONFIG_AT91SAM9_WATCHDOG is not set +# CONFIG_AT91SAM9X_WATCHDOG is not set # # USB-based Watchdog Cards diff --git a/arch/arm/mach-at91/at91cap9_devices.c b/arch/arm/mach-at91/at91cap9_devices.c index 9eca220..412aa49 100644 --- a/arch/arm/mach-at91/at91cap9_devices.c +++ b/arch/arm/mach-at91/at91cap9_devices.c @@ -697,7 +697,7 @@ static void __init at91_add_device_rtt(void) * Watchdog * -------------------------------------------------------------------- */ -#if defined(CONFIG_AT91SAM9_WATCHDOG) || defined(CONFIG_AT91SAM9_WATCHDOG_MODULE) +#if defined(CONFIG_AT91SAM9X_WATCHDOG) || defined(CONFIG_AT91SAM9X_WATCHDOG_MODULE) static struct platform_device at91cap9_wdt_device = { .name = "at91_wdt", .id = -1, diff --git a/arch/arm/mach-at91/at91sam9260_devices.c b/arch/arm/mach-at91/at91sam9260_devices.c index fdde1ea..d74c9ac 100644 --- a/arch/arm/mach-at91/at91sam9260_devices.c +++ b/arch/arm/mach-at91/at91sam9260_devices.c @@ -643,7 +643,7 @@ static void __init at91_add_device_rtt(void) * Watchdog * -------------------------------------------------------------------- */ -#if defined(CONFIG_AT91SAM9_WATCHDOG) || defined(CONFIG_AT91SAM9_WATCHDOG_MODULE) +#if defined(CONFIG_AT91SAM9X_WATCHDOG) || defined(CONFIG_AT91SAM9X_WATCHDOG_MODULE) static struct platform_device at91sam9260_wdt_device = { .name = "at91_wdt", .id = -1, diff --git a/arch/arm/mach-at91/at91sam9261_devices.c b/arch/arm/mach-at91/at91sam9261_devices.c index 1728975..59fc483 100644 --- a/arch/arm/mach-at91/at91sam9261_devices.c +++ b/arch/arm/mach-at91/at91sam9261_devices.c @@ -621,7 +621,7 @@ static void __init at91_add_device_rtt(void) * Watchdog * -------------------------------------------------------------------- */ -#if defined(CONFIG_AT91SAM9_WATCHDOG) || defined(CONFIG_AT91SAM9_WATCHDOG_MODULE) +#if defined(CONFIG_AT91SAM9X_WATCHDOG) || defined(CONFIG_AT91SAM9X_WATCHDOG_MODULE) static struct platform_device at91sam9261_wdt_device = { .name = "at91_wdt", .id = -1, diff --git a/arch/arm/mach-at91/at91sam9263_devices.c b/arch/arm/mach-at91/at91sam9263_devices.c index b753cb8..134af97 100644 --- a/arch/arm/mach-at91/at91sam9263_devices.c +++ b/arch/arm/mach-at91/at91sam9263_devices.c @@ -854,7 +854,7 @@ static void __init at91_add_device_rtt(void) * Watchdog * -------------------------------------------------------------------- */ -#if defined(CONFIG_AT91SAM9_WATCHDOG) || defined(CONFIG_AT91SAM9_WATCHDOG_MODULE) +#if defined(CONFIG_AT91SAM9X_WATCHDOG) || defined(CONFIG_AT91SAM9X_WATCHDOG_MODULE) static struct platform_device at91sam9263_wdt_device = { .name = "at91_wdt", .id = -1, diff --git a/arch/arm/mach-at91/at91sam9rl_devices.c b/arch/arm/mach-at91/at91sam9rl_devices.c index 145324f..7281865 100644 --- a/arch/arm/mach-at91/at91sam9rl_devices.c +++ b/arch/arm/mach-at91/at91sam9rl_devices.c @@ -609,7 +609,7 @@ static void __init at91_add_device_rtt(void) * Watchdog * -------------------------------------------------------------------- */ -#if defined(CONFIG_AT91SAM9_WATCHDOG) || defined(CONFIG_AT91SAM9_WATCHDOG_MODULE) +#if defined(CONFIG_AT91SAM9X_WATCHDOG) || defined(CONFIG_AT91SAM9X_WATCHDOG_MODULE) static struct platform_device at91sam9rl_wdt_device = { .name = "at91_wdt", .id = -1, -- cgit v1.1 From 2b768b6cdbcf7fa0761e6c35c6ea288297582c43 Mon Sep 17 00:00:00 2001 From: Andrew Victor Date: Wed, 11 Feb 2009 21:39:05 +0100 Subject: [ARM] 5391/1: AT91: Enable GPIO clocks earlier Enable the GPIO clocks earlier in the initialization sequence. This allow the board-setup code to read and set GPIO pins. Signed-off-by: Marc Pignat Signed-off-by: Andrew Victor Signed-off-by: Russell King --- arch/arm/mach-at91/gpio.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-at91/gpio.c b/arch/arm/mach-at91/gpio.c index 9b0447c..2f7d497 100644 --- a/arch/arm/mach-at91/gpio.c +++ b/arch/arm/mach-at91/gpio.c @@ -490,7 +490,8 @@ postcore_initcall(at91_gpio_debugfs_init); /*--------------------------------------------------------------------------*/ -/* This lock class tells lockdep that GPIO irqs are in a different +/* + * This lock class tells lockdep that GPIO irqs are in a different * category than their parents, so it won't report false recursion. */ static struct lock_class_key gpio_lock_class; @@ -509,9 +510,6 @@ void __init at91_gpio_irq_setup(void) unsigned id = this->id; unsigned i; - /* enable PIO controller's clock */ - clk_enable(this->clock); - __raw_writel(~0, this->regbase + PIO_IDR); for (i = 0, pin = this->chipbase; i < 32; i++, pin++) { @@ -556,7 +554,14 @@ void __init at91_gpio_init(struct at91_gpio_bank *data, int nr_banks) data->chipbase = PIN_BASE + i * 32; data->regbase = data->offset + (void __iomem *)AT91_VA_BASE_SYS; - /* AT91SAM9263_ID_PIOCDE groups PIOC, PIOD, PIOE */ + /* enable PIO controller's clock */ + clk_enable(data->clock); + + /* + * Some processors share peripheral ID between multiple GPIO banks. + * SAM9263 (PIOC, PIOD, PIOE) + * CAP9 (PIOA, PIOB, PIOC, PIOD) + */ if (last && last->id == data->id) last->next = data; } -- cgit v1.1 From d39123a486524fed9b4e43e08a8757fd90a5859a Mon Sep 17 00:00:00 2001 From: Yang Zhang Date: Thu, 8 Jan 2009 15:13:31 +0800 Subject: KVM: ia64: fix fp fault/trap handler The floating-point registers f6-f11 is used by vmm and saved in kvm-pt-regs, so should set the correct bit mask and the pointer in fp_state, otherwise, fpswa may touch vmm's fp registers instead of guests'. In addition, for fp trap handling, since the instruction which leads to fp trap is completely executed, so can't use retry machanism to re-execute it, because it may pollute some registers. Signed-off-by: Yang Zhang Signed-off-by: Avi Kivity --- arch/ia64/kvm/process.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) (limited to 'arch') diff --git a/arch/ia64/kvm/process.c b/arch/ia64/kvm/process.c index 552d077..230eae4 100644 --- a/arch/ia64/kvm/process.c +++ b/arch/ia64/kvm/process.c @@ -455,13 +455,18 @@ fpswa_ret_t vmm_fp_emulate(int fp_fault, void *bundle, unsigned long *ipsr, if (!vmm_fpswa_interface) return (fpswa_ret_t) {-1, 0, 0, 0}; - /* - * Just let fpswa driver to use hardware fp registers. - * No fp register is valid in memory. - */ memset(&fp_state, 0, sizeof(fp_state_t)); /* + * compute fp_state. only FP registers f6 - f11 are used by the + * vmm, so set those bits in the mask and set the low volatile + * pointer to point to these registers. + */ + fp_state.bitmask_low64 = 0xfc0; /* bit6..bit11 */ + + fp_state.fp_state_low_volatile = (fp_state_low_volatile_t *) ®s->f6; + + /* * unsigned long (*EFI_FPSWA) ( * unsigned long trap_type, * void *Bundle, @@ -545,10 +550,6 @@ void reflect_interruption(u64 ifa, u64 isr, u64 iim, status = vmm_handle_fpu_swa(0, regs, isr); if (!status) return ; - else if (-EAGAIN == status) { - vcpu_decrement_iip(vcpu); - return ; - } break; } -- cgit v1.1 From 7a0eb1960e8ddcb68ea631caf16815485af0e228 Mon Sep 17 00:00:00 2001 From: Avi Kivity Date: Mon, 19 Jan 2009 14:57:52 +0200 Subject: KVM: Avoid using CONFIG_ in userspace visible headers Kconfig symbols are not available in userspace, and are not stripped by headers-install. Avoid their use by adding #defines in to suit each architecture. Signed-off-by: Avi Kivity --- arch/ia64/include/asm/kvm.h | 4 ++++ arch/x86/include/asm/kvm.h | 7 +++++++ 2 files changed, 11 insertions(+) (limited to 'arch') diff --git a/arch/ia64/include/asm/kvm.h b/arch/ia64/include/asm/kvm.h index 68aa6da..bfa86b6 100644 --- a/arch/ia64/include/asm/kvm.h +++ b/arch/ia64/include/asm/kvm.h @@ -25,6 +25,10 @@ #include +/* Select x86 specific features in */ +#define __KVM_HAVE_IOAPIC +#define __KVM_HAVE_DEVICE_ASSIGNMENT + /* Architectural interrupt line count. */ #define KVM_NR_INTERRUPTS 256 diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h index d2e3bf3..886c940 100644 --- a/arch/x86/include/asm/kvm.h +++ b/arch/x86/include/asm/kvm.h @@ -9,6 +9,13 @@ #include #include +/* Select x86 specific features in */ +#define __KVM_HAVE_PIT +#define __KVM_HAVE_IOAPIC +#define __KVM_HAVE_DEVICE_ASSIGNMENT +#define __KVM_HAVE_MSI +#define __KVM_HAVE_USER_NMI + /* Architectural interrupt line count. */ #define KVM_NR_INTERRUPTS 256 -- cgit v1.1 From ad8ba2cd44d4d39fb3fe55d5dcc565b19fc3a7fb Mon Sep 17 00:00:00 2001 From: Sheng Yang Date: Tue, 6 Jan 2009 10:03:02 +0800 Subject: KVM: Add kvm_arch_sync_events to sync with asynchronize events kvm_arch_sync_events is introduced to quiet down all other events may happen contemporary with VM destroy process, like IRQ handler and work struct for assigned device. For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so the state of KVM here is legal and can provide a environment to quiet down other events. Signed-off-by: Sheng Yang Signed-off-by: Avi Kivity --- arch/ia64/kvm/kvm-ia64.c | 4 ++++ arch/powerpc/kvm/powerpc.c | 4 ++++ arch/s390/kvm/kvm-s390.c | 4 ++++ arch/x86/kvm/x86.c | 4 ++++ 4 files changed, 16 insertions(+) (limited to 'arch') diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 4e586f6..28f9820 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1337,6 +1337,10 @@ static void kvm_release_vm_pages(struct kvm *kvm) } } +void kvm_arch_sync_events(struct kvm *kvm) +{ +} + void kvm_arch_destroy_vm(struct kvm *kvm) { kvm_iommu_unmap_guest(kvm); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2822c8c..5f81256 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -125,6 +125,10 @@ static void kvmppc_free_vcpus(struct kvm *kvm) } } +void kvm_arch_sync_events(struct kvm *kvm) +{ +} + void kvm_arch_destroy_vm(struct kvm *kvm) { kvmppc_free_vcpus(kvm); diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index be84971..0d33893 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -212,6 +212,10 @@ static void kvm_free_vcpus(struct kvm *kvm) } } +void kvm_arch_sync_events(struct kvm *kvm) +{ +} + void kvm_arch_destroy_vm(struct kvm *kvm) { kvm_free_vcpus(kvm); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cc17546..b0fc079 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4127,6 +4127,10 @@ static void kvm_free_vcpus(struct kvm *kvm) } +void kvm_arch_sync_events(struct kvm *kvm) +{ +} + void kvm_arch_destroy_vm(struct kvm *kvm) { kvm_free_all_assigned_devices(kvm); -- cgit v1.1 From ba4cef31d5a397b64ba6d3ff713ce06c62f0c597 Mon Sep 17 00:00:00 2001 From: Sheng Yang Date: Tue, 6 Jan 2009 10:03:03 +0800 Subject: KVM: Fix racy in kvm_free_assigned_irq In the past, kvm_get_kvm() and kvm_put_kvm() was called in assigned device irq handler and interrupt_work, in order to prevent cancel_work_sync() in kvm_free_assigned_irq got a illegal state when waiting for interrupt_work done. But it's tricky and still got two problems: 1. A bug ignored two conditions that cancel_work_sync() would return true result in a additional kvm_put_kvm(). 2. If interrupt type is MSI, we would got a window between cancel_work_sync() and free_irq(), which interrupt would be injected again... This patch discard the reference count used for irq handler and interrupt_work, and ensure the legal state by moving the free function at the very beginning of kvm_destroy_vm(). And the patch fix the second bug by disable irq before cancel_work_sync(), which may result in nested disable of irq but OK for we are going to free it. Signed-off-by: Sheng Yang Signed-off-by: Avi Kivity --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b0fc079..fc3e329 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4129,11 +4129,11 @@ static void kvm_free_vcpus(struct kvm *kvm) void kvm_arch_sync_events(struct kvm *kvm) { + kvm_free_all_assigned_devices(kvm); } void kvm_arch_destroy_vm(struct kvm *kvm) { - kvm_free_all_assigned_devices(kvm); kvm_iommu_unmap_guest(kvm); kvm_free_pit(kvm); kfree(kvm->arch.vpic); -- cgit v1.1 From d2a8284e8fca9e2a938bee6cd074064d23864886 Mon Sep 17 00:00:00 2001 From: Marcelo Tosatti Date: Tue, 30 Dec 2008 15:55:05 -0200 Subject: KVM: PIT: fix i8254 pending count read count_load_time assignment is bogus: its supposed to contain what it means, not the expiration time. Signed-off-by: Marcelo Tosatti Signed-off-by: Avi Kivity --- arch/x86/kvm/i8254.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index e665d1c..72bd275 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -207,7 +207,7 @@ static int __pit_timer_fn(struct kvm_kpit_state *ps) hrtimer_add_expires_ns(&pt->timer, pt->period); pt->scheduled = hrtimer_get_expires_ns(&pt->timer); if (pt->period) - ps->channels[0].count_load_time = hrtimer_get_expires(&pt->timer); + ps->channels[0].count_load_time = ktime_get(); return (pt->period == 0 ? 0 : 1); } -- cgit v1.1 From abe6655dd699069b53bcccbc65b2717f60203b12 Mon Sep 17 00:00:00 2001 From: Marcelo Tosatti Date: Tue, 10 Feb 2009 20:59:45 -0200 Subject: KVM: x86: disable kvmclock on non constant TSC hosts This is better. Currently, this code path is posing us big troubles, and we won't have a decent patch in time. So, temporarily disable it. Signed-off-by: Glauber Costa Signed-off-by: Marcelo Tosatti Signed-off-by: Avi Kivity --- arch/x86/kvm/x86.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fc3e329..758b7a1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -967,7 +967,6 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_MMU_SHADOW_CACHE_CONTROL: case KVM_CAP_SET_TSS_ADDR: case KVM_CAP_EXT_CPUID: - case KVM_CAP_CLOCKSOURCE: case KVM_CAP_PIT: case KVM_CAP_NOP_IO_DELAY: case KVM_CAP_MP_STATE: @@ -992,6 +991,9 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_IOMMU: r = iommu_found(); break; + case KVM_CAP_CLOCKSOURCE: + r = boot_cpu_has(X86_FEATURE_CONSTANT_TSC); + break; default: r = 0; break; -- cgit v1.1 From 2aaf69dcee864f4fb6402638dd2f263324ac839f Mon Sep 17 00:00:00 2001 From: Sheng Yang Date: Wed, 21 Jan 2009 16:52:16 +0800 Subject: KVM: MMU: Map device MMIO as UC in EPT Software are not allow to access device MMIO using cacheable memory type, the patch limit MMIO region with UC and WC(guest can select WC using PAT and PCD/PWT). Signed-off-by: Sheng Yang Signed-off-by: Avi Kivity --- arch/x86/kvm/mmu.c | 9 +++++++-- arch/x86/kvm/vmx.c | 3 +-- 2 files changed, 8 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 83f11c7..2d4477c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1698,8 +1698,13 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte, if (largepage) spte |= PT_PAGE_SIZE_MASK; if (mt_mask) { - mt_mask = get_memory_type(vcpu, gfn) << - kvm_x86_ops->get_mt_mask_shift(); + if (!kvm_is_mmio_pfn(pfn)) { + mt_mask = get_memory_type(vcpu, gfn) << + kvm_x86_ops->get_mt_mask_shift(); + mt_mask |= VMX_EPT_IGMT_BIT; + } else + mt_mask = MTRR_TYPE_UNCACHABLE << + kvm_x86_ops->get_mt_mask_shift(); spte |= mt_mask; } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6259d74..07491c9 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3687,8 +3687,7 @@ static int __init vmx_init(void) if (vm_need_ept()) { bypass_guest_pf = 0; kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK | - VMX_EPT_WRITABLE_MASK | - VMX_EPT_IGMT_BIT); + VMX_EPT_WRITABLE_MASK); kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull, VMX_EPT_EXECUTABLE_MASK, VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT); -- cgit v1.1 From b682b814e3cc340f905c14dff87ce8bdba7c5eba Mon Sep 17 00:00:00 2001 From: Marcelo Tosatti Date: Tue, 10 Feb 2009 20:41:41 -0200 Subject: KVM: x86: fix LAPIC pending count calculation Simplify LAPIC TMCCT calculation by using hrtimer provided function to query remaining time until expiration. Fixes host hang with nested ESX. Signed-off-by: Marcelo Tosatti Signed-off-by: Alexander Graf Signed-off-by: Avi Kivity --- arch/x86/kvm/irq.c | 7 ------ arch/x86/kvm/irq.h | 1 - arch/x86/kvm/lapic.c | 66 ++++++++++++---------------------------------------- arch/x86/kvm/lapic.h | 2 -- arch/x86/kvm/svm.c | 1 - arch/x86/kvm/vmx.c | 1 - 6 files changed, 15 insertions(+), 63 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index c019b8e..cf17ed5 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -87,13 +87,6 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs); -void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec) -{ - kvm_apic_timer_intr_post(vcpu, vec); - /* TODO: PIT, RTC etc. */ -} -EXPORT_SYMBOL_GPL(kvm_timer_intr_post); - void __kvm_migrate_timers(struct kvm_vcpu *vcpu) { __kvm_migrate_apic_timer(vcpu); diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h index 2bf32a0..82579ee 100644 --- a/arch/x86/kvm/irq.h +++ b/arch/x86/kvm/irq.h @@ -89,7 +89,6 @@ static inline int irqchip_in_kernel(struct kvm *kvm) void kvm_pic_reset(struct kvm_kpic_state *s); -void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec); void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu); void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu); void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index afac68c..f0b67f2 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -35,6 +35,12 @@ #include "kvm_cache_regs.h" #include "irq.h" +#ifndef CONFIG_X86_64 +#define mod_64(x, y) ((x) - (y) * div64_u64(x, y)) +#else +#define mod_64(x, y) ((x) % (y)) +#endif + #define PRId64 "d" #define PRIx64 "llx" #define PRIu64 "u" @@ -511,52 +517,22 @@ static void apic_send_ipi(struct kvm_lapic *apic) static u32 apic_get_tmcct(struct kvm_lapic *apic) { - u64 counter_passed; - ktime_t passed, now; + ktime_t remaining; + s64 ns; u32 tmcct; ASSERT(apic != NULL); - now = apic->timer.dev.base->get_time(); - tmcct = apic_get_reg(apic, APIC_TMICT); - /* if initial count is 0, current count should also be 0 */ - if (tmcct == 0) + if (apic_get_reg(apic, APIC_TMICT) == 0) return 0; - if (unlikely(ktime_to_ns(now) <= - ktime_to_ns(apic->timer.last_update))) { - /* Wrap around */ - passed = ktime_add(( { - (ktime_t) { - .tv64 = KTIME_MAX - - (apic->timer.last_update).tv64}; } - ), now); - apic_debug("time elapsed\n"); - } else - passed = ktime_sub(now, apic->timer.last_update); - - counter_passed = div64_u64(ktime_to_ns(passed), - (APIC_BUS_CYCLE_NS * apic->timer.divide_count)); - - if (counter_passed > tmcct) { - if (unlikely(!apic_lvtt_period(apic))) { - /* one-shot timers stick at 0 until reset */ - tmcct = 0; - } else { - /* - * periodic timers reset to APIC_TMICT when they - * hit 0. The while loop simulates this happening N - * times. (counter_passed %= tmcct) would also work, - * but might be slower or not work on 32-bit?? - */ - while (counter_passed > tmcct) - counter_passed -= tmcct; - tmcct -= counter_passed; - } - } else { - tmcct -= counter_passed; - } + remaining = hrtimer_expires_remaining(&apic->timer.dev); + if (ktime_to_ns(remaining) < 0) + remaining = ktime_set(0, 0); + + ns = mod_64(ktime_to_ns(remaining), apic->timer.period); + tmcct = div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->timer.divide_count)); return tmcct; } @@ -653,8 +629,6 @@ static void start_apic_timer(struct kvm_lapic *apic) { ktime_t now = apic->timer.dev.base->get_time(); - apic->timer.last_update = now; - apic->timer.period = apic_get_reg(apic, APIC_TMICT) * APIC_BUS_CYCLE_NS * apic->timer.divide_count; atomic_set(&apic->timer.pending, 0); @@ -1110,16 +1084,6 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu) } } -void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec) -{ - struct kvm_lapic *apic = vcpu->arch.apic; - - if (apic && apic_lvt_vector(apic, APIC_LVTT) == vec) - apic->timer.last_update = ktime_add_ns( - apic->timer.last_update, - apic->timer.period); -} - int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu) { int vector = kvm_apic_has_interrupt(vcpu); diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 8185888..45ab6ee 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -12,7 +12,6 @@ struct kvm_lapic { atomic_t pending; s64 period; /* unit: ns */ u32 divide_count; - ktime_t last_update; struct hrtimer dev; } timer; struct kvm_vcpu *vcpu; @@ -42,7 +41,6 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data); void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu); int kvm_lapic_enabled(struct kvm_vcpu *vcpu); int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu); -void kvm_apic_timer_intr_post(struct kvm_vcpu *vcpu, int vec); void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr); void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 1452851..a9e769e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1600,7 +1600,6 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu) /* Okay, we can deliver the interrupt: grab it and update PIC state. */ intr_vector = kvm_cpu_get_interrupt(vcpu); svm_inject_irq(svm, intr_vector); - kvm_timer_intr_post(vcpu, intr_vector); out: update_cr8_intercept(vcpu); } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 07491c9..b1fe142 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3285,7 +3285,6 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu) } if (vcpu->arch.interrupt.pending) { vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr); - kvm_timer_intr_post(vcpu, vcpu->arch.interrupt.nr); if (kvm_cpu_has_interrupt(vcpu)) enable_irq_window(vcpu); } -- cgit v1.1 From 516a1a7e9dc80358030fe01aabb3bedf882db9e2 Mon Sep 17 00:00:00 2001 From: Avi Kivity Date: Sun, 15 Feb 2009 02:32:07 +0200 Subject: KVM: VMX: Flush volatile msrs before emulating rdmsr Some msrs (notable MSR_KERNEL_GS_BASE) are held in the processor registers and need to be flushed to the vcpu struture before they can be read. This fixes cygwin longjmp() failure on Windows x64. Signed-off-by: Avi Kivity --- arch/x86/kvm/vmx.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b1fe142..7611af5 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -903,6 +903,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) data = vmcs_readl(GUEST_SYSENTER_ESP); break; default: + vmx_load_host_state(to_vmx(vcpu)); msr = find_msr_entry(to_vmx(vcpu), msr_index); if (msr) { data = msr->data; -- cgit v1.1 From 6bc5c366b1a45ca18fba6851f62db5743b3f6db5 Mon Sep 17 00:00:00 2001 From: Pekka Paalanen Date: Sat, 3 Jan 2009 21:23:51 +0200 Subject: trace: mmiotrace to the tracer menu in Kconfig Impact: cosmetic change in Kconfig menu layout This patch was originally suggested by Peter Zijlstra, but seems it was forgotten. CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable directly under the Kernel hacking / debugging menu in the kernel configuration system. They were present only for x86 and x86_64. Other tracers that use the ftrace tracing framework are in their own sub-menu. This patch moves the mmiotrace configuration options there. Since the Kconfig file, where the tracer menu is, is not architecture specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by x86/x86_64. CONFIG_MMIOTRACE now depends on it. Signed-off-by: Pekka Paalanen Signed-off-by: Steven Rostedt Signed-off-by: Ingo Molnar --- arch/x86/Kconfig.debug | 24 ++---------------------- 1 file changed, 2 insertions(+), 22 deletions(-) (limited to 'arch') diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 10d6cc3..e1983fa 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -174,28 +174,8 @@ config IOMMU_LEAK Add a simple leak tracer to the IOMMU code. This is useful when you are debugging a buggy device driver that leaks IOMMU mappings. -config MMIOTRACE - bool "Memory mapped IO tracing" - depends on DEBUG_KERNEL && PCI - select TRACING - help - Mmiotrace traces Memory Mapped I/O access and is meant for - debugging and reverse engineering. It is called from the ioremap - implementation and works via page faults. Tracing is disabled by - default and can be enabled at run-time. - - See Documentation/tracers/mmiotrace.txt. - If you are not helping to develop drivers, say N. - -config MMIOTRACE_TEST - tristate "Test module for mmiotrace" - depends on MMIOTRACE && m - help - This is a dumb module for testing mmiotrace. It is very dangerous - as it will write garbage to IO memory starting at a given address. - However, it should be safe to use on e.g. unused portion of VRAM. - - Say N, unless you absolutely know what you are doing. +config HAVE_MMIOTRACE_SUPPORT + def_bool y # # IO delay types: -- cgit v1.1 From a0abd520fd69295f4a3735e29a9448a32e101d47 Mon Sep 17 00:00:00 2001 From: Rusty Russell Date: Mon, 16 Feb 2009 17:31:58 -0600 Subject: cpumask: fix powernow-k8: partial revert of 2fdf66b491ac706657946442789ec644cc317e1a Impact: fix powernow-k8 when acpi=off (or other error). There was a spurious change introduced into powernow-k8 in this patch: so that we try to "restore" the cpus_allowed we never saved. We revert that file. See lkml "[PATCH] x86/powernow: fix cpus_allowed brokage when acpi=off" from Yinghai for the bug report. Cc: Mike Travis Cc: Yinghai Lu Signed-off-by: Rusty Russell Acked-by: Ingo Molnar --- arch/x86/kernel/cpu/cpufreq/powernow-k8.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c index fb039cd..6428aa1 100644 --- a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c +++ b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c @@ -1157,8 +1157,7 @@ static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol) data->cpu = pol->cpu; data->currpstate = HW_PSTATE_INVALID; - rc = powernow_k8_cpu_init_acpi(data); - if (rc) { + if (powernow_k8_cpu_init_acpi(data)) { /* * Use the PSB BIOS structure. This is only availabe on * an UP version, and is deprecated by AMD. @@ -1176,17 +1175,20 @@ static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol) "ACPI maintainers and complain to your BIOS " "vendor.\n"); #endif - goto err_out; + kfree(data); + return -ENODEV; } if (pol->cpu != 0) { printk(KERN_ERR FW_BUG PFX "No ACPI _PSS objects for " "CPU other than CPU0. Complain to your BIOS " "vendor.\n"); - goto err_out; + kfree(data); + return -ENODEV; } rc = find_psb_table(data); if (rc) { - goto err_out; + kfree(data); + return -ENODEV; } /* Take a crude guess here. * That guess was in microseconds, so multiply with 1000 */ -- cgit v1.1 From 1371be0f7c8f6141b2dbfde6a7ae7885bedb9834 Mon Sep 17 00:00:00 2001 From: Rusty Russell Date: Mon, 16 Feb 2009 17:31:59 -0600 Subject: cpumask: Use cpu_*_mask accessors code: alpha Impact: use new API, fix SMP bug. Use the new accessors rather than frobbing bits directly. This also removes the bug introduced in ee0c468b (alpha: compile fixes) which had Alpha setting bits on an on-stack cpumask, not the cpu_online_map. Cc: Richard Henderson Cc: FUJITA Tomonori Signed-off-by: Rusty Russell Signed-off-by: Mike Travis Acked-by: Ivan Kokshaysky Acked-by: Ingo Molnar --- arch/alpha/kernel/process.c | 8 ++++---- arch/alpha/kernel/smp.c | 12 ++++++------ 2 files changed, 10 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c index f238370..8d0097f 100644 --- a/arch/alpha/kernel/process.c +++ b/arch/alpha/kernel/process.c @@ -93,8 +93,8 @@ common_shutdown_1(void *generic_ptr) if (cpuid != boot_cpuid) { flags |= 0x00040000UL; /* "remain halted" */ *pflags = flags; - cpu_clear(cpuid, cpu_present_map); - cpu_clear(cpuid, cpu_possible_map); + set_cpu_present(cpuid, false); + set_cpu_possible(cpuid, false); halt(); } #endif @@ -120,8 +120,8 @@ common_shutdown_1(void *generic_ptr) #ifdef CONFIG_SMP /* Wait for the secondaries to halt. */ - cpu_clear(boot_cpuid, cpu_present_map); - cpu_clear(boot_cpuid, cpu_possible_map); + set_cpu_present(boot_cpuid, false); + set_cpu_possible(boot_cpuid, false); while (cpus_weight(cpu_present_map)) barrier(); #endif diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c index 00f1dc3..b1fe567 100644 --- a/arch/alpha/kernel/smp.c +++ b/arch/alpha/kernel/smp.c @@ -120,12 +120,12 @@ void __cpuinit smp_callin(void) { int cpuid = hard_smp_processor_id(); - cpumask_t mask = cpu_online_map; - if (cpu_test_and_set(cpuid, mask)) { + if (cpu_online(cpuid)) { printk("??, cpu 0x%x already present??\n", cpuid); BUG(); } + set_cpu_online(cpuid, true); /* Turn on machine checks. */ wrmces(7); @@ -436,8 +436,8 @@ setup_smp(void) ((char *)cpubase + i*hwrpb->processor_size); if ((cpu->flags & 0x1cc) == 0x1cc) { smp_num_probed++; - cpu_set(i, cpu_possible_map); - cpu_set(i, cpu_present_map); + set_cpu_possible(i, true); + set_cpu_present(i, true); cpu->pal_revision = boot_cpu_palrev; } @@ -470,8 +470,8 @@ smp_prepare_cpus(unsigned int max_cpus) /* Nothing to do on a UP box, or when told not to. */ if (smp_num_probed == 1 || max_cpus == 0) { - cpu_possible_map = cpumask_of_cpu(boot_cpuid); - cpu_present_map = cpumask_of_cpu(boot_cpuid); + init_cpu_possible(cpumask_of(boot_cpuid)); + init_cpu_present(cpumask_of(boot_cpuid)); printk(KERN_INFO "SMP mode deactivated.\n"); return; } -- cgit v1.1 From 744f6592727a7ab9e3ca4266bedaa786825a31bb Mon Sep 17 00:00:00 2001 From: Gregory CLEMENT Date: Mon, 16 Feb 2009 21:21:47 +0100 Subject: [ARM] 5400/1: Add support for inverted rdy_busy pin for Atmel nand device controller Add support for inverted rdy_busy pin for Atmel nand device controller It will fix building error on NeoCore926 board. Acked-by: Andrew Victor Acked-by: David Woodhouse Signed-off-by: Gregory CLEMENT Signed-off-by: Russell King --- arch/arm/mach-at91/include/mach/board.h | 1 + arch/avr32/mach-at32ap/include/mach/board.h | 1 + 2 files changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/arm/mach-at91/include/mach/board.h b/arch/arm/mach-at91/include/mach/board.h index fb51f0e..0b3ae21 100644 --- a/arch/arm/mach-at91/include/mach/board.h +++ b/arch/arm/mach-at91/include/mach/board.h @@ -93,6 +93,7 @@ struct atmel_nand_data { u8 enable_pin; /* chip enable */ u8 det_pin; /* card detect */ u8 rdy_pin; /* ready/busy */ + u8 rdy_pin_active_low; /* rdy_pin value is inverted */ u8 ale; /* address line number connected to ALE */ u8 cle; /* address line number connected to CLE */ u8 bus_width_16; /* buswidth is 16 bit */ diff --git a/arch/avr32/mach-at32ap/include/mach/board.h b/arch/avr32/mach-at32ap/include/mach/board.h index aafaf7a..cff8e84f 100644 --- a/arch/avr32/mach-at32ap/include/mach/board.h +++ b/arch/avr32/mach-at32ap/include/mach/board.h @@ -116,6 +116,7 @@ struct atmel_nand_data { int enable_pin; /* chip enable */ int det_pin; /* card detect */ int rdy_pin; /* ready/busy */ + u8 rdy_pin_active_low; /* rdy_pin value is inverted */ u8 ale; /* address line number connected to ALE */ u8 cle; /* address line number connected to CLE */ u8 bus_width_16; /* buswidth is 16 bit */ -- cgit v1.1 From bf51935f3e988e0ed6f34b55593e5912f990750a Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 17 Feb 2009 06:01:30 -0800 Subject: x86, rcu: fix strange load average and ksoftirqd behavior Damien Wyart reported high ksoftirqd CPU usage (20%) on an otherwise idle system. The function-graph trace Damien provided: > 799.521187 | 1) -0 | | rcu_check_callbacks() { > 799.521371 | 1) -0 | | rcu_check_callbacks() { > 799.521555 | 1) -0 | | rcu_check_callbacks() { > 799.521738 | 1) -0 | | rcu_check_callbacks() { > 799.521934 | 1) -0 | | rcu_check_callbacks() { > 799.522068 | 1) ksoftir-2324 | | rcu_check_callbacks() { > 799.522208 | 1) -0 | | rcu_check_callbacks() { > 799.522392 | 1) -0 | | rcu_check_callbacks() { > 799.522575 | 1) -0 | | rcu_check_callbacks() { > 799.522759 | 1) -0 | | rcu_check_callbacks() { > 799.522956 | 1) -0 | | rcu_check_callbacks() { > 799.523074 | 1) ksoftir-2324 | | rcu_check_callbacks() { > 799.523214 | 1) -0 | | rcu_check_callbacks() { > 799.523397 | 1) -0 | | rcu_check_callbacks() { > 799.523579 | 1) -0 | | rcu_check_callbacks() { > 799.523762 | 1) -0 | | rcu_check_callbacks() { > 799.523960 | 1) -0 | | rcu_check_callbacks() { > 799.524079 | 1) ksoftir-2324 | | rcu_check_callbacks() { > 799.524220 | 1) -0 | | rcu_check_callbacks() { > 799.524403 | 1) -0 | | rcu_check_callbacks() { > 799.524587 | 1) -0 | | rcu_check_callbacks() { > 799.524770 | 1) -0 | | rcu_check_callbacks() { > [ . . . ] Shows rcu_check_callbacks() being invoked way too often. It should be called once per jiffy, and here it is called no less than 22 times in about 3.5 milliseconds, meaning one call every 160 microseconds or so. Why do we need to call rcu_pending() and rcu_check_callbacks() from the idle loop of 32-bit x86, especially given that no other architecture does this? The following patch removes the call to rcu_pending() and rcu_check_callbacks() from the x86 32-bit idle loop in order to reduce the softirq load on idle systems. Reported-by: Damien Wyart Signed-off-by: Paul E. McKenney Signed-off-by: Ingo Molnar --- arch/x86/kernel/process_32.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index a546f55..bd4da2a 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -104,9 +104,6 @@ void cpu_idle(void) check_pgt_cache(); rmb(); - if (rcu_pending(cpu)) - rcu_check_callbacks(cpu, 0); - if (cpu_is_offline(cpu)) play_dead(); -- cgit v1.1 From fd4b9b3650076ffadbdd6e360eb198f5d61747c0 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Tue, 17 Feb 2009 20:45:50 +0100 Subject: [ARM] 5401/1: Orion: fix edge triggered GPIO interrupt support The GPIO interrupts can be configured as either level triggered or edge triggered, with a default of level triggered. When an edge triggered interrupt is requested, the gpio_irq_set_type method is called which currently switches the given IRQ descriptor between two struct irq_chip instances: orion_gpio_irq_level_chip and orion_gpio_irq_edge_chip. This happens via __setup_irq() which also calls irq_chip_set_defaults() to assign default methods to uninitialized ones. The problem is that irq_chip_set_defaults() is called before the irq_chip reference is switched, leaving the new irq_chip (orion_gpio_irq_edge_chip in this case) with uninitialized methods such as chip->startup() causing a kernel oops. Many solutions are possible, such as making irq_chip_set_defaults() global and calling it from gpio_irq_set_type(), or calling __irq_set_trigger() before irq_chip_set_defaults() in __setup_irq(). But those require modifications to the generic IRQ code which might have adverse effect on other architectures, and that would still be a fragile arrangement. Manually copying the missing methods from within gpio_irq_set_type() would be really ugly and it would break again the day new methods with automatic defaults are added. A better solution is to have a single irq_chip instance which can deal with both edge and level triggered interrupts. It is also a good idea to switch the IRQ handler instead, as the edge IRQ handler allows for one edge IRQ event to be queued as the IRQ is actually masked only when that second IRQ is received, at which point the hardware can queue an additional IRQ event, making edge triggered interrupts a bit more reliable. Tested-by: Martin Michlmayr Signed-off-by: Nicolas Pitre Signed-off-by: Russell King --- arch/arm/mach-kirkwood/irq.c | 2 +- arch/arm/mach-mv78xx0/irq.c | 2 +- arch/arm/mach-orion5x/irq.c | 2 +- arch/arm/plat-orion/gpio.c | 73 +++++++++++---------------------- arch/arm/plat-orion/include/plat/gpio.h | 3 +- 5 files changed, 29 insertions(+), 53 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-kirkwood/irq.c b/arch/arm/mach-kirkwood/irq.c index efb86b7..06083b2 100644 --- a/arch/arm/mach-kirkwood/irq.c +++ b/arch/arm/mach-kirkwood/irq.c @@ -42,7 +42,7 @@ void __init kirkwood_init_irq(void) writel(0, GPIO_EDGE_CAUSE(32)); for (i = IRQ_KIRKWOOD_GPIO_START; i < NR_IRQS; i++) { - set_irq_chip(i, &orion_gpio_irq_level_chip); + set_irq_chip(i, &orion_gpio_irq_chip); set_irq_handler(i, handle_level_irq); irq_desc[i].status |= IRQ_LEVEL; set_irq_flags(i, IRQF_VALID); diff --git a/arch/arm/mach-mv78xx0/irq.c b/arch/arm/mach-mv78xx0/irq.c index e273418..30b7e4b 100644 --- a/arch/arm/mach-mv78xx0/irq.c +++ b/arch/arm/mach-mv78xx0/irq.c @@ -40,7 +40,7 @@ void __init mv78xx0_init_irq(void) writel(0, GPIO_EDGE_CAUSE(0)); for (i = IRQ_MV78XX0_GPIO_START; i < NR_IRQS; i++) { - set_irq_chip(i, &orion_gpio_irq_level_chip); + set_irq_chip(i, &orion_gpio_irq_chip); set_irq_handler(i, handle_level_irq); irq_desc[i].status |= IRQ_LEVEL; set_irq_flags(i, IRQF_VALID); diff --git a/arch/arm/mach-orion5x/irq.c b/arch/arm/mach-orion5x/irq.c index 0caae43..e03f7b4 100644 --- a/arch/arm/mach-orion5x/irq.c +++ b/arch/arm/mach-orion5x/irq.c @@ -44,7 +44,7 @@ void __init orion5x_init_irq(void) * User can use set_type() if he wants to use edge types handlers. */ for (i = IRQ_ORION5X_GPIO_START; i < NR_IRQS; i++) { - set_irq_chip(i, &orion_gpio_irq_level_chip); + set_irq_chip(i, &orion_gpio_irq_chip); set_irq_handler(i, handle_level_irq); irq_desc[i].status |= IRQ_LEVEL; set_irq_flags(i, IRQF_VALID); diff --git a/arch/arm/plat-orion/gpio.c b/arch/arm/plat-orion/gpio.c index 9671864..0d12c21 100644 --- a/arch/arm/plat-orion/gpio.c +++ b/arch/arm/plat-orion/gpio.c @@ -265,51 +265,36 @@ EXPORT_SYMBOL(orion_gpio_set_blink); * polarity LEVEL mask * ****************************************************************************/ -static void gpio_irq_edge_ack(u32 irq) -{ - int pin = irq_to_gpio(irq); - - writel(~(1 << (pin & 31)), GPIO_EDGE_CAUSE(pin)); -} - -static void gpio_irq_edge_mask(u32 irq) -{ - int pin = irq_to_gpio(irq); - u32 u; - - u = readl(GPIO_EDGE_MASK(pin)); - u &= ~(1 << (pin & 31)); - writel(u, GPIO_EDGE_MASK(pin)); -} -static void gpio_irq_edge_unmask(u32 irq) +static void gpio_irq_ack(u32 irq) { - int pin = irq_to_gpio(irq); - u32 u; - - u = readl(GPIO_EDGE_MASK(pin)); - u |= 1 << (pin & 31); - writel(u, GPIO_EDGE_MASK(pin)); + int type = irq_desc[irq].status & IRQ_TYPE_SENSE_MASK; + if (type & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_EDGE_FALLING)) { + int pin = irq_to_gpio(irq); + writel(~(1 << (pin & 31)), GPIO_EDGE_CAUSE(pin)); + } } -static void gpio_irq_level_mask(u32 irq) +static void gpio_irq_mask(u32 irq) { int pin = irq_to_gpio(irq); - u32 u; - - u = readl(GPIO_LEVEL_MASK(pin)); + int type = irq_desc[irq].status & IRQ_TYPE_SENSE_MASK; + u32 reg = (type & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_EDGE_FALLING)) ? + GPIO_EDGE_MASK(pin) : GPIO_LEVEL_MASK(pin); + u32 u = readl(reg); u &= ~(1 << (pin & 31)); - writel(u, GPIO_LEVEL_MASK(pin)); + writel(u, reg); } -static void gpio_irq_level_unmask(u32 irq) +static void gpio_irq_unmask(u32 irq) { int pin = irq_to_gpio(irq); - u32 u; - - u = readl(GPIO_LEVEL_MASK(pin)); + int type = irq_desc[irq].status & IRQ_TYPE_SENSE_MASK; + u32 reg = (type & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_EDGE_FALLING)) ? + GPIO_EDGE_MASK(pin) : GPIO_LEVEL_MASK(pin); + u32 u = readl(reg); u |= 1 << (pin & 31); - writel(u, GPIO_LEVEL_MASK(pin)); + writel(u, reg); } static int gpio_irq_set_type(u32 irq, u32 type) @@ -331,9 +316,9 @@ static int gpio_irq_set_type(u32 irq, u32 type) * Set edge/level type. */ if (type & (IRQ_TYPE_EDGE_RISING | IRQ_TYPE_EDGE_FALLING)) { - desc->chip = &orion_gpio_irq_edge_chip; + desc->handle_irq = handle_edge_irq; } else if (type & (IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW)) { - desc->chip = &orion_gpio_irq_level_chip; + desc->handle_irq = handle_level_irq; } else { printk(KERN_ERR "failed to set irq=%d (type=%d)\n", irq, type); return -EINVAL; @@ -371,19 +356,11 @@ static int gpio_irq_set_type(u32 irq, u32 type) return 0; } -struct irq_chip orion_gpio_irq_edge_chip = { - .name = "orion_gpio_irq_edge", - .ack = gpio_irq_edge_ack, - .mask = gpio_irq_edge_mask, - .unmask = gpio_irq_edge_unmask, - .set_type = gpio_irq_set_type, -}; - -struct irq_chip orion_gpio_irq_level_chip = { - .name = "orion_gpio_irq_level", - .mask = gpio_irq_level_mask, - .mask_ack = gpio_irq_level_mask, - .unmask = gpio_irq_level_unmask, +struct irq_chip orion_gpio_irq_chip = { + .name = "orion_gpio", + .ack = gpio_irq_ack, + .mask = gpio_irq_mask, + .unmask = gpio_irq_unmask, .set_type = gpio_irq_set_type, }; diff --git a/arch/arm/plat-orion/include/plat/gpio.h b/arch/arm/plat-orion/include/plat/gpio.h index 54deaf2..ec743e8 100644 --- a/arch/arm/plat-orion/include/plat/gpio.h +++ b/arch/arm/plat-orion/include/plat/gpio.h @@ -31,8 +31,7 @@ void orion_gpio_set_blink(unsigned pin, int blink); /* * GPIO interrupt handling. */ -extern struct irq_chip orion_gpio_irq_edge_chip; -extern struct irq_chip orion_gpio_irq_level_chip; +extern struct irq_chip orion_gpio_irq_chip; void orion_gpio_irq_handler(int irqoff); -- cgit v1.1 From 6ec68bff3c81e776a455f6aca95c8c5f1d630198 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Thu, 12 Feb 2009 13:39:26 +0100 Subject: x86, mce: reinitialize per cpu features on resume Impact: Bug fix This fixes a long standing bug in the machine check code. On resume the boot CPU wouldn't get its vendor specific state like thermal handling reinitialized. This means the boot cpu wouldn't ever get any thermal events reported again. Call the respective initialization functions on resume v2: Remove ancient init because they don't have a resume device anyways. Pointed out by Thomas Gleixner. v3: Now fix the Subject too to reflect v2 change Signed-off-by: Andi Kleen Acked-by: Thomas Gleixner Signed-off-by: H. Peter Anvin --- arch/x86/kernel/cpu/mcheck/mce_64.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c index 1c83803..1f184ef 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_64.c +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c @@ -734,6 +734,7 @@ __setup("mce=", mcheck_enable); static int mce_resume(struct sys_device *dev) { mce_init(NULL); + mce_cpu_features(¤t_cpu_data); return 0; } -- cgit v1.1 From 380851bc6b1b4107c61dfa2997f9095dcf779336 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Thu, 12 Feb 2009 13:39:33 +0100 Subject: x86, mce: use force_sig_info to kill process in machine check Impact: bug fix (with tolerant == 3) do_exit cannot be called directly from the exception handler because it can sleep and the exception handler runs on the exception stack. Use force_sig() instead. Based on a earlier patch by Ying Huang who debugged the problem. Signed-off-by: Andi Kleen Acked-by: Thomas Gleixner Signed-off-by: H. Peter Anvin --- arch/x86/kernel/cpu/mcheck/mce_64.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c index 1f184ef..25cf624 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_64.c +++ b/arch/x86/kernel/cpu/mcheck/mce_64.c @@ -295,11 +295,11 @@ void do_machine_check(struct pt_regs * regs, long error_code) * If we know that the error was in user space, send a * SIGBUS. Otherwise, panic if tolerance is low. * - * do_exit() takes an awful lot of locks and has a slight + * force_sig() takes an awful lot of locks and has a slight * risk of deadlocking. */ if (user_space) { - do_exit(SIGBUS); + force_sig(SIGBUS, current); } else if (panic_on_oops || tolerant < 2) { mce_panic("Uncorrected machine check", &panicm, mcestart); -- cgit v1.1 From 07db1c140eb233971341396e492cc73d4280e698 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Thu, 12 Feb 2009 13:39:35 +0100 Subject: x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown Impact: Bugfix The ifdef for the apic clear on shutdown for the 64bit intel thermal vector was incorrect and never triggered. Fix that. Signed-off-by: Andi Kleen Acked-by: Thomas Gleixner Signed-off-by: H. Peter Anvin --- arch/x86/kernel/apic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kernel/apic.c b/arch/x86/kernel/apic.c index 115449f..570f36e 100644 --- a/arch/x86/kernel/apic.c +++ b/arch/x86/kernel/apic.c @@ -862,7 +862,7 @@ void clear_local_APIC(void) } /* lets not touch this if we didn't frob it */ -#if defined(CONFIG_X86_MCE_P4THERMAL) || defined(X86_MCE_INTEL) +#if defined(CONFIG_X86_MCE_P4THERMAL) || defined(CONFIG_X86_MCE_INTEL) if (maxlvt >= 5) { v = apic_read(APIC_LVTTHMR); apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED); -- cgit v1.1 From f2dbcfa738368c8a40d4a5f0b65dc9879577cb21 Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki Date: Wed, 18 Feb 2009 14:48:32 -0800 Subject: mm: clean up for early_pfn_to_nid() What's happening is that the assertion in mm/page_alloc.c:move_freepages() is triggering: BUG_ON(page_zone(start_page) != page_zone(end_page)); Once I knew this is what was happening, I added some annotations: if (unlikely(page_zone(start_page) != page_zone(end_page))) { printk(KERN_ERR "move_freepages: Bogus zones: " "start_page[%p] end_page[%p] zone[%p]\n", start_page, end_page, zone); printk(KERN_ERR "move_freepages: " "start_zone[%p] end_zone[%p]\n", page_zone(start_page), page_zone(end_page)); printk(KERN_ERR "move_freepages: " "start_pfn[0x%lx] end_pfn[0x%lx]\n", page_to_pfn(start_page), page_to_pfn(end_page)); printk(KERN_ERR "move_freepages: " "start_nid[%d] end_nid[%d]\n", page_to_nid(start_page), page_to_nid(end_page)); ... And here's what I got: move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00] move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00] move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff] move_freepages: start_nid[1] end_nid[0] My memory layout on this box is: [ 0.000000] Zone PFN ranges: [ 0.000000] Normal 0x00000000 -> 0x0081ff5d [ 0.000000] Movable zone start PFN for each node [ 0.000000] early_node_map[8] active PFN ranges [ 0.000000] 0: 0x00000000 -> 0x00020000 [ 0.000000] 1: 0x00800000 -> 0x0081f7ff [ 0.000000] 1: 0x0081f800 -> 0x0081fe50 [ 0.000000] 1: 0x0081fed1 -> 0x0081fed8 [ 0.000000] 1: 0x0081feda -> 0x0081fedb [ 0.000000] 1: 0x0081fedd -> 0x0081fee5 [ 0.000000] 1: 0x0081fee7 -> 0x0081ff51 [ 0.000000] 1: 0x0081ff59 -> 0x0081ff5d So it's a block move in that 0x81f600-->0x81f7ff region which triggers the problem. This patch: Declaration of early_pfn_to_nid() is scattered over per-arch include files, and it seems it's complicated to know when the declaration is used. I think it makes fix-for-memmap-init not easy. This patch moves all declaration to include/linux/mm.h After this, if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use static definition in include/linux/mm.h else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use generic definition in mm/page_alloc.c else -> per-arch back end function will be called. Signed-off-by: KAMEZAWA Hiroyuki Tested-by: KOSAKI Motohiro Reported-by: David Miller Cc: Mel Gorman Cc: Heiko Carstens Cc: [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ia64/include/asm/mmzone.h | 4 ---- arch/ia64/mm/numa.c | 2 +- arch/x86/include/asm/mmzone_32.h | 2 -- arch/x86/include/asm/mmzone_64.h | 2 -- arch/x86/mm/numa_64.c | 2 +- 5 files changed, 2 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/ia64/include/asm/mmzone.h b/arch/ia64/include/asm/mmzone.h index 34efe88..f2ca320 100644 --- a/arch/ia64/include/asm/mmzone.h +++ b/arch/ia64/include/asm/mmzone.h @@ -31,10 +31,6 @@ static inline int pfn_to_nid(unsigned long pfn) #endif } -#ifdef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -extern int early_pfn_to_nid(unsigned long pfn); -#endif - #ifdef CONFIG_IA64_DIG /* DIG systems are small */ # define MAX_PHYSNODE_ID 8 # define NR_NODE_MEMBLKS (MAX_NUMNODES * 8) diff --git a/arch/ia64/mm/numa.c b/arch/ia64/mm/numa.c index b73bf18..5061c3f 100644 --- a/arch/ia64/mm/numa.c +++ b/arch/ia64/mm/numa.c @@ -58,7 +58,7 @@ paddr_to_nid(unsigned long paddr) * SPARSEMEM to allocate the SPARSEMEM sectionmap on the NUMA node where * the section resides. */ -int early_pfn_to_nid(unsigned long pfn) +int __meminit __early_pfn_to_nid(unsigned long pfn) { int i, section = pfn >> PFN_SECTION_SHIFT, ssec, esec; diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h index 07f1af4..105fb90 100644 --- a/arch/x86/include/asm/mmzone_32.h +++ b/arch/x86/include/asm/mmzone_32.h @@ -32,8 +32,6 @@ static inline void get_memcfg_numa(void) get_memcfg_numa_flat(); } -extern int early_pfn_to_nid(unsigned long pfn); - extern void resume_map_numa_kva(pgd_t *pgd); #else /* !CONFIG_NUMA */ diff --git a/arch/x86/include/asm/mmzone_64.h b/arch/x86/include/asm/mmzone_64.h index a5b3817..a29f48c 100644 --- a/arch/x86/include/asm/mmzone_64.h +++ b/arch/x86/include/asm/mmzone_64.h @@ -40,8 +40,6 @@ static inline __attribute__((pure)) int phys_to_nid(unsigned long addr) #define node_end_pfn(nid) (NODE_DATA(nid)->node_start_pfn + \ NODE_DATA(nid)->node_spanned_pages) -extern int early_pfn_to_nid(unsigned long pfn); - #ifdef CONFIG_NUMA_EMU #define FAKE_NODE_MIN_SIZE (64 * 1024 * 1024) #define FAKE_NODE_MIN_HASH_MASK (~(FAKE_NODE_MIN_SIZE - 1UL)) diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c index 71a14f8..f3516da 100644 --- a/arch/x86/mm/numa_64.c +++ b/arch/x86/mm/numa_64.c @@ -145,7 +145,7 @@ int __init compute_hash_shift(struct bootnode *nodes, int numnodes, return shift; } -int early_pfn_to_nid(unsigned long pfn) +int __meminit __early_pfn_to_nid(unsigned long pfn) { return phys_to_nid(pfn << PAGE_SHIFT); } -- cgit v1.1 From cc2559bccc72767cb446f79b071d96c30c26439b Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki Date: Wed, 18 Feb 2009 14:48:33 -0800 Subject: mm: fix memmap init for handling memory hole Now, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole. and memmap initialization was not done. This was a trouble for sparc boot. To fix this, the PFN should be initialized and marked as PG_reserved. This patch changes early_pfn_in_nid() return true if PFN is a hole. Signed-off-by: KAMEZAWA Hiroyuki Reported-by: David Miller Tested-by: KOSAKI Motohiro Cc: Mel Gorman Cc: Heiko Carstens Cc: [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ia64/mm/numa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/ia64/mm/numa.c b/arch/ia64/mm/numa.c index 5061c3f..3efea7d 100644 --- a/arch/ia64/mm/numa.c +++ b/arch/ia64/mm/numa.c @@ -70,7 +70,7 @@ int __meminit __early_pfn_to_nid(unsigned long pfn) return node_memblk[i].nid; } - return 0; + return -1; } #ifdef CONFIG_MEMORY_HOTPLUG -- cgit v1.1 From 3fd9825c42c784a59b3b90bdf073f49d4bb42a8d Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 18 Feb 2009 22:29:22 +0100 Subject: [ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo() In the non highmem case, if two memory banks of 1GB each are provided, the second bank would evade suppression since its virtual base would be 0. Fix this by disallowing any memory bank which virtual base address is found to be lower than PAGE_OFFSET. Reported-by: Lennert Buytenhek Signed-off-by: Nicolas Pitre Signed-off-by: Russell King --- arch/arm/mm/mmu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index 9b36c5c..d4d082c 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -693,7 +693,8 @@ static void __init sanity_check_meminfo(void) * Check whether this memory bank would entirely overlap * the vmalloc area. */ - if (__va(bank->start) >= VMALLOC_MIN) { + if (__va(bank->start) >= VMALLOC_MIN || + __va(bank->start) < PAGE_OFFSET) { printk(KERN_NOTICE "Ignoring RAM at %.8lx-%.8lx " "(vmalloc region overlap).\n", bank->start, bank->start + bank->size - 1); -- cgit v1.1 From 41f3103fcfffff096c34f5267d7c9a26b44d89d3 Mon Sep 17 00:00:00 2001 From: Russell King Date: Thu, 19 Feb 2009 13:25:16 +0000 Subject: [ARM] omap: fix clock reparenting in omap2_clk_set_parent() When changing the parent of a clock, it is necessary to keep the clock use counts balanced otherwise things the parent state will get corrupted. Since we already disable and re-enable the clock, we might as well use the recursive versions instead. Signed-off-by: Russell King --- arch/arm/mach-omap2/clock.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-omap2/clock.c b/arch/arm/mach-omap2/clock.c index e64549e..ce4d46a 100644 --- a/arch/arm/mach-omap2/clock.c +++ b/arch/arm/mach-omap2/clock.c @@ -746,7 +746,7 @@ int omap2_clk_set_parent(struct clk *clk, struct clk *new_parent) return -EINVAL; if (clk->usecount > 0) - _omap2_clk_disable(clk); + omap2_clk_disable(clk); /* Set new source value (previous dividers if any in effect) */ reg_val = __raw_readl(src_addr) & ~field_mask; @@ -759,11 +759,11 @@ int omap2_clk_set_parent(struct clk *clk, struct clk *new_parent) wmb(); } - if (clk->usecount > 0) - _omap2_clk_enable(clk); - clk->parent = new_parent; + if (clk->usecount > 0) + omap2_clk_enable(clk); + /* CLKSEL clocks follow their parents' rates, divided by a divisor */ clk->rate = new_parent->rate; -- cgit v1.1 From d5cd0343d2878b66e25e044f644563c6bf708833 Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Thu, 19 Feb 2009 15:19:00 +0100 Subject: [S390] Fix timeval regression on s390 commit aa5e97ce4bbc9d5daeec16b1d15bb3f6b7b4f4d4 [PATCH] improve precision of process accounting. Introduced a timing regression: -bash-3.2# time ls real 0m0.006s user 0m1.754s sys 0m1.094s The problem was introduced by an error in cputime_to_timeval. Cputime is now 1/4096 microsecond, therefore, we have to divide the remainder with 4096 to get the microseconds. Signed-off-by: Christian Borntraeger Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/cputime.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h index 5217264..95b0f7d 100644 --- a/arch/s390/include/asm/cputime.h +++ b/arch/s390/include/asm/cputime.h @@ -145,7 +145,7 @@ cputime_to_timeval(const cputime_t cputime, struct timeval *value) value->tv_usec = rp.subreg.even / 4096; value->tv_sec = rp.subreg.odd; #else - value->tv_usec = cputime % 4096000000ULL; + value->tv_usec = (cputime % 4096000000ULL) / 4096; value->tv_sec = cputime / 4096000000ULL; #endif } -- cgit v1.1 From 23d75d9cadd79bc9fd6553857d57c679cf18d4cb Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Thu, 19 Feb 2009 15:19:01 +0100 Subject: [S390] fix "mem=" handling in case of standby memory Standby memory detected with the sclp interface gets always registered with add_memory calls without considering the limitationt that the "mem=" kernel paramater implies. So fix this and only register standby memory that is below the specified limit. This fixes zfcpdump since it uses "mem=32M". In case there is appr. 2GB standby memory present all of usable memory would be used for the struct pages needed for standby memory. Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky --- arch/s390/include/asm/setup.h | 2 ++ arch/s390/kernel/setup.c | 9 +++++++-- 2 files changed, 9 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/s390/include/asm/setup.h b/arch/s390/include/asm/setup.h index 2bd9fae..e8bd6ac 100644 --- a/arch/s390/include/asm/setup.h +++ b/arch/s390/include/asm/setup.h @@ -43,6 +43,8 @@ struct mem_chunk { extern struct mem_chunk memory_chunk[]; extern unsigned long real_memory_size; +extern int memory_end_set; +extern unsigned long memory_end; void detect_memory_layout(struct mem_chunk chunk[]); diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c index d825f49..c5cfb61 100644 --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -82,7 +82,9 @@ char elf_platform[ELF_PLATFORM_SIZE]; struct mem_chunk __initdata memory_chunk[MEMORY_CHUNKS]; volatile int __cpu_logical_map[NR_CPUS]; /* logical cpu to cpu address */ -static unsigned long __initdata memory_end; + +int __initdata memory_end_set; +unsigned long __initdata memory_end; /* * This is set up by the setup-routine at boot-time @@ -281,6 +283,7 @@ void (*pm_power_off)(void) = machine_power_off; static int __init early_parse_mem(char *p) { memory_end = memparse(p, &p); + memory_end_set = 1; return 0; } early_param("mem", early_parse_mem); @@ -508,8 +511,10 @@ static void __init setup_memory_end(void) int i; #if defined(CONFIG_ZFCPDUMP) || defined(CONFIG_ZFCPDUMP_MODULE) - if (ipl_info.type == IPL_TYPE_FCP_DUMP) + if (ipl_info.type == IPL_TYPE_FCP_DUMP) { memory_end = ZFCPDUMP_HSA_SIZE; + memory_end_set = 1; + } #endif memory_size = 0; memory_end &= PAGE_MASK; -- cgit v1.1 From 9da616fb9946c8d65387052e5a538b8f54ddb292 Mon Sep 17 00:00:00 2001 From: Makito SHIOKAWA Date: Thu, 19 Feb 2009 15:34:59 +0100 Subject: [ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC READ_IMPLIES_EXEC must be set when: o binary _is_ an executable stack (i.e. not EXSTACK_DISABLE_X) o processor architecture is _under_ ARMv6 (XN bit is supported from ARMv6) Signed-off-by: Makito SHIOKAWA Signed-off-by: Russell King --- arch/arm/kernel/elf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/kernel/elf.c b/arch/arm/kernel/elf.c index 8484909..d4a0da1 100644 --- a/arch/arm/kernel/elf.c +++ b/arch/arm/kernel/elf.c @@ -74,9 +74,9 @@ EXPORT_SYMBOL(elf_set_personality); */ int arm_elf_read_implies_exec(const struct elf32_hdr *x, int executable_stack) { - if (executable_stack != EXSTACK_ENABLE_X) + if (executable_stack != EXSTACK_DISABLE_X) return 1; - if (cpu_architecture() <= CPU_ARCH_ARMv6) + if (cpu_architecture() < CPU_ARCH_ARMv6) return 1; return 0; } -- cgit v1.1 From 9dd446f657ebebb209274878be5d01103fcfe988 Mon Sep 17 00:00:00 2001 From: Hartley Sweeten Date: Thu, 19 Feb 2009 17:09:04 +0100 Subject: [ARM] 5405/1: ep93xx: remove unused gesbc9312.h header Remove the gesbc9312.h header since it is unused. Signed-off-by: H Hartley Sweeten Signed-off-by: Russell King --- arch/arm/mach-ep93xx/include/mach/gesbc9312.h | 3 --- arch/arm/mach-ep93xx/include/mach/hardware.h | 1 - 2 files changed, 4 deletions(-) delete mode 100644 arch/arm/mach-ep93xx/include/mach/gesbc9312.h (limited to 'arch') diff --git a/arch/arm/mach-ep93xx/include/mach/gesbc9312.h b/arch/arm/mach-ep93xx/include/mach/gesbc9312.h deleted file mode 100644 index 21fe2b9..0000000 --- a/arch/arm/mach-ep93xx/include/mach/gesbc9312.h +++ /dev/null @@ -1,3 +0,0 @@ -/* - * arch/arm/mach-ep93xx/include/mach/gesbc9312.h - */ diff --git a/arch/arm/mach-ep93xx/include/mach/hardware.h b/arch/arm/mach-ep93xx/include/mach/hardware.h index 529807d..2866297 100644 --- a/arch/arm/mach-ep93xx/include/mach/hardware.h +++ b/arch/arm/mach-ep93xx/include/mach/hardware.h @@ -10,7 +10,6 @@ #include "platform.h" -#include "gesbc9312.h" #include "ts72xx.h" #endif -- cgit v1.1 From 48ffc70b675aa7798a52a2e92e20f6cce9140b3d Mon Sep 17 00:00:00 2001 From: Alok N Kataria Date: Wed, 18 Feb 2009 12:33:55 -0800 Subject: x86, vmi: TSC going backwards check in vmi clocksource Impact: fix time warps under vmware Similar to the check for TSC going backwards in the TSC clocksource, we also need this check for VMI clocksource. Signed-off-by: Alok N Kataria Cc: Zachary Amsden Signed-off-by: Ingo Molnar Cc: stable@kernel.org --- arch/x86/kernel/vmiclock_32.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kernel/vmiclock_32.c b/arch/x86/kernel/vmiclock_32.c index c4c1f9e..bde106c 100644 --- a/arch/x86/kernel/vmiclock_32.c +++ b/arch/x86/kernel/vmiclock_32.c @@ -283,10 +283,13 @@ void __devinit vmi_time_ap_init(void) #endif /** vmi clocksource */ +static struct clocksource clocksource_vmi; static cycle_t read_real_cycles(void) { - return vmi_timer_ops.get_cycle_counter(VMI_CYCLES_REAL); + cycle_t ret = (cycle_t)vmi_timer_ops.get_cycle_counter(VMI_CYCLES_REAL); + return ret >= clocksource_vmi.cycle_last ? + ret : clocksource_vmi.cycle_last; } static struct clocksource clocksource_vmi = { -- cgit v1.1 From 07a66d7c53a538e1a9759954a82bb6c07365eff9 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Fri, 20 Feb 2009 08:04:13 +0100 Subject: x86: use the right protections for split-up pagetables Steven Rostedt found a bug in where in his modified kernel ftrace was unable to modify the kernel text, due to the PMD itself having been marked read-only as well in split_large_page(). The fix, suggested by Linus, is to not try to 'clone' the reference protection of a huge-page, but to use the standard (and permissive) page protection bits of KERNPG_TABLE. The 'cloning' makes sense for the ptes but it's a confused and incorrect concept at the page table level - because the pagetable entry is a set of all ptes and hence cannot 'clone' any single protection attribute - the ptes can be any mixture of protections. With the permissive KERNPG_TABLE, even if the pte protections get changed after this point (due to ftrace doing code-patching or other similar activities like kprobes), the resulting combined protections will still be correct and the pte's restrictive (or permissive) protections will control it. Also update the comment. This bug was there for a long time but has not caused visible problems before as it needs a rather large read-only area to trigger. Steve possibly hacked his kernel with some really large arrays or so. Anyway, the bug is definitely worth fixing. [ Huang Ying also experienced problems in this area when writing the EFI code, but the real bug in split_large_page() was not realized back then. ] Reported-by: Steven Rostedt Reported-by: Huang Ying Acked-by: Linus Torvalds Signed-off-by: Ingo Molnar --- arch/x86/mm/pageattr.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 8ca0d85..7be47d1a9 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -508,18 +508,13 @@ static int split_large_page(pte_t *kpte, unsigned long address) #endif /* - * Install the new, split up pagetable. Important details here: + * Install the new, split up pagetable. * - * On Intel the NX bit of all levels must be cleared to make a - * page executable. See section 4.13.2 of Intel 64 and IA-32 - * Architectures Software Developer's Manual). - * - * Mark the entry present. The current mapping might be - * set to not present, which we preserved above. + * We use the standard kernel pagetable protections for the new + * pagetable protections, the actual ptes set above control the + * primary protection behavior: */ - ref_prot = pte_pgprot(pte_mkexec(pte_clrhuge(*kpte))); - pgprot_val(ref_prot) |= _PAGE_PRESENT; - __set_pmd_pte(kpte, address, mk_pte(base, ref_prot)); + __set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE))); base = NULL; out_unlock: -- cgit v1.1