summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds2009-01-269-111/+57
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimers: fix inconsistent lock state on resume in hres_timers_resume time-sched.c: tick_nohz_update_jiffies should be static locking, hpet: annotate false positive warning kernel/fork.c: unused variable 'ret' itimers: remove the per-cpu-ish-ness
| * hrtimers: fix inconsistent lock state on resume in hres_timers_resumePeter Zijlstra2009-01-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrey Borzenkov reported this lockdep assert: > [17854.688347] ================================= > [17854.688347] [ INFO: inconsistent lock state ] > [17854.688347] 2.6.29-rc2-1avb #1 > [17854.688347] --------------------------------- > [17854.688347] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. > [17854.688347] pm-suspend/18240 [HC0[0]:SC0[0]:HE1:SE1] takes: > [17854.688347] (&cpu_base->lock){++..}, at: [<c0136fcc>] retrigger_next_event+0x5c/0xa0 > [17854.688347] {in-hardirq-W} state was registered at: > [17854.688347] [<c01443cd>] __lock_acquire+0x79d/0x1930 > [17854.688347] [<c01455bc>] lock_acquire+0x5c/0x80 > [17854.688347] [<c03092e5>] _spin_lock+0x35/0x70 > [17854.688347] [<c0136e61>] hrtimer_run_queues+0x31/0x140 > [17854.688347] [<c0128d98>] run_local_timers+0x8/0x20 > [17854.688347] [<c0128dd3>] update_process_times+0x23/0x60 > [17854.688347] [<c013e274>] tick_periodic+0x24/0x80 > [17854.688347] [<c013e2e2>] tick_handle_periodic+0x12/0x70 > [17854.688347] [<c0104e24>] timer_interrupt+0x14/0x20 > [17854.688347] [<c01607b9>] handle_IRQ_event+0x29/0x60 > [17854.688347] [<c0161c59>] handle_level_irq+0x69/0xe0 > [17854.688347] [<ffffffff>] 0xffffffff > [17854.688347] irq event stamp: 55771 > [17854.688347] hardirqs last enabled at (55771): [<c0309125>] _spin_unlock_irqrestore+0x35/0x60 > [17854.688347] hardirqs last disabled at (55770): [<c0309419>] _spin_lock_irqsave+0x19/0x80 > [17854.688347] softirqs last enabled at (54836): [<c0124f54>] __do_softirq+0xc4/0x110 > [17854.688347] softirqs last disabled at (54831): [<c01049ae>] do_softirq+0x8e/0xe0 > [17854.688347] > [17854.688347] other info that might help us debug this: > [17854.688347] 3 locks held by pm-suspend/18240: > [17854.688347] #0: (&buffer->mutex){--..}, at: [<c01dd4c5>] sysfs_write_file+0x25/0x100 > [17854.688347] #1: (pm_mutex){--..}, at: [<c015056f>] enter_state+0x4f/0x140 > [17854.688347] #2: (dpm_list_mtx){--..}, at: [<c027880f>] device_pm_lock+0xf/0x20 > [17854.688347] > [17854.688347] stack backtrace: > [17854.688347] Pid: 18240, comm: pm-suspend Not tainted 2.6.29-rc2-1avb #1 > [17854.688347] Call Trace: > [17854.688347] [<c0306248>] ? printk+0x18/0x20 > [17854.688347] [<c0141fac>] print_usage_bug+0x16c/0x1d0 > [17854.688347] [<c0142bcf>] mark_lock+0x8bf/0xc90 > [17854.688347] [<c0106b8f>] ? pit_next_event+0x2f/0x40 > [17854.688347] [<c01441b0>] __lock_acquire+0x580/0x1930 > [17854.688347] [<c030916d>] ? _spin_unlock+0x1d/0x20 > [17854.688347] [<c0106b8f>] ? pit_next_event+0x2f/0x40 > [17854.688347] [<c013dd38>] ? clockevents_program_event+0x98/0x160 > [17854.688347] [<c0142fe8>] ? mark_held_locks+0x48/0x90 > [17854.688347] [<c0309125>] ? _spin_unlock_irqrestore+0x35/0x60 > [17854.688347] [<c0143229>] ? trace_hardirqs_on_caller+0x139/0x190 > [17854.688347] [<c014328b>] ? trace_hardirqs_on+0xb/0x10 > [17854.688347] [<c01455bc>] lock_acquire+0x5c/0x80 > [17854.688347] [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0 > [17854.688347] [<c03092e5>] _spin_lock+0x35/0x70 > [17854.688347] [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0 > [17854.688347] [<c0136fcc>] retrigger_next_event+0x5c/0xa0 > [17854.688347] [<c013711a>] hres_timers_resume+0xa/0x10 > [17854.688347] [<c013aa8e>] timekeeping_resume+0xee/0x150 > [17854.688347] [<c0273384>] __sysdev_resume+0x14/0x50 > [17854.688347] [<c0273407>] sysdev_resume+0x47/0x80 > [17854.688347] [<c02791ab>] device_power_up+0xb/0x20 > [17854.688347] [<c015043f>] suspend_devices_and_enter+0xcf/0x150 > [17854.688347] [<c0150c2f>] ? freeze_processes+0x3f/0x90 > [17854.688347] [<c0150614>] enter_state+0xf4/0x140 > [17854.688347] [<c01506dd>] state_store+0x7d/0xc0 > [17854.688347] [<c0150660>] ? state_store+0x0/0xc0 > [17854.688347] [<c0202da4>] kobj_attr_store+0x24/0x30 > [17854.688347] [<c01dd53c>] sysfs_write_file+0x9c/0x100 > [17854.688347] [<c019916c>] vfs_write+0x9c/0x160 > [17854.688347] [<c0103494>] ? restore_nocheck_notrace+0x0/0xe > [17854.688347] [<c01dd4a0>] ? sysfs_write_file+0x0/0x100 > [17854.688347] [<c01992ed>] sys_write+0x3d/0x70 > [17854.688347] [<c0103371>] sysenter_do_call+0x12/0x31 Andrey's analysis: > timekeeping_resume() is called via class ->resume > method; and according to comments in sysdev_resume() and > device_power_up(), they are called with interrupts disabled. > > Looking at suspend_enter, irqs *are* disabled at this point. > > So it actually looks like something (may be some driver) > unconditionally enabled irqs in resume path. Add a debug check to test this theory. If it triggers then it triggers because the resume code calls it with irqs enabled, which is a no-no not just for timekeeping_resume(), but also bad for a number of other resume handlers. Reported-by: Andrey Borzenkov <arvidjaar@mail.ru> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * time-sched.c: tick_nohz_update_jiffies should be staticJaswinder Singh Rajput2009-01-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | Impact: cleanup, reduce kernel size a bit, avoid sparse warning Fixes sparse warning: kernel/time/tick-sched.c:137:6: warning: symbol 'tick_nohz_update_jiffies' was not declared. Should it be static? Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * locking, hpet: annotate false positive warningPeter Zijlstra2009-01-122-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexander Beregalov reported that this warning is caused by the HPET code: > hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 > hpet0: 3 comparators, 64-bit 14.318180 MHz counter > ODEBUG: object is on stack, but not annotated > ------------[ cut here ]------------ > WARNING: at lib/debugobjects.c:251 __debug_object_init+0x2a4/0x352() > Bisected down to 26afe5f2fbf06ea0765aaa316640c4dd472310c0 > (x86: HPET_MSI Initialise per-cpu HPET timers) The commit is fine - but the on-stack workqueue entry needs annotation. Reported-and-bisected-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * kernel/fork.c: unused variable 'ret'Steven Noonan2009-01-111-1/+0
| | | | | | | | | | | | | | Removed the unused variable. Signed-off-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge commit 'v2.6.29-rc1' into timers/urgentIngo Molnar2009-01-112182-41157/+152947
| |\
| * | itimers: remove the per-cpu-ish-nessPeter Zijlstra2009-01-075-107/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Either we bounce once cacheline per cpu per tick, yielding n^2 bounces or we just bounce a single.. Also, using per-cpu allocations for the thread-groups complicates the per-cpu allocator in that its currently aimed to be a fixed sized allocator and the only possible extention to that would be vmap based, which is seriously constrained on 32 bit archs. So making the per-cpu memory requirement depend on the number of processes is an issue. Lastly, it didn't deal with cpu-hotplug, although admittedly that might be fixable. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2009-01-2630-128/+260
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits) xen: unitialised return value in xenbus_write_transaction x86: fix section mismatch warning x86: unmask CPUID levels on Intel CPUs, fix x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn. x86: use standard PIT frequency xen: handle highmem pages correctly when shrinking a domain x86, mm: fix pte_free() xen: actually release memory when shrinking domain x86: unmask CPUID levels on Intel CPUs x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h> x86: fix PTE corruption issue while mapping RAM using /dev/mem x86: mtrr fix debug boot parameter x86: fix page attribute corruption with cpa() Revert "x86: signal: change type of paramter for sys_rt_sigreturn()" x86: use early clobbers in usercopy*.c x86: remove kernel_physical_mapping_init() from init section fix: crash: IP: __bitmap_intersects+0x48/0x73 cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write work_on_cpu: Use our own workqueue. work_on_cpu: don't try to get_online_cpus() in work_on_cpu. ...
| * | | xen: unitialised return value in xenbus_write_transactionIan Campbell2009-01-261-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return value of xenbus_write_transaction can be uninitialised in the success case leading to the userspace xenstore utilities failing. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: fix section mismatch warningRakib Mullick2009-01-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here function vmi_activate calls a init function activate_vmi , which causes the following section mismatch warnings: LD arch/x86/kernel/built-in.o WARNING: arch/x86/kernel/built-in.o(.text+0x13ba9): Section mismatch in reference from the function vmi_activate() to the function .init.text:vmi_time_init() The function vmi_activate() references the function __init vmi_time_init(). This is often because vmi_activate lacks a __init annotation or the annotation of vmi_time_init is wrong. WARNING: arch/x86/kernel/built-in.o(.text+0x13bd1): Section mismatch in reference from the function vmi_activate() to the function .devinit.text:vmi_time_bsp_init() The function vmi_activate() references the function __devinit vmi_time_bsp_init(). This is often because vmi_activate lacks a __devinit annotation or the annotation of vmi_time_bsp_init is wrong. WARNING: arch/x86/kernel/built-in.o(.text+0x13bdb): Section mismatch in reference from the function vmi_activate() to the function .devinit.text:vmi_time_ap_init() The function vmi_activate() references the function __devinit vmi_time_ap_init(). This is often because vmi_activate lacks a __devinit annotation or the annotation of vmi_time_ap_init is wrong. Fix it by marking vmi_activate() as __init too. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: unmask CPUID levels on Intel CPUs, fixIngo Molnar2009-01-261-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix boot hang on pre-model-15 Intel CPUs rdmsrl_safe() does not work in very early bootup code yet, because we dont have the pagefault handler installed yet so exception section does not get parsed. rdmsr_safe() will just crash and hang the bootup. So limit the MSR_IA32_MISC_ENABLE MSR read to those CPU types that support it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn.Eric Anholt2009-01-261-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the absence of PAT, PAGE_KERNEL_WC ends up mapping to a memory type that gets UC behavior even in the presence of a WC MTRR covering the area in question. By swapping to PAGE_KERNEL_UC_MINUS, we can get the actual behavior the caller wanted (WC if you can manage it, UC otherwise). This recovers the 40% performance improvement of using WC in the DRM to upload vertex data. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | x86: use standard PIT frequencyIngo Molnar2009-01-251-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the RDC and ELAN platforms use slighly different PIT clocks, resulting in a timex.h hack that changes PIT_TICK_RATE during build time. But if a tester enables any of these platform support .config options, the PIT will be miscalibrated on standard PC platforms. So use one frequency - in a subsequent patch we'll add a quirk to allow x86 platforms to define different PIT frequencies. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | xen: handle highmem pages correctly when shrinking a domainIan Campbell2009-01-231-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1058a75f07b9bb8323fb5197be5526220f8b75cf ("xen: actually release memory when shrinking domain") causes a crash if the page being released is a highmem page. If a page is highmem then there is no need to unmap it. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86, mm: fix pte_free()Peter Zijlstra2009-01-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On -rt we were seeing spurious bad page states like: Bad page state in process 'firefox' page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0 Trying to fix it up, but a reboot is needed Backtrace: Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3 [<c043d0f3>] ? printk+0x14/0x19 [<c0272d4e>] bad_page+0x4e/0x79 [<c0273831>] free_hot_cold_page+0x5b/0x1d3 [<c02739f6>] free_hot_page+0xf/0x11 [<c0273a18>] __free_pages+0x20/0x2b [<c027d170>] __pte_alloc+0x87/0x91 [<c027d25e>] handle_mm_fault+0xe4/0x733 [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63 [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63 [<c0218875>] do_page_fault+0x36f/0x88a This is the case where a concurrent fault already installed the PTE and we get to free the newly allocated one. This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl) which is overlaid with the {private, mapping} struct. union { struct { unsigned long private; struct address_space *mapping; }; spinlock_t ptl; struct kmem_cache *slab; struct page *first_page; }; Normally the spinlock is small enough to not stomp on page->mapping, but PREEMPT_RT=y has huge 'spin'locks. But lockdep kernels should also be able to trigger this splat, as the lock tracking code grows the spinlock to cover page->mapping. The obvious fix is calling pgtable_page_dtor() like the regular pte free path __pte_free_tlb() does. It seems all architectures except x86 and nm10300 already do this, and nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it doesn't do SMP or simply doesnt do MMU at all or something. Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>
| * | | xen: actually release memory when shrinking domainDan Magenheimer2009-01-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix this: > It appears that in the upstream balloon driver, > the call to HYPERVISOR_update_va_mapping is missing > from decrease_reservation. I think as a result, > the balloon driver is eating memory but not > releasing it to Xen, thus rendering the balloon > driver essentially useless. (Can be observed via xentop.) Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: unmask CPUID levels on Intel CPUsH. Peter Anvin2009-01-221-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: Fixes crashes with misconfigured BIOSes on XSAVE hardware Avuton Olrich reported early boot crashes with v2.6.28 and bisected it down to dc1e35c6e95e8923cf1d3510438b63c600fee1e2 ("x86, xsave: enable xsave/xrstor on cpus with xsave support"). If the CPUID limit bit in MSR_IA32_MISC_ENABLE is set, clear it to make all CPUID information available. This is required for some features to work, in particular XSAVE. Reported-and-bisected-by: Avuton Olrich <avuton@gmail.com> Tested-by: Avuton Olrich <avuton@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | | x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h>H. Peter Anvin2009-01-211-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: None (new bit definitions currently unused) Add bit definitions for the MSR_IA32_MISC_ENABLE MSRs to <asm/msr-index.h>. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | | x86: fix PTE corruption issue while mapping RAM using /dev/memSuresh Siddha2009-01-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Beschorner Daniel reported: > hwinfo problem since 2.6.28, showing this in the oops: > Corrupted page table at address 7fd04de3ec00 Also, PaX Team reported a regression with this commit: > commit 9542ada803198e6eba29d3289abb39ea82047b92 > Author: Suresh Siddha <suresh.b.siddha@intel.com> > Date: Wed Sep 24 08:53:33 2008 -0700 > > x86: track memtype for RAM in page struct This commit breaks mapping any RAM page through /dev/mem, as the reserve_memtype() was not initializing the return attribute type and as such corrupting the PTE entry that was setup with the return attribute type. Because of this bug, application mapping this RAM page through /dev/mem will die with "Corrupted page table at address xxxx" message in the kernel log and also the kernel identity mapping which maps the underlying RAM page gets converted to UC. Fix this by initializing the return attribute type before calling reserve_ram_pages_type() Reported-by: PaX Team <pageexec@freemail.hu> Reported-and-tested-by: Beschorner Daniel <Daniel.Beschorner@facton.com> Tested-and-Acked-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: mtrr fix debug boot parameterThomas Renninger2009-01-211-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | while looking at: http://bugzilla.kernel.org/show_bug.cgi?id=11541 I realized that the mtrr.show param cannot work, because the code is processed much too early. This patch: - Declares mtrr.show as early_param - Stays consistent with the previous param (which I doubt that it ever worked), so mtrr.show=1 would still work - Declares mtrr_show as initdata Signed-off-by: Thomas Renninger <trenn@suse.de> Acked-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: fix page attribute corruption with cpa()Suresh Siddha2009-01-211-15/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix sporadic slowdowns and warning messages This patch fixes a performance issue reported by Linus on his Nehalem system. While Linus reverted the PAT patch (commit 58dab916dfb57328d50deb0aa9b3fc92efa248ff) which exposed the issue, existing cpa() code can potentially still cause wrong(page attribute corruption) behavior. This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that various people reported. In 64bit kernel, kernel identity mapping might have holes depending on the available memory and how e820 reports the address range covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole in the address range that is not listed by e820 entries, kernel identity mapping will have a corresponding hole in its 1-1 identity mapping. If cpa() happens on the kernel identity mapping which falls into these holes, existing code fails like this: __change_page_attr_set_clr() __change_page_attr() returns 0 because of if (!kpte). But doesn't set cpa->numpages and cpa->pfn. cpa_process_alias() uses uninitialized cpa->pfn (random value) which can potentially lead to changing the page attribute of kernel text/data, kernel identity mapping of RAM pages etc. oops! This bug was easily exposed by another PAT patch which was doing cpa() more often on kernel identity mapping holes (physical range between max_low_pfn_mapped and 4GB), where in here it was setting the cache disable attribute(PCD) for kernel identity mappings aswell. Fix cpa() to handle the kernel identity mapping holes. Retain the WARN() for cpa() calls to other not present address ranges (kernel-text/data, ioremap() addresses) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Revert "x86: signal: change type of paramter for sys_rt_sigreturn()"Ingo Molnar2009-01-212-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 4217458dafaa57d8e26a46f5d05ab8c53cf64191. Justin Madru bisected this commit, it was causing weird Firefox crashes. The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of the calling frame, which corrupts the syscall return pt_regs state and thus corrupts user-space register state. So we go back to the slightly less clean but more optimization-safe method of getting to pt_regs. Also add a comment to explain this. Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505 Reported-and-bisected-by: Justin Madru <jdm64@gawab.com> Tested-by: Justin Madru <jdm64@gawab.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: use early clobbers in usercopy*.cAndi Kleen2009-01-212-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix rare (but currently harmless) miscompile with certain configs and gcc versions Hugh Dickins noticed that strncpy_from_user() was miscompiled in some circumstances with gcc 4.3. Thanks to Hugh's excellent analysis it was easy to track down. Hugh writes: > Try building an x86_64 defconfig 2.6.29-rc1 kernel tree, > except not quite defconfig, switch CONFIG_PREEMPT_NONE=y > and CONFIG_PREEMPT_VOLUNTARY off (because it expands a > might_fault() there, which hides the issue): using a > gcc 4.3.2 (I've checked both openSUSE 11.1 and Fedora 10). > > It generates the following: > > 0000000000000000 <__strncpy_from_user>: > 0: 48 89 d1 mov %rdx,%rcx > 3: 48 85 c9 test %rcx,%rcx > 6: 74 0e je 16 <__strncpy_from_user+0x16> > 8: ac lods %ds:(%rsi),%al > 9: aa stos %al,%es:(%rdi) > a: 84 c0 test %al,%al > c: 74 05 je 13 <__strncpy_from_user+0x13> > e: 48 ff c9 dec %rcx > 11: 75 f5 jne 8 <__strncpy_from_user+0x8> > 13: 48 29 c9 sub %rcx,%rcx > 16: 48 89 c8 mov %rcx,%rax > 19: c3 retq > > Observe that "sub %rcx,%rcx; mov %rcx,%rax", whereas gcc 4.2.1 > (and many other configs) say "sub %rcx,%rdx; mov %rdx,%rax". > Isn't it returning 0 when it ought to be returning strlen? The asm constraints for the strncpy_from_user() result were missing an early clobber, which tells gcc that the last output arguments are written before all input arguments are read. Also add more early clobbers in the rest of the file and fix 32-bit usercopy.c in the same way. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> [ since this API is rarely used and no in-kernel user relies on a 'len' return value (they only rely on negative return values) this miscompile was never noticed in the field. But it's worth fixing it nevertheless. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: remove kernel_physical_mapping_init() from init sectionGary Hade2009-01-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix crash with memory hotplug enabled kernel_physical_mapping_init() is called during memory hotplug so it does not belong in the init section. If the kernel is built with CONFIG_DEBUG_SECTION_MISMATCH=y on the make command line, arch/x86/mm/init_64.c is compiled with the -fno-inline-functions-called-once gcc option defeating inlining of kernel_physical_mapping_init() within init_memory_mapping(). When kernel_physical_mapping_init() is not inlined it is placed in the .init.text section according to the __init in it's current declaration. A later call to kernel_physical_mapping_init() during a memory hotplug operation encounters an int3 trap because the .init.text section memory has been freed. This patch eliminates the crash caused by the int3 trap by moving the non-inlined kernel_physical_mapping_init() from .init.text to .meminit.text. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | fix: crash: IP: __bitmap_intersects+0x48/0x73Ingo Molnar2009-01-201-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -tip testing found this crash: > [ 35.258515] calling acpi_cpufreq_init+0x0/0x127 @ 1 > [ 35.264127] BUG: unable to handle kernel NULL pointer dereference at (null) > [ 35.267554] IP: [<ffffffff80478092>] __bitmap_intersects+0x48/0x73 > [ 35.267554] PGD 0 > [ 35.267554] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c is still broken: there's no allocation of the variable mask, so we pass in an uninitialized cmd.mask field to drv_read(), which then passes it to the scheduler which then crashes ... Switch it over to the much simpler constant-cpumask-pointers approach. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_writeMike Travis2009-01-191-13/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: use new work_on_cpu function to reduce stack usage Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with a work_on_cpu function for drv_read() and drv_write(). Basically converts do_drv_{read,write} into "work_on_cpu" functions that are now called by drv_read and drv_write. Note: This patch basically reverts 50c668d6 which reverted 7503bfba, now that the work_on_cpu() function is more stable. Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Tested-by: Dieter Ries <clip2@gmx.de> Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: <cpufreq@vger.kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | work_on_cpu: Use our own workqueue.Rusty Russell2009-01-191-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: remove potential clashes with generic kevent workqueue Annoyingly, some places we want to use work_on_cpu are already in workqueues. As per Ingo's suggestion, we create a different workqueue for work_on_cpu. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | work_on_cpu: don't try to get_online_cpus() in work_on_cpu.Rusty Russell2009-01-191-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: remove potential circular lock dependency with cpu hotplug lock This has caused more problems than it solved, with a pile of cpu hotplug locking issues. Followup patches will get_online_cpus() in callers that need it, but if they don't do it they're no worse than before when they were using set_cpus_allowed without locking. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: fix section mismatch warnings in kernel/setup_percpu.cLeonardo Potenza2009-01-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function setup_cpu_local_masks() has been marked __init, in order to remove the following section mismatch messages: WARNING: vmlinux.o(.text+0x3c2c7): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var() The function setup_cpu_local_masks() references the function __init alloc_bootmem_cpumask_var(). This is often because setup_cpu_local_masks lacks a __init annotation or the annotation of alloc_bootmem_cpumask_var is wrong. WARNING: vmlinux.o(.text+0x3c2d3): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var() The function setup_cpu_local_masks() references the function __init alloc_bootmem_cpumask_var(). This is often because setup_cpu_local_masks lacks a __init annotation or the annotation of alloc_bootmem_cpumask_var is wrong. WARNING: vmlinux.o(.text+0x3c2df): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var() The function setup_cpu_local_masks() references the function __init alloc_bootmem_cpumask_var(). This is often because setup_cpu_local_masks lacks a __init annotation or the annotation of alloc_bootmem_cpumask_var is wrong. WARNING: vmlinux.o(.text+0x3c2eb): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var() The function setup_cpu_local_masks() references the function __init alloc_bootmem_cpumask_var(). This is often because setup_cpu_local_masks lacks a __init annotation or the annotation of alloc_bootmem_cpumask_var is wrong. Signed-off-by: Leonardo Potenza <lpotenza@inwind.it> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: put trigger in to detect mismatched apic versionsMike Travis2009-01-181-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: add debug warning Fire off one message if two apic's discovered with different apic versions. (this code is only called during CPU init) The goal of this is to pave the way of the removal of the apic_version[] array. We dont expect any apic version incompatibilities in the x86 landscape of systems [if so we dont handle them very well and probably never will handle deep apic version assymetries well], but it's prudent to have a debug check for one kernel cycle nevertheless. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: define ARCH_WANT_FRAME_POINTERSJeff Mahoney2009-01-181-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit da4276b8299a6544dc41ac2485d3ffca5811b3fb changed a dependency for FRAME_POINTER from X86 to ARCH_WANT_FRAME_POINTERS, but didn't actually define it. This patch adds the definition for ARCH_WANT_FRAME_POINTERS. Without it, FRAME_POINTER can't be enabled on x86. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: fix assumed to be contiguous leaf page tables for kmap_atomic region ↵Jan Beulich2009-01-163-29/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (take 2) Debugging and original patch from Nick Piggin <npiggin@suse.de> The early fixmap pmd entry inserted at the very top of the KVA is causing the subsequent fixmap mapping code to not provide physically linear pte pages over the kmap atomic portion of the fixmap (which relies on said property to calculate pte addresses). This has caused weird boot failures in kmap_atomic much later in the boot process (initial userspace faults) on a 32-bit PAE system with a larger number of CPUs (smaller CPU counts tend not to run over into the next page so don't show up the problem). Solve this by attempting to clear out the page table, and copy any of its entries to the new one. Also, add a bug if a nonlinear condition is encountered and can't be resolved, which might save some hours of debugging if this fragile scheme ever breaks again... Once we have such logic, we can also use it to eliminate the early ioremap trickery around the page table setup for the fixmap area. This also fixes potential issues with FIX_* entries sharing the leaf page table with the early ioremap ones getting discarded by early_ioremap_clear() and not restored by early_ioremap_reset(). It at once eliminates the temporary (and configuration, namely NR_CPUS, dependent) unavailability of early fixed mappings during the time the fixmap area page tables get constructed. Finally, also replace the hard coded calculation of the initial table space needed for the fixmap area with a proper one, allowing kernels configured for large CPU counts to actually boot. Based-on: Nick Piggin <npiggin@suse.de> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86, UV: cpu_relax in uv_wait_completionCliff Wickman2009-01-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function uv_wait_completion() spins on reads of a memory-mapped register, waiting for completion of BAU hardware replies. It should call "cpu_relax()" between those reads to improve performance on hyperthreaded configurations. Signed-off-by: Cliff Wickman <cpw@sgi.com> Acked-by: Jack Steiner <steiner@sgi.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: avoid early crash in disable_local_APIC()Jan Beulich2009-01-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | E.g. when called due to an early panic. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86, pat: fix reserve_memtype() for legacy 1MB rangeSuresh Siddha2009-01-141-10/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thierry Vignaud reported: > http://bugzilla.kernel.org/show_bug.cgi?id=12372 > > On P4 with an SiS motherboard (video card is a SiS 651) > X server fails to start with error: > xf86MapVidMem: Could not mmap framebuffer (0x00000000,0x2000) (Invalid > argument) Here X is trying to map first 8KB of memory using /dev/mem. Existing code treats first 0-4KB of memory as non-RAM and 4KB-8KB as RAM. Recent code changes don't allow to map memory with different attributes at the same time. Fix this by treating the first 1MB legacy region as special and always track the attribute requests with in this region using linear linked list (and don't bother if the range is RAM or non-RAM or mixed) Reported-and-tested-by: Thierry Vignaud <tvignaud@mandriva.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86, generic: mark complex bitops.h inlines as __always_inlineAndi Kleen2009-01-135-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: reduce kernel image size Hugh Dickins noticed that older gcc versions when the kernel is built for code size didn't inline some of the bitops. Mark all complex x86 bitops that have more than a single asm statement or two as always inline to avoid this problem. Probably should be done for other architectures too. Ingo then found a better fix that only requires a single line change, but it unfortunately only works on gcc 4.3. On older gccs the original patch still makes a ~0.3% defconfig difference with CONFIG_OPTIMIZE_INLINING=y. With gcc 4.1 and a defconfig like build: 6116998 1138540 883788 8139326 7c323e vmlinux-oi-with-patch 6137043 1138540 883788 8159371 7c808b vmlinux-optimize-inlining ~20k / 0.3% difference. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86, cpufreq: remove leftover copymask_copy()Ingo Molnar2009-01-131-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix potential boot crash on MAXSMP Remove code left over by: 50c668d: Revert "cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read That cmd.cpumask is not allocated anymore. No impact on default !MAXSMP kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6Linus Torvalds2009-01-263-6/+10
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: drivers/ide/palm_bk3710.c buildfix ide: fix Falcon IDE breakage ide: fix IDE PMAC breakage
| * | | | drivers/ide/palm_bk3710.c buildfixDavid Brownell2009-01-191-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CC drivers/ide/palm_bk3710.o drivers/ide/palm_bk3710.c: In function 'palm_bk3710_probe': drivers/ide/palm_bk3710.c:382: warning: assignment makes integer from pointer without a cast Someone should fix hw_regs_t to neither be a typedef, nor use "unsigned long" where it should use "void __iomem *". Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Kevin Hilman <khilman@deeprootsystems.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
| * | | | ide: fix Falcon IDE breakageMichael Schmitz2009-01-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [m68k] Falcon IDE: always serialize, in order to force execution of ide_get_lock() and friends. Signed-off-By: Michael Schmitz <schmitz@debian.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> [bart: set flag in falconide_port_info instead of falconide_init()] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
| * | | | ide: fix IDE PMAC breakageAndreas Schwab2009-01-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> writes: > Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> > --- > drivers/ide/ide-probe.c | 9 ++------- > 1 file changed, 2 insertions(+), 7 deletions(-) > > Index: b/drivers/ide/ide-probe.c > =================================================================== > --- a/drivers/ide/ide-probe.c > +++ b/drivers/ide/ide-probe.c > @@ -640,14 +640,9 @@ static int ide_register_port(ide_hwif_t > /* register with global device tree */ > dev_set_name(&hwif->gendev, hwif->name); > hwif->gendev.driver_data = hwif; > - if (hwif->gendev.parent == NULL) { > - if (hwif->dev) > - hwif->gendev.parent = hwif->dev; > - else > - /* Would like to do = &device_legacy */ > - hwif->gendev.parent = NULL; > - } > + hwif->gendev.parent = hwif->dev; This [bart: commit 96d40941236722777c259775640b8880b7dc6f33 ("ide: small ide_register_port() cleanup")] breaks ide-pmac. It overwrites the parent that pmac_ide_macio_attach has set. Signed-off-by: Andreas Schwab <schwab@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* | | | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2009-01-2631-488/+441
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: Long btree pointers are still 64 bit on disk [XFS] Remove the rest of the macro-to-function indirections. xfs: sanity check attr fork size xfs: fix bad_features2 fixups for the root filesystem xfs: add a lock class for group/project dquots xfs: lockdep annotations for xfs_dqlock2 xfs: add a separate lock class for the per-mount list of dquots xfs: use mnt_want_write in compat_attrmulti ioctl xfs: fix dentry aliasing issues in open_by_handle
| * | | | | Long btree pointers are still 64 bit on diskDave Chinner2009-01-221-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [XFS] Long btree pointers are still 64 bit on disk On 32 bit machines with CONFIG_LBD=n, XFS reduces the in memory size of xfs_fsblock_t to 32 bits so that it will fit within 32 bit addressing. However, the disk format for long btree pointers are still 64 bits in size. The recent btree rewrite failed to take this into account when initialising new btree blocks, setting sibling pointers to NULL and checking if they are NULL. Hence checking whether a 64 bit NULL was the same as a 32 bit NULL was failingi resulting in NULL sibling pointers failing to be detected correctly. This showed up as WANT_CORRUPTED_GOTO shutdowns in xfs_btree_delrec. Fix this by making all the comparisons and setting of long pointer btree NULL blocks to the disk format, not the in memory format. i.e. use NULLDFSBNO. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Reported-by: Jacek Luczak <difrost.kernel@gmail.com> Reported-by: Danny ter Haar <dth@dth.net> Tested-by: Jacek Luczak <difrost.kernel@gmail.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
| * | | | | [XFS] Remove the rest of the macro-to-function indirections.Eric Sandeen2009-01-1923-158/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the last of the macros-defined-to-static-functions. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
| * | | | | xfs: sanity check attr fork sizeChristoph Hellwig2009-01-191-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently we have quite a few kerneloops reports about dereferencing a NULL if_data in the attribute fork. From looking over the code this can only happen if we pass a 0 size argument to xfs_iformat_local. This implies some sort of corruption and in fact the only mailinglist report about this from earlier this year was after a powerfail presumably on a system with write cache and without barriers. Add a quick sanity check for the attr fork size in xfs_iformat to catch these early and without an oops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
| * | | | | xfs: fix bad_features2 fixups for the root filesystemChristoph Hellwig2009-01-193-14/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the bad_features2 fixup and the alignment updates in the superblock are skipped if we mount a filesystem read-only. But for the root filesystem the typical case is to mount read-only first and only later remount writeable so we'll never perform this update at all. It's not a big problem but means the logs of people needing the fixup get spammed at every boot because they never happen on disk. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
| * | | | | xfs: add a lock class for group/project dquotsChristoph Hellwig2009-01-191-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can have both a user and a group/project dquot locked at the same time, as long as the user dquot is locked first. Tell lockdep about that fact by making the group/project dquots a different lock class. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
| * | | | | xfs: lockdep annotations for xfs_dqlock2Christoph Hellwig2009-01-192-10/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs_dqlock2 locks two xfs_dquots, which is fine as it always locks the dquot with the lower id first. Use mutex_lock_nested to tell lockdep about this fact. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
| * | | | | xfs: add a separate lock class for the per-mount list of dquotsChristoph Hellwig2009-01-191-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can have both a a quota hash chain and the per-mount list locked at the same time. But given that both use the same struct dqhash as list head we have to tell lockdep that they are different lock classes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
| * | | | | xfs: use mnt_want_write in compat_attrmulti ioctlChristoph Hellwig2009-01-191-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The compat version of the attrmulti ioctl needs to ask for and then later release write access to the mount just like the native version, otherwise we could potentially write to read-only mounts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
OpenPOWER on IntegriCloud