| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This is bigger than usual - the reason is partly a pent-up stream of
fixes after the merge window and partly accidental. The fixes are:
- five patches to fix a boot failure on Andy Lutomirsky's laptop
- four SGI UV platform fixes
- KASAN fix
- warning fix
- documentation update
- swap entry definition fix
- pkeys fix
- irq stats fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
x86/efi: Allocate a trampoline if needed in efi_free_boot_services()
x86/boot: Rework reserve_real_mode() to allow multiple tries
x86/boot: Defer setup_real_mode() to early_initcall time
x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly
x86/boot: Run reserve_bios_regions() after we initialize the memory map
x86/irq: Do not substract irq_tlb_count from irq_call_count
x86/mm: Fix swap entry comment and macro
x86/mm/kaslr: Fix -Wformat-security warning
x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation
x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables
x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems
x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values
x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash
x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous
x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET
x86/mm: Disable preemption during CR3 read+write
x86/mm/KASLR: Increase BRK pages for KASLR memory randomization
x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization
x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
I made a mistake while converting the driver to the hotplug state
machine and as a result x2apic_cluster_probe() was accessing
cpus_in_cluster before allocating it.
This patch fixes it by setting the cpumask after the allocation the
memory succeeded.
While at it, I marked two functions static which are only used within
this file.
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6b2c28471de5 ("x86/x2apic: Convert to CPU hotplug state machine")
Link: http://lkml.kernel.org/r/1470924515-9444-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
On my Dell XPS 13 9350 with firmware 1.4.4 and SGX on, if I boot
Fedora 24's grub2-efi off a hard disk, my first 1MB of RAM looks
like:
efi: mem00: [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000000fff] (0MB)
efi: mem01: [Boot Data | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000001000-0x0000000000027fff] (0MB)
efi: mem02: [Loader Data | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000028000-0x0000000000029fff] (0MB)
efi: mem03: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x000000000002a000-0x000000000002bfff] (0MB)
efi: mem04: [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x000000000002c000-0x000000000002cfff] (0MB)
efi: mem05: [Loader Data | | | | | | | | |WB|WT|WC|UC] range=[0x000000000002d000-0x000000000002dfff] (0MB)
efi: mem06: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x000000000002e000-0x0000000000057fff] (0MB)
efi: mem07: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000058000-0x0000000000058fff] (0MB)
efi: mem08: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x0000000000059000-0x000000000009ffff] (0MB)
My EBDA is at 0x2c000, which blocks off everything from 0x2c000 and
up, and my trampoline is 0x6000 bytes (6 pages), so it doesn't fit
in the loader data range at 0x28000.
Without this patch, it panics due to a failure to allocate the
trampoline. With this patch, it works:
[ +0.001744] Base memory trampoline at [ffff880000001000] 1000 size 24576
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/998c77b3bf709f3dfed85cb30701ed1a5d8a438b.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If reserve_real_mode() fails, panicing immediately means we're
doomed. Make it safe to try more than once to allocate the
trampoline:
- Degrade a failure from panic() to pr_info(). (If we make it to
setup_real_mode() without reserving the trampoline, we'll panic
them.)
- Factor out helpers so that platform code can supply a specific
address to try.
- Warn if reserve_real_mode() is called after we're done with the
memblock allocator. If that were to happen, we would behave
unpredictably.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/876e383038f3e9971aa72fd20a4f5da05f9d193d.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
There's no need to run setup_real_mode() as early as we run it.
Defer it to the same early_initcall that sets up the page
permissions for the real mode code.
This should be a code size reduction. More importantly, it give us
a longer window in which we can allocate the real mode trampoline.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/fd62f0da4f79357695e9bf3e365623736b05f119.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The initialization process for trampoline_cr4_features and
mmu_cr4_features was confusing. The intent is for mmu_cr4_features
and *trampoline_cr4_features to stay in sync, but
trampoline_cr4_features is NULL until setup_real_mode() runs. The
old code synchronized *trampoline_cr4_features *twice*, once in
setup_real_mode() and once in setup_arch(). It also initialized
mmu_cr4_features in setup_real_mode(), which causes the actual value
of mmu_cr4_features to potentially depend on when setup_real_mode()
is called.
With this patch, mmu_cr4_features is initialized directly in
setup_arch(), and *trampoline_cr4_features is synchronized to
mmu_cr4_features when the trampoline is set up.
After this patch, it should be safe to defer setup_real_mode().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d48a263f9912389b957dd495a7127b009259ffe0.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
reserve_bios_regions() is a quirk that reserves memory that we might
otherwise think is available. There's no need to run it so early,
and running it before we have the memory map initialized with its
non-quirky inputs makes it hard to make reserve_bios_regions() more
intelligent.
Move it right after we populate the memblock state.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Cc: Matt Fleming <mfleming@suse.de>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/59f58618911005c799c6c9979ce6ae4881d907c2.1470821230.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Since commit:
52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR")
the TLB remote shootdown is done through call function vector. That
commit didn't take care of irq_tlb_count, which a later commit:
fd0f5869724f ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts")
... tried to fix.
The fix assumes every increase of irq_tlb_count has a corresponding
increase of irq_call_count. So the irq_call_count is always bigger than
irq_tlb_count and we could substract irq_tlb_count from irq_call_count.
Unfortunately this is not true for the smp_call_function_single() case.
The IPI is only sent if the target CPU's call_single_queue is empty when
adding a csd into it in generic_exec_single. That means if two threads
are both adding flush tlb csds to the same CPU's call_single_queue, only
one IPI is sent. In other words, the irq_call_count is incremented by 1
but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
bigger than irq_call_count and the substract will produce a very large
irq_call_count value due to overflow.
Considering that:
1) it's not worth to send more IPIs for the sake of accurate counting of
irq_call_count in generic_exec_single();
2) it's not easy to tell if the call function interrupt is for TLB
shootdown in __smp_call_function_single_interrupt().
Not to exclude TLB shootdown from call function count seems to be the
simplest fix and this patch just does that.
This bug was found by LKP's cyclic performance regression tracking recently
with the vm-scalability test suite. I have bisected to commit:
3dec0ba0be6a ("mm/rmap: share the i_mmap_rwsem")
This commit didn't do anything wrong but revealed the irq_call_count
problem. IIUC, the commit makes rwc->remap_one in rmap_walk_file
concurrent with multiple threads. When remap_one is try_to_unmap_one(),
then multiple threads could queue flush TLB to the same CPU but only
one IPI will be sent.
Since the commit was added in Linux v3.19, the counting problem only
shows up from v3.19 onwards.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
A recent patch changed the format of a swap PTE.
The comment explaining the format of the swap PTE is wrong about
the bits used for the swap type field. Amusingly, the ASCII art
and the patch description are correct, but the comment itself
is wrong.
As I was looking at this, I also noticed that the
SWP_OFFSET_FIRST_BIT has an off-by-one error. This does not
really hurt anything. It just wasted a bit of space in the PTE,
giving us 2^59 bytes of addressable space in our swapfiles
instead of 2^60. But, it doesn't match with the comments, and it
wastes a bit of space, so fix it.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Fixes: 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work around erratum")
Link: http://lkml.kernel.org/r/20160810172325.E56AD7DA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
debug_putstr() is used to output strings without using printf-like
formatting but debug_putstr(v) is defined as early_printk(v) in
arch/x86/lib/kaslr.c.
This makes clang reports the following warning when building
with -Wformat-security:
arch/x86/lib/kaslr.c:57:15: warning: format string is not a string
literal (potentially insecure) [-Wformat-security]
debug_putstr(purpose);
^~~~~~~
Fix this by using "%s" in early_printk().
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160806102039.27221-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
manipulation
The Memory Protection Keys "rights register" (PKRU) is
XSAVE-managed, and is saved/restored along with the FPU state.
When kernel code accesses FPU regsisters, it does a delicate
dance with preempt. Otherwise, the context switching code can
get confused as to whether the most up-to-date state is in the
registers themselves or in the XSAVE buffer.
But, PKRU is not a normal FPU register. Using it does not
generate the normal device-not-available (#NM) exceptions which
means we can not manage it lazily, and the kernel completley
disallows using lazy mode when it is enabled.
The dance with preempt *only* occurs when managing the FPU
lazily. Since we never manage PKRU lazily, we do not have to do
the dance with preempt; we can access it directly. Doing it
this way saves a ton of complicated code (and is faster too).
Further, the XSAVES reenabling failed to patch a bit of code
in fpu__xfeature_set_state() the checked for compacted buffers.
That check caused fpu__xfeature_set_state() to silently refuse to
work when the kernel is using compacted XSAVE buffers. This
broke execute-only and future pkey_mprotect() support when using
compact XSAVE buffers.
But, removing fpu__xfeature_set_state() gets rid of this issue,
in addition to the nice cleanup and speedup.
This fixes the same thing as a fix that Sai posted:
https://lkml.org/lkml/2016/7/25/637
The fix that he posted is a much more obviously correct, but I
think we should just do this instead.
Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-Cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20160727232040.7D060DAD@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Building an X86_64 kernel with W=1 throws a total of 9,948 lines of warnings of
this form for both 32-bit and 64-bit syscall tables. Given that the entire rest
of the build for my config only generates 8,375 lines of output, this is a big
reduction in the warnings generated.
The warnings follow this pattern:
./arch/x86/include/generated/asm/syscalls_32.h:885:21: warning: initialized field overwritten [-Woverride-init]
__SYSCALL_I386(379, compat_sys_pwritev2, )
^
arch/x86/entry/syscall_32.c:13:46: note: in definition of macro '__SYSCALL_I386'
#define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
^~~
./arch/x86/include/generated/asm/syscalls_32.h:885:21: note: (near initialization for 'ia32_sys_call_table[379]')
__SYSCALL_I386(379, compat_sys_pwritev2, )
^
arch/x86/entry/syscall_32.c:13:46: note: in definition of macro '__SYSCALL_I386'
#define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
Since we intentionally build the syscall tables this way, ignore that one
warning in the two files.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/7464.1470021890@turing-police.cc.vt.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The latest UV kernel support panics when RHEL7 kexec's the kdump kernel
to make a dumpfile. This patch fixes the problem by turning off all UV
support if NUMA is off.
Tested-by: Frank Ramsay <framsay@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160801184050.577755634@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
There are some circumstances where the UV4 BIOS cannot provide the
correct Proximity Node values to associate with specific Sockets and
Physical Nodes. The decision was made to remove these values from BIOS
and for the kernel to get these values from the standard ACPI tables.
Tested-by: Frank Ramsay <framsay@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160801184050.414210079@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
a crash
Save the uv_systab::size field before doing the iounmap()
of the struct pointer, to avoid a NULL dereference crash.
Tested-by: Frank Ramsay <framsay@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160801184050.250424783@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The UV4 Socket IDs are not guaranteed to equate to Node values which
can cause the GAM (Global Addressable Memory) table lookups to fail.
Fix this by using an independent index into the GAM table instead of
the Socket ID to reference the base address.
Tested-by: Frank Ramsay <framsay@sgi.com>
Tested-by: John Estabrook <estabrook@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160801184050.048755337@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Clarify why exactly RF cannot be restored properly by SYSRET to avoid
confusion.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160803171429.GA2590@nazgul.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
There's a subtle preemption race on UP kernels:
Usually current->mm (and therefore mm->pgd) stays the same during the
lifetime of a task so it does not matter if a task gets preempted during
the read and write of the CR3.
But then, there is this scenario on x86-UP:
TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:
-> mmput()
-> exit_mmap()
-> tlb_finish_mmu()
-> tlb_flush_mmu()
-> tlb_flush_mmu_tlbonly()
-> tlb_flush()
-> flush_tlb_mm_range()
-> __flush_tlb_up()
-> __flush_tlb()
-> __native_flush_tlb()
At this point current->mm is NULL but current->active_mm still points to
the "old" mm.
Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
own mm so CR3 has changed.
Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
mm and so CR3 remains unchanged. Once taskA gets active it continues
where it was interrupted and that means it writes its old CR3 value
back. Everything is fine because userland won't need its memory
anymore.
Now the fun part:
Let's preempt taskA one more time and get back to taskB. This
time switch_mm() won't do a thing because oldmm (->active_mm)
is the same as mm (as per context_switch()). So we remain
with a bad CR3 / PGD and return to userland.
The next thing that happens is handle_mm_fault() with an address for
the execution of its code in userland. handle_mm_fault() realizes that
it has a PTE with proper rights so it returns doing nothing. But the
CPU looks at the wrong PGD and insists that something is wrong and
faults again. And again. And one more time…
This pagefault circle continues until the scheduler gets tired of it and
puts another task on the CPU. It gets little difficult if the task is a
RT task with a high priority. The system will either freeze or it gets
fixed by the software watchdog thread which usually runs at RT-max prio.
But waiting for the watchdog will increase the latency of the RT task
which is no good.
Fix this by disabling preemption across the critical code section.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Default implementation expects 6 pages maximum are needed for low page
allocations. If KASLR memory randomization is enabled, the worse case
of e820 layout would require 12 pages (no large pages). It is due to the
PUD level randomization and the variable e820 memory layout.
This bug was found while doing extensive testing of KASLR memory
randomization on different type of hardware.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Aleksey Makarov <aleksey.makarov@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: kernel-hardening@lists.openwall.com
Fixes: 021182e52fe0 ("Enable KASLR for physical mapping memory regions")
Link: http://lkml.kernel.org/r/1470762665-88032-2-git-send-email-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Initialize KASLR memory randomization after max_pfn is initialized. Also
ensure the size is rounded up. It could create problems on machines
with more than 1Tb of memory on certain random addresses.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Aleksey Makarov <aleksey.makarov@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: kernel-hardening@lists.openwall.com
Fixes: 021182e52fe0 ("Enable KASLR for physical mapping memory regions")
Link: http://lkml.kernel.org/r/1470762665-88032-1-git-send-email-thgarnie@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Dmitry Vyukov has reported unexpected KASAN stackdepot growth:
https://github.com/google/kasan/issues/36
... which is caused by the APIC handlers not being present in .irqentry.text:
When building with CONFIG_FUNCTION_GRAPH_TRACER=y or CONFIG_KASAN=y, put the
APIC interrupt handlers into the .irqentry.text section. This is needed
because both KASAN and function graph tracer use __irqentry_text_start and
__irqentry_text_end to determine whether a function is an IRQ entry point.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: aryabinin@virtuozzo.com
Cc: kasan-dev@googlegroups.com
Cc: kcc@google.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1468575763-144889-1-git-send-email-glider@google.com
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Misc fixes: a /dev/rtc regression fix, two APIC timer period
calibration fixes, an ARM clocksource driver fix and a NOHZ
power use regression fix"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup
x86/timers/apic: Inform TSC deadline clockevent device about recalibration
x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
timers: Fix get_next_timer_interrupt() computation
clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Ville Syrjälä reports "The first time I run hwclock after rebooting
I get this:
open("/dev/rtc", O_RDONLY) = 3
ioctl(3, PHN_SET_REGS or RTC_UIE_ON, 0) = 0
select(4, [3], NULL, NULL, {10, 0}) = 0 (Timeout)
ioctl(3, PHN_NOT_OH or RTC_UIE_OFF, 0) = 0
close(3) = 0
On all subsequent runs I get this:
open("/dev/rtc", O_RDONLY) = 3
ioctl(3, PHN_SET_REGS or RTC_UIE_ON, 0) = -1 EINVAL (Invalid argument)
ioctl(3, RTC_RD_TIME, 0x7ffd76b3ae70) = -1 EINVAL (Invalid argument)
close(3) = 0"
This was caused by a stupid typo in a patch that should have been
a simple rename to move around contents of a header file, but
accidentally wrote zeroes into the rtc rather than reading from
it:
463a86304cae ("char/genrtc: x86: remove remnants of asm/rtc.h")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: rtc-linux@googlegroups.com
Fixes: 463a86304cae ("char/genrtc: x86: remove remnants of asm/rtc.h")
Link: http://lkml.kernel.org/r/20160809195528.1604312-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |\ \ \ \ \
| | |/ / / /
| | | | | |
| | | | | | |
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
This patch eliminates a source of imprecise APIC timer interrupts,
which imprecision may result in double interrupts or even late
interrupts.
The TSC deadline clockevent devices' configuration and registration
happens before the TSC frequency calibration is refined in
tsc_refine_calibration_work().
This results in the TSC clocksource and the TSC deadline clockevent
devices being configured with slightly different frequencies: the former
gets the refined one and the latter are configured with the inaccurate
frequency detected earlier by means of the "Fast TSC calibration using PIT".
Within the APIC code, introduce the notifier function
lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
clockevent devices with the current tsc_khz.
Call it from the TSC code after TSC calibration refinement has happened.
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
[ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
clockevents frequency roundoff error
I noticed the following bug/misbehavior on certain Intel systems: with a
single task running on a NOHZ CPU on an Intel Haswell, I recognized
that I did not only get the one expected local_timer APIC interrupt, but
two per second at minimum. (!)
Further tracing showed that the first one precedes the programmed deadline
by up to ~50us and hence, it did nothing except for reprogramming the TSC
deadline clockevent device to trigger shortly thereafter again.
The reason for this is imprecise calibration, the timeout we program into
the APIC results in 'too short' timer interrupts. The core (hr)timer code
notices this (because it has a precise ktime source and sees the short
interrupt) and fixes it up by programming an additional very short
interrupt period.
This is obviously suboptimal.
The reason for the imprecise calibration is twofold, and this patch
fixes the first reason:
In setup_APIC_timer(), the registered clockevent device's frequency
is calculated by first dividing tsc_khz by TSC_DIVISOR and multiplying
it with 1000 afterwards:
(tsc_khz / TSC_DIVISOR) * 1000
The multiplication with 1000 is done for converting from kHz to Hz and the
division by TSC_DIVISOR is carried out in order to make sure that the final
result fits into an u32.
However, with the order given in this calculation, the roundoff error
introduced by the division gets magnified by a factor of 1000 by the
following multiplication.
To fix it, reversing the order of the division and the multiplication a la:
(tsc_khz * 1000) / TSC_DIVISOR
... reduces the roundoff error already.
Furthermore, if TSC_DIVISOR divides 1000, associativity holds:
(tsc_khz * 1000) / TSC_DIVISOR = tsc_khz * (1000 / TSC_DIVISOR)
and thus, the roundoff error even vanishes and the whole operation can be
carried out within 32 bits.
The powers of two that divide 1000 are 2, 4 and 8. A value of 8 for
TSC_DIVISOR still allows for TSC frequencies up to
2^32 / 10^9ns * 8 = 34.4GHz which is way larger than anything to expect
in the next years.
Thus we also replace the current TSC_DIVISOR value of 32 by 8. Reverse
the order of the divison and the multiplication in the calculation of
the registered clockevent device's frequency.
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christopher S. Hall <christopher.s.hall@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160714152255.18295-2-nicstange@gmail.com
[ Improved changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus two uncore-PMU fixes, an uprobes fix, a
perf-cgroups fix and an AUX events fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Add enable_box for client MSR uncore
perf/x86/intel/uncore: Fix uncore num_counters
uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
perf/core: Set cgroup in CPU contexts for new cgroup events
perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash
perf probe ppc64le: Fix probe location when using DWARF
perf probe: Add function to post process kernel trace events
tools: Sync cpufeatures headers with the kernel
toops: Sync tools/include/uapi/linux/bpf.h with the kernel
tools: Sync cpufeatures.h and vmx.h with the kernel
perf probe: Support signedness casting
perf stat: Avoid skew when reading events
perf probe: Fix module name matching
perf probe: Adjust map->reloc offset when finding kernel symbol from map
perf hists: Trim libtraceevent trace_seq buffers
perf script: Add 'bpf-output' field to usage message
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
There are bug reports about miscounting uncore counters on some
client machines like Sandybridge, Broadwell and Skylake. It is
very likely to be observed on idle systems.
This issue is caused by a hardware issue. PERF_GLOBAL_CTL could be
cleared after Package C7, and nothing will be count.
The related errata (HSD 158) could be found in:
www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf
This patch tries to work around this issue by re-enabling PERF_GLOBAL_CTL
in ->enable_box(). The workaround does not cover all cases. It helps for new
events after returning from C7. But it cannot prevent C7, it will still
miscount if a counter is already active.
There is no drawback in leaving it enabled, so it does not need
disable_box() here.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1470925874-59943-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Some uncore boxes' num_counters value for Haswell server and
Broadwell server are not correct (too large, off by one).
This issue was found by comparing the code with the document. Although
there is no bug report from users yet, accessing non-existent counters
is dangerous and the behavior is undefined: it may cause miscounting or
even crashes.
This patch makes them consistent with the uncore document.
Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1470925820-59847-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |/ / / /
| |/| | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Since instruction decoder now supports EVEX-encoded instructions, two fixes
are needed to correctly handle them in uprobes.
Extended bits for MODRM.rm field need to be sanitized just like we do it
for VEX3, to avoid encoding wrong register for register-relative access.
EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
paranoid here: proper encoding for register-relative access
should have EVEX.x = 1.
Secondly, we should fetch vex.vvvv for EVEX too.
This is now super easy because instruction decoder populates
vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # v4.1+
Fixes: 8a764a875fe3 ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"A fix for EFI capsules and an SGI UV platform fix"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/capsule: Allocate whole capsule into virtual memory
x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
efi_runtime_disabled case
This problem has actually been in the UV code for a while, but we didn't
catch it until recently, because we had been relying on EFI_OLD_MEMMAP
to allow our systems to boot for a period of time. We noticed the issue
when trying to kexec a recent community kernel, where we hit this NULL
pointer dereference in efi_sync_low_kernel_mappings():
[ 0.337515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000880
[ 0.346276] IP: [<ffffffff8105df8d>] efi_sync_low_kernel_mappings+0x5d/0x1b0
The problem doesn't show up with EFI_OLD_MEMMAP because we skip the
chunk of setup_efi_state() that sets the efi_loader_signature for the
kexec'd kernel. When the kexec'd kernel boots, it won't set EFI_BOOT in
setup_arch, so we completely avoid the bug.
We always kexec with noefi on the command line, so this shouldn't be an
issue, but since we're not actually checking for efi_runtime_disabled in
uv_bios_init(), we end up trying to do EFI runtime callbacks when we
shouldn't be. This patch just adds a check for efi_runtime_disabled in
uv_bios_init() so that we don't map in uv_systab when runtime_disabled ==
true.
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: <stable@vger.kernel.org> # v4.7
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1470912120-22831-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some powerpc fixes for 4.8:
Misc:
- powerpc/vdso: Fix build rules to rebuild vdsos correctly from Nicholas Piggin
- powerpc/ptrace: Fix coredump since ptrace TM changes from Cyril Bur
- powerpc/32: Fix csum_partial_copy_generic() from Christophe Leroy
- cxl: Set psl_fir_cntl to production environment value from Frederic Barrat
- powerpc/eeh: Switch to conventional PCI address output in EEH log from Guilherme G. Piccoli
- cxl: Use fixed width predefined types in data structure. from Philippe Bergheaud
- powerpc/vdso: Add missing include file from Guenter Roeck
- powerpc: Fix unused function warning 'lmb_to_memblock' from Alastair D'Silva
- powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again from Alexey Kardashevskiy
- powerpc/cell: Add missing error code in spufs_mkgang() from Dan Carpenter
- crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading from Anton Blanchard
- powerpc/pasemi: Fix coherent_dma_mask for dma engine from Darren Stevens
Benjamin Herrenschmidt:
- powerpc/32: Fix crash during static key init
- powerpc: Update obsolete comment in setup_32.c about early_init()
- powerpc: Print the kernel load address at the end of prom_init()
- powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
- powerpc/xics: Properly set Edge/Level type and enable resend
Mahesh Salgaonkar:
- powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
- powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
- powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
- powerpc/powernv: Load correct TOC pointer while waking up from winkle.
Andrew Donnellan:
- cxl: Fix sparse warnings
- cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
Michael Ellerman:
- selftests/powerpc: Specify we expect to build with std=gnu99
- powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
- powerpc/pci: Fix endian bug in fixed PHB numbering"
* tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (26 commits)
selftests/powerpc: Specify we expect to build with std=gnu99
powerpc/vdso: Fix build rules to rebuild vdsos correctly
powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
powerpc/32: Fix crash during static key init
powerpc: Update obsolete comment in setup_32.c about early_init()
powerpc: Print the kernel load address at the end of prom_init()
powerpc/ptrace: Fix coredump since ptrace TM changes
powerpc/32: Fix csum_partial_copy_generic()
cxl: Set psl_fir_cntl to production environment value
powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
powerpc/pci: Fix endian bug in fixed PHB numbering
powerpc/eeh: Switch to conventional PCI address output in EEH log
cxl: Fix sparse warnings
cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
cxl: Use fixed width predefined types in data structure.
powerpc/vdso: Add missing include file
powerpc: Fix unused function warning 'lmb_to_memblock'
powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
...
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
When using if_changed, we need to add FORCE as a dependency (see
Documentation/kbuild/makefiles.txt) otherwise we don't get command line
change checking amongst other things. This has resulted in vdsos not
being rebuilt when switching between big and little endian.
The vdso64/32ld commands have to be changed around to avoid pulling
FORCE into the linker command line (code copied from x86).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
When we introduced the little endian support, we added the endian flags
to CC directly using override. I don't know the history of why we did
that, I suspect no one does.
Although this mostly works, it has one bug, which is that CROSS32CC
doesn't get -mbig-endian. That means when the compiler is little endian
by default and the user is building big endian, vdso32 is incorrectly
compiled as little endian and the kernel fails to build.
Instead we can add the endian flags to cflags-y/aflags-y, and then
append those to KBUILD_CFLAGS/KBUILD_AFLAGS.
This has the advantage of being 1) less ugly, 2) the documented way of
adding flags in the arch Makefile and 3) it fixes building vdso32 with a
LE toolchain.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
We cannot do those initializations from apply_feature_fixups() as
this function runs in a very restricted environment on 32-bit where
the kernel isn't running at its linked address and the PTRRELOC()
macro must be used for any global accesss.
Instead, split them into a separtate steup_feature_keys() function
which is called in a more suitable spot on ppc32.
Fixes: 309b315b6ec6 ("powerpc: Call jump_label_init() in apply_feature_fixups()")
Reported-and-tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
We don't identify the machine type anymore...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
This makes it easier to debug crashes that happen very early before
the kernel takes over Open Firmware by allowing us to relate the OF
reported crashing addresses to offsets within the kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Commit 8d460f6156cd ("powerpc/process: Add the function
flush_tmregs_to_thread") added flush_tmregs_to_thread() and included
the assumption that it would only be called for a task which is not
current.
Although this is correct for ptrace, when generating a core dump, some
of the routines which call flush_tmregs_to_thread() are called. This
leads to a WARNing such as:
Not expecting ptrace on self: TM regs may be incorrect
------------[ cut here ]------------
WARNING: CPU: 123 PID: 7727 at arch/powerpc/kernel/process.c:1088 flush_tmregs_to_thread+0x78/0x80
CPU: 123 PID: 7727 Comm: libvirtd Not tainted 4.8.0-rc1-gcc6x-g61e8a0d #1
task: c000000fe631b600 task.stack: c000000fe63b0000
NIP: c00000000001a1a8 LR: c00000000001a1a4 CTR: c000000000717780
REGS: c000000fe63b3420 TRAP: 0700 Not tainted (4.8.0-rc1-gcc6x-g61e8a0d)
MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28004222 XER: 20000000
...
NIP [c00000000001a1a8] flush_tmregs_to_thread+0x78/0x80
LR [c00000000001a1a4] flush_tmregs_to_thread+0x74/0x80
Call Trace:
flush_tmregs_to_thread+0x74/0x80 (unreliable)
vsr_get+0x64/0x1a0
elf_core_dump+0x604/0x1430
do_coredump+0x5fc/0x1200
get_signal+0x398/0x740
do_signal+0x54/0x2b0
do_notify_resume+0x98/0xb0
ret_from_except_lite+0x70/0x74
So fix flush_tmregs_to_thread() to detect the case where it is called on
current, and a transaction is active, and in that case flush the TM regs
to the thread_struct.
This patch also moves flush_tmregs_to_thread() into ptrace.c as it is
only called from that file.
Fixes: 8d460f6156cd ("powerpc/process: Add the function flush_tmregs_to_thread")
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Commit 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic()
based on copy_tofrom_user()") introduced a bug when destination
address is odd and initial csum is not null
In that (rare) case the initial csum value has to be rotated one byte
as well as the resulting value is
This patch also fixes related comments
Fixes: 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The generic allocation code may sometimes decide to assign a prefetchable
64-bit BAR to the M32 window. In fact it may also decide to allocate
a 64-bit non-prefetchable BAR to the M64 one ! So using the resource
flags as a test to decide which window was used for PE allocation is
just wrong and leads to insane PE numbers.
Instead, compare the addresses to figure it out.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rename the function as agreed by Ben & Gavin]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
When machine check occurs with MSR(RI=0), it means MC interrupt is
unrecoverable and kernel goes down to panic path. But the console
message still shows it as recovered. This patch fixes the MCE console
messages.
Fixes: 36df96f8acaf ("powerpc/book3s: Decode and save machine check event.")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The recent commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number
based on device-tree properties"), added code to read a 64-bit property
from the device tree, and if not found read a 32-bit property (reg).
There was a bug in the 32-bit case, on big endian machines, due to the
use of the 64-bit value to read the 32-bit property. The cast of &prop
means we end up writing to the high 32-bit of prop, leaving the low
32-bits containing whatever junk was on the stack.
If that junk value was non-zero, and < MAX_PHBS, we would end up using
it as the PHB id. This results in users seeing what appear to be random
PHB ids.
Fix it by reading into a u32 property and then assigning that to the
u64 value, letting the CPU do the correct conversions for us.
Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
This is a very minor/trivial fix for the output of PCI address on EEH
logs. The PCI address on "OF node" field currently is using ":" as a
separator for the function, but the usual separator is ".". This patch
changes the separator to dot, so the PCI address is printed as usual.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Some powerpc builds fail with the following buld error.
In file included from ./arch/powerpc/include/asm/mmu_context.h:11:0,
from arch/powerpc/kernel/vdso.c:28:
arch/powerpc/include/asm/cputhreads.h: In function 'get_tensr':
arch/powerpc/include/asm/cputhreads.h:101:2: error:
implicit declaration of function 'cpu_has_feature'
Fixes: b92a226e5284 ("powerpc: Move cpu_has_feature() to a separate file")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
This patch fixes the following warning:
arch/powerpc/platforms/pseries/hotplug-memory.c:323:29: error: 'lmb_to_memblock' defined but not used [-Werror=unused-function]
static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
^~~~~~~~~~~~~~~
The only consumer of this function is 'dlpar_remove_lmb', which is
enabled with CONFIG_MEMORY_HOTREMOVE, so move it into the same
ifdef block.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The current implementation of MCE early handling modifies CR0/1 registers
without saving its old values. Fix this by moving early check for
powersaving mode to machine_check_handle_early().
The power architecture 2.06 or later allows the possibility of getting
machine check while in nap/sleep/winkle. The last bit of HSPRG0 is set
to 1, if thread is woken up from winkle. Hence, clear the last bit of
HSPRG0 (r13) before MCE handler starts using it as paca pointer.
Also, the current code always puts the thread into nap state irrespective
of whatever idle state it woke up from. Fix that by looking at
paca->thread_idle_state and put the thread back into same state where it
came from.
Fixes: 1c51089f777b ("powerpc/book3s: Return from interrupt if coming from evil context.")
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h so that MCE handler changes
in subsequent patch can use it.
No functionality change.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The function pnv_restore_hyp_resource() loads the TOC into r2 from
the invalid PACA pointer before fixing r13 value. This do not affect
POWER ISA 3.0 but it does have an impact on POWER ISA 2.07 or less
leading CPU to get stuck forever.
login: [ 471.830433] Processor 120 is stuck.
This can be easily reproducible using following steps:
- Turn off SMT
$ ppc64_cpu --smt=off
- offline/online any online cpu (Thread 0 of any core which is online)
$ echo 0 > /sys/devices/system/cpu/cpu<num>/online
$ echo 1 > /sys/devices/system/cpu/cpu<num>/online
For POWER ISA 2.07 or less, the last bit of HSPRG0 is set indicating
that thread is waking up from winkle. Hence, the last bit of HSPRG0(r13)
needs to be clear before accessing it as PACA to avoid loading invalid
values from invalid PACA pointer.
Fix this by loading TOC after r13 register is corrected.
Fixes: bcef83a00dc4 ("powerpc/powernv: Add platform support for stop instruction")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Commit fd141d1a99a3 ("powerpc/powernv/pci: Rework accessing the TCE
invalidate register") broke TCE invalidation on IODA2/PHB3 for real
mode.
This makes invalidate work again.
Fixes: fd141d1a99a3 ("powerpc/powernv/pci: Rework accessing the TCE invalidate register")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|