| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As we need to add further flags to the bpf_prog structure, lets migrate
both bools to a bitfield representation. The size of the base structure
(excluding insns) remains unchanged at 40 bytes.
Add also tags for the kmemchecker, so that it doesn't throw false
positives. Even in case gcc would generate suboptimal code, it's not
being accessed in performance critical paths.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Merge misc fixes from Andrew Morton:
"12 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
dmapool: fix overflow condition in pool_find_page()
thermal: avoid division by zero in power allocator
memcg: remove pcp_counter_lock
kprobes: use _do_fork() in samples to make them work again
drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE
memcg: make mem_cgroup_read_stat() unsigned
memcg: fix dirty page migration
dax: fix NULL pointer in __dax_pmd_fault()
mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1)
userfaultfd: remove kernel header include from uapi header
arch/x86/include/asm/efi.h: fix build failure
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With KMEMCHECK=y, KASAN=n:
arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
Don't #undef memcpy if KASAN=n.
Fixes: 769a8089c1fd ("x86, efi, kasan: #undef memset/memcpy/memmove per arch")
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull KVM fixes from Paolo Bonzini:
"(Relatively) a lot of reverts, mostly.
Bugs have trickled in for a new feature in 4.2 (MTRR support in
guests) so I'm reverting it all; let's not make this -rc period busier
for KVM than it's been so far. This covers the four reverts from me.
The fifth patch is being reverted because Radim found a bug in the
implementation of stable scheduler clock, *but* also managed to
implement the feature entirely without hypervisor support. So instead
of fixing the hypervisor side we can remove it completely; 4.4 will
get the new implementation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS
Update KVM homepage Url
Revert "KVM: SVM: use NPT page attributes"
Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask"
Revert "KVM: SVM: Sync g_pat with guest-written PAT value"
Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages"
Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The cpu feature flags are not ever going to change, so warning
everytime can cause a lot of kernel log spam
(in our case more than 10GB/hour).
The warning seems to only occur when nested virtualization is
enabled, so it's probably triggered by a KVM bug. This is a
sensible and safe change anyway, and the KVM bug fix might not
be suitable for stable releases anyway.
Cc: stable@vger.kernel.org
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit 3c2e7f7de3240216042b61073803b61b9b3cfb22.
Initializing the mapping from MTRR to PAT values was reported to
fail nondeterministically, and it also caused extremely slow boot
(due to caching getting disabled---bug 103321) with assigned devices.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Sebastian Schuette <dracon@ewetel.net>
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit 5492830370171b6a4ede8a3bfba687a8d0f25fa5.
It builds on the commit that is being reverted next.
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit e098223b789b4a618dacd79e5e0dad4a9d5018d1,
which has a dependency on other commits being reverted.
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit fd717f11015f673487ffc826e59b2bad69d20fe5.
It was reported to cause Machine Check Exceptions (bug 104091).
Reported-by: harn-solo@gmx.de
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
system MSR"
Shifting pvclock_vcpu_time_info.system_time on write to KVM system time
MSR is a change of ABI. Probably only 2.6.16 based SLES 10 breaks due
to its custom enhancements to kvmclock, but KVM never declared the MSR
only for one-shot initialization. (Doc says that only one write is
needed.)
This reverts commit b7e60c5aedd2b63f16ef06fde4f81ca032211bc5.
And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Another pile of fixes for perf:
- Plug overflows and races in the core code
- Sanitize the flow of the perf syscall so we error out before
handling the more complex and hard to undo setups
- Improve and fix Broadwell and Skylake hardware support
- Revert a fix which broke what it tried to fix in perf tools
- A couple of smaller fixes in various places of perf tools"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Fix copying of /proc/kcore
perf intel-pt: Remove no_force_psb from documentation
perf probe: Use existing routine to look for a kernel module by dso->short_name
perf/x86: Change test_aperfmperf() and test_intel() to static
tools lib traceevent: Fix string handling in heterogeneous arch environments
perf record: Avoid infinite loop at buildid processing with no samples
perf: Fix races in computing the header sizes
perf: Fix u16 overflows
perf: Restructure perf syscall point of no return
perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask
perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake
perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific
perf tools: Bool functions shouldn't return -1
tools build: Add test for presence of __get_cpuid() gcc builtin
tools build: Add test for presence of numa_num_possible_cpus() in libnuma
Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum"
perf stat: Fix per-pkg event reporting bug
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fixes the following sparse warnings:
arch/x86/kernel/cpu/perf_event_msr.c:13:6: warning: symbol
'test_aperfmperf' was not declared. Should it be static?
arch/x86/kernel/cpu/perf_event_msr.c:18:6: warning: symbol
'test_intel' was not declared. Should it be static?
Signed-off-by: Geliang Tang <geliangtang@163.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4588e8ab09638458f2451af572827108be3b4a36.1443123796.git.geliangtang@163.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Stephane pointed out that the extrareg mask was one bit too short.
The bubble width field was truncated by one bit. Fix that here.
Also add some extra comments on the reserved bits inside the event
select code.
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1441835640-21347-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Skylake has a new FRONTEND_LATENCY PEBS event to accurately profile
frontend problems (like ITLB or decoding issues).
The new event is configured through a separate MSR, which selects
a range of sub events.
Define the extra MSR as a extra reg and export support for it
through sysfs. To avoid duplicating the existing
tables use a new function to add new entries to existing tables.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1435707205-6676-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The counter constraint for CYCLE_ACTIVITY.* on Broadwell covered
all CYCLE_ACTIVITY.* sub events, and forced them on counter 2.
But actually only one sub event (umask 8) needs to be on counter 2,
all others do not have any constraint.
Only force that subevent. This fixes groups with multiple
CYCLE_ACTIVITY.* events, for example:
% perf stat -x, -e '{cpu/event=0xa3,umask=0x6,cmask=6/,\
cpu/event=0xa2,umask=0x8/,\
cpu/event=0xa3,umask=0x4,cmask=4/,cpu/event=0xb1,umask=0x1,cmask=1/}' true
122150,,cpu/event=0xa3,umask=0x6,cmask=6/,846486,100.00
16483,,cpu/event=0xa2,umask=0x8/,846486,100.00
252280,,cpu/event=0xa3,umask=0x4,cmask=4/,846486,100.00
233604,,cpu/event=0xb1,umask=0x1,cmask=1/,846486,100.00
%
Without this patch the third result would be <unsupported>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1442267222-16464-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two bugfixes from Andy addressing at least some of the subtle NMI
related wreckage which has been reported by Sasha Levin"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi/64: Fix a paravirt stack-clobbering bug in the NMI code
x86/paravirt: Replace the paravirt nop with a bona fide empty function
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The NMI entry code that switches to the normal kernel stack needs to
be very careful not to clobber any extra stack slots on the NMI
stack. The code is fine under the assumption that SWAPGS is just a
normal instruction, but that assumption isn't really true. Use
SWAPGS_UNSAFE_STACK instead.
This is part of a fix for some random crashes that Sasha saw.
Fixes: 9b6e6a8334d5 ("x86/nmi/64: Switch stacks on userspace NMI entry")
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/974bc40edffdb5c2950a5c4977f821a446b76178.1442791737.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
example, trimmed for readability):
ff 15 00 00 00 00 callq *0x0(%rip) # 2796 <nmi+0x6>
2792: R_X86_64_PC32 pv_irq_ops+0x2c
That's a call through a function pointer to regular C function that
does nothing on native boots, but that function isn't protected
against kprobes, isn't marked notrace, and is certainly not
guaranteed to preserve any registers if the compiler is feeling
perverse. This is bad news for a CLBR_NONE operation.
Of course, if everything works correctly, once paravirt ops are
patched, it gets nopped out, but what if we hit this code before
paravirt ops are patched in? This can potentially cause breakage
that is very difficult to debug.
A more subtle failure is possible here, too: if _paravirt_nop uses
the stack at all (even just to push RBP), it will overwrite the "NMI
executing" variable if it's called in the NMI prologue.
The Xen case, perhaps surprisingly, is fine, because it's already
written in asm.
Fix all of the cases that default to paravirt_nop (including
adjust_exception_frame) with a big hammer: replace paravirt_nop with
an asm function that is just a ret instruction.
The Xen case may have other problems, so document them.
This is part of a fix for some random crashes that Sasha saw.
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"These are fixes for things we merged for v4.3 (VPD, MSI, and bridge
window management), and a new Renesas R8A7794 SoC device ID.
Details:
Resource management:
- Revert pci_read_bridge_bases() unification (Bjorn Helgaas)
- Clear IORESOURCE_UNSET when clipping a bridge window (Bjorn
Helgaas)
MSI:
- Fix MSI IRQ domains for VFs on virtual buses (Alex Williamson)
Renesas R-Car host bridge driver:
- Add R8A7794 support (Sergei Shtylyov)
Miscellaneous:
- Fix devfn for VPD access through function 0 (Alex Williamson)
- Use function 0 VPD only for identical functions (Alex Williamson)"
* tag 'pci-v4.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: rcar: Add R8A7794 support
PCI: Use function 0 VPD for identical functions, regular VPD for others
PCI: Fix devfn for VPD access through function 0
PCI/MSI: Fix MSI IRQ domains for VFs on virtual buses
PCI: Clear IORESOURCE_UNSET when clipping a bridge window
PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code"
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Revert dff22d2054b5 ("PCI: Call pci_read_bridge_bases() from core instead
of arch code").
Reading PCI bridge windows is not arch-specific in itself, but there is PCI
core code that doesn't work correctly if we read them too early. For
example, Hannes found this case on an ARM Freescale i.mx6 board:
pci_bus 0000:00: root bus resource [mem 0x01000000-0x01efffff]
pci 0000:00:00.0: PCI bridge to [bus 01-ff]
pci 0000:00:00.0: BAR 8: no space for [mem size 0x01000000] (mem window)
pci 0000:01:00.0: BAR 2: failed to assign [mem size 0x00200000]
pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x00004000]
pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x00000100]
The 00:00.0 mem window needs to be at least 3MB: the 01:00.0 device needs
0x204100 of space, and mem windows are megabyte-aligned.
Bus sizing can increase a bridge window size, but never *decrease* it (see
d65245c3297a ("PCI: don't shrink bridge resources")). Prior to
dff22d2054b5, ARM didn't read bridge windows at all, so the "original size"
was zero, and we assigned a 3MB window.
After dff22d2054b5, we read the bridge windows before sizing the bus. The
firmware programmed a 16MB window (size 0x01000000) in 00:00.0, and since
we never decrease the size, we kept 16MB even though we only needed 3MB.
But 16MB doesn't fit in the host bridge aperture, so we failed to assign
space for the window and the downstream devices.
I think this is a defect in the PCI core: we shouldn't rely on the firmware
to assign sensible windows.
Ray reported a similar problem, also on ARM, with Broadcom iProc.
Issues like this are too hard to fix right now, so revert dff22d2054b5.
Reported-by: Hannes <oe5hpm@gmail.com>
Reported-by: Ray Jui <rjui@broadcom.com>
Link: http://lkml.kernel.org/r/CAAa04yFQEUJm7Jj1qMT57-LG7ZGtnhNDBe=PpSRa70Mj+XhW-A@mail.gmail.com
Link: http://lkml.kernel.org/r/55F75BB8.4070405@broadcom.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull KVM fixes from Paolo Bonzini:
"AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC
bug fixes too"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: disable halt_poll_ns as default for s390x
KVM: x86: fix off-by-one in reserved bits check
KVM: x86: use correct page table format to check nested page table reserved bits
KVM: svm: do not call kvm_set_cr0 from init_vmcb
KVM: x86: trap AMD MSRs for the TSeg base and mask
KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit
KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
kvm: svm: reset mmu on VCPU reset
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.
Architectures are now free to set their own halt_poll_ns
default value.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
29ecd6601904 ("KVM: x86: avoid uninitialized variable warning",
2015-09-06) introduced a not-so-subtle problem, which probably
escaped review because it was not part of the patch context.
Before the patch, leaf was always equal to iterator.level. After,
it is equal to iterator.level - 1 in the call to is_shadow_zero_bits_set,
and when is_shadow_zero_bits_set does another "-1" the check on
reserved bits becomes incorrect. Using "iterator.level" in the call
fixes this call trace:
WARNING: CPU: 2 PID: 17000 at arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]()
Modules linked in: tun sha256_ssse3 sha256_generic drbg binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd fam15h_power amd64_edac_mod k10temp edac_core amdkfd amd_iommu_v2 radeon acpi_cpufreq
[...]
Call Trace:
dump_stack+0x4e/0x84
warn_slowpath_common+0x95/0xe0
warn_slowpath_null+0x1a/0x20
handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]
tdp_page_fault+0x231/0x290 [kvm]
? emulator_pio_in_out+0x6e/0xf0 [kvm]
kvm_mmu_page_fault+0x36/0x240 [kvm]
? svm_set_cr0+0x95/0xc0 [kvm_amd]
pf_interception+0xde/0x1d0 [kvm_amd]
handle_exit+0x181/0xa70 [kvm_amd]
? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
kvm_arch_vcpu_ioctl_run+0x6f6/0x1730 [kvm]
? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
? preempt_count_sub+0x9b/0xf0
? mutex_lock_killable_nested+0x26f/0x490
? preempt_count_sub+0x9b/0xf0
kvm_vcpu_ioctl+0x358/0x710 [kvm]
? __fget+0x5/0x210
? __fget+0x101/0x210
do_vfs_ioctl+0x2f4/0x560
? __fget_light+0x29/0x90
SyS_ioctl+0x4c/0x90
entry_SYSCALL_64_fastpath+0x16/0x73
---[ end trace 37901c8686d84de6 ]---
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Intel CPUID on AMD host or vice versa is a weird case, but it can
happen. Handle it by checking the host CPU vendor instead of the
guest's in reset_tdp_shadow_zero_bits_mask. For speed, the
check uses the fact that Intel EPT has an X (executable) bit while
AMD NPT has NX.
Reported-by: Borislav Petkov <bp@alien8.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the
memslots array (SRCU protected). Using a mini SRCU critical section
is ugly, and adding it to kvm_arch_vcpu_create doesn't work because
the VMX vcpu_create callback calls synchronize_srcu.
Fixes this lockdep splat:
===============================
[ INFO: suspicious RCU usage. ]
4.3.0-rc1+ #1 Not tainted
-------------------------------
include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-i38/17000:
#0: (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm]
[...]
Call Trace:
dump_stack+0x4e/0x84
lockdep_rcu_suspicious+0xfd/0x130
kvm_zap_gfn_range+0x188/0x1a0 [kvm]
kvm_set_cr0+0xde/0x1e0 [kvm]
init_vmcb+0x760/0xad0 [kvm_amd]
svm_create_vcpu+0x197/0x250 [kvm_amd]
kvm_arch_vcpu_create+0x47/0x70 [kvm]
kvm_vm_ioctl+0x302/0x7e0 [kvm]
? __lock_is_held+0x51/0x70
? __fget+0x101/0x210
do_vfs_ioctl+0x2f4/0x560
? __fget_light+0x29/0x90
SyS_ioctl+0x4c/0x90
entry_SYSCALL_64_fastpath+0x16/0x73
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
These have roughly the same purpose as the SMRR, which we do not need
to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at
boot, which causes problems when running a Xen dom0 under KVM.
Just return 0, meaning that processor protection of SMRAM is not
in effect.
Reported-by: M A Young <m.a.young@durham.ac.uk>
Cc: stable@vger.kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When INIT/SIPI sequence is sent to VCPU which before that
was in use by OS, VMRUN might fail with:
KVM: entry failed, hardware error 0xffffffff
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000006d3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=00000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =9a00 0009a000 0000ffff 00009a00
[...]
CR0=60000010 CR2=b6f3e000 CR3=01942000 CR4=000007e0
[...]
EFER=0000000000000000
with corresponding SVM error:
KVM: FAILED VMRUN WITH VMCB:
[...]
cpl: 0 efer: 0000000000001000
cr0: 0000000080010010 cr2: 00007fd7fe85bf90
cr3: 0000000187d0c000 cr4: 0000000000000020
[...]
What happens is that VCPU state right after offlinig:
CR0: 0x80050033 EFER: 0xd01 CR4: 0x7e0
-> long mode with CR3 pointing to longmode page tables
and when VCPU gets INIT/SIPI following transition happens
CR0: 0 -> 0x60000010 EFER: 0x0 CR4: 0x7e0
-> paging disabled with stale CR3
However SVM under the hood puts VCPU in Paged Real Mode*
which effectively translates CR0 0x60000010 -> 80010010 after
svm_vcpu_reset()
-> init_vmcb()
-> kvm_set_cr0()
-> svm_set_cr0()
but from kvm_set_cr0() perspective CR0: 0 -> 0x60000010
only caching bits are changed and
commit d81135a57aa6
("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")'
regressed svm_vcpu_reset() which relied on MMU being reset.
As result VMRUN after svm_vcpu_reset() tries to run
VCPU in Paged Real Mode with stale MMU context (longmode page tables),
which causes some AMD CPUs** to bail out with VMEXIT_INVALID.
Fix issue by unconditionally resetting MMU context
at init_vmcb() time.
* AMD64 Architecture Programmer’s Manual,
Volume 2: System Programming, rev: 3.25
15.19 Paged Real Mode
** Opteron 1216
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Fixes: d81135a57aa6
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |_|/ /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In not-instrumented code KASAN replaces instrumented memset/memcpy/memmove
with not-instrumented analogues __memset/__memcpy/__memove.
However, on x86 the EFI stub is not linked with the kernel. It uses
not-instrumented mem*() functions from arch/x86/boot/compressed/string.c
So we don't replace them with __mem*() variants in EFI stub.
On ARM64 the EFI stub is linked with the kernel, so we should replace
mem*() functions with __mem*(), because the EFI stub runs before KASAN
sets up early shadow.
So let's move these #undef mem* into arch's asm/efi.h which is also
included by the EFI stub.
Also, this will fix the warning in 32-bit build reported by kbuild test
robot:
efi-stub-helper.c:599:2: warning: implicit declaration of function 'memcpy'
[akpm@linux-foundation.org: use 80 cols in comment]
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull KVM fixes from Paolo Bonzini:
"Mostly stable material, a lot of ARM fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
sched: access local runqueue directly in single_task_running
arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
arm64: KVM: Remove all traces of the ThumbEE registers
arm: KVM: Disable virtual timer even if the guest is not using it
arm64: KVM: Disable virtual timer even if the guest is not using it
arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources
KVM: s390: Replace incorrect atomic_or with atomic_andnot
arm: KVM: Fix incorrect device to IPA mapping
arm64: KVM: Fix user access for debug registers
KVM: vmx: fix VPID is 0000H in non-root operation
KVM: add halt_attempted_poll to VCPU stats
kvm: fix zero length mmio searching
kvm: fix double free for fast mmio eventfd
kvm: factor out core eventfd assign/deassign logic
kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd
KVM: make the declaration of functions within 80 characters
KVM: arm64: add workaround for Cortex-A57 erratum #852523
KVM: fix polling for guest halt continued even if disable it
arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores
arm64: KVM: set {v,}TCR_EL2 RES1 bits
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Reference SDM 28.1:
The current VPID is 0000H in the following situations:
- Outside VMX operation. (This includes operation in system-management
mode under the default treatment of SMIs and SMM with VMX operation;
see Section 34.14.)
- In VMX root operation.
- In VMX non-root operation when the “enable VPID” VM-execution control
is 0.
The VPID should never be 0000H in non-root operation when "enable VPID"
VM-execution control is 1. However, commit 34a1cd60 ("kvm: x86: vmx:
move some vmx setting from vmx_init() to hardware_setup()") remove the
codes which reserve 0000H for VMX root operation.
This patch fix it by again reserving 0000H for VMX root operation.
Cc: stable@vger.kernel.org # 3.19+
Fixes: 34a1cd60d17f62c1f077c1478a6c2ca8c3d17af4
Reported-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This new statistic can help diagnosing VCPUs that, for any reason,
trigger bad behavior of halt_poll_ns autotuning.
For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
10+20+40+80+160+320+480 = 1110 microseconds out of every
479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
is consuming about 30% more CPU than it would use without
polling. This would show as an abnormally high number of
attempted polling compared to the successful polls.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"This is a rather large update post rc1 due to the final steps of
cleanups and API changes which had to wait for the preparatory patches
to hit your tree.
- Regression fixes for ARM GIC irqchips
- Regression fixes and lockdep anotations for renesas irq chips
- The leftovers of the cleanup and preparatory patches which have
been ignored by maintainers
- Final conversions of the newly merged users of obsolete APIs
- Final removal of obsolete APIs
- Final removal of ARM artifacts which had been introduced during the
conversion of ARM to the generic interrupt code.
- Final split of the irq_data into chip specific and common data to
reflect the needs of hierarchical irq domains.
- Treewide removal of the first argument of interrupt flow handlers,
i.e. the irq number, which is not used by the majority of handlers
and simple to retrieve from the other argument the irq descriptor.
- A few comment updates and build warning fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
arm64: Remove ununsed set_irq_flags
ARM: Remove ununsed set_irq_flags
sh: Kill off set_irq_flags usage
irqchip: Kill off set_irq_flags usage
gpu/drm: Kill off set_irq_flags usage
genirq: Remove irq argument from irq flow handlers
genirq: Move field 'msi_desc' from irq_data into irq_common_data
genirq: Move field 'affinity' from irq_data into irq_common_data
genirq: Move field 'handler_data' from irq_data into irq_common_data
genirq: Move field 'node' from irq_data into irq_common_data
irqchip/gic-v3: Use IRQD_FORWARDED_TO_VCPU flag
irqchip/gic: Use IRQD_FORWARDED_TO_VCPU flag
genirq: Provide IRQD_FORWARDED_TO_VCPU status flag
genirq: Simplify irq_data_to_desc()
genirq: Remove __irq_set_handler_locked()
pinctrl/pistachio: Use irq_set_handler_locked
gpio: vf610: Use irq_set_handler_locked
powerpc/mpc8xx: Use irq_set_handler_locked()
powerpc/ipic: Use irq_set_handler_locked()
powerpc/cpm2: Use irq_set_handler_locked()
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Most interrupt flow handlers do not use the irq argument. Those few
which use it can retrieve the irq number from the irq descriptor.
Remove the argument.
Search and replace was done with coccinelle and some extra helper
scripts around it. Thanks to Julia for her help!
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Irq affinity mask is per-irq instead of per irqchip, so move it into
struct irq_common_data.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/1433303281-27688-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|\ \ \
| |_|/
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single regression fix for the x86 dma allocator which got wreckaged
in the merge window"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pci/dma: Fix gfp flags for coherent DMA memory allocation
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 6894258eda2f reversed the order of gfp_flags adjustment in
dma_alloc_attrs() for x86 [arch/x86/kernel/pci-dma.c] As a result,
relevant flags set by dma_alloc_coherent_gfp_flags() are just
discarded and cause coherent DMA memory allocation failure on some
devices.
Fixes: 6894258eda2f ("dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20150914073834.GA13077@xzibit.linux.bs1.fc.nec.co.jp
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- misc fixes all around the map
- block non-root vm86(old) if mmap_min_addr != 0
- two small debuggability improvements
- removal of obsolete paravirt op
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform: Fix Geode LX timekeeping in the generic x86 build
x86/apic: Serialize LVTT and TSC_DEADLINE writes
x86/ioapic: Force affinity setting in setup_ioapic_dest()
x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method
x86/ldt: Fix small LDT allocation for Xen
x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text
x86/cpu: Print family/model/stepping in hex
x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0
x86/alternatives: Make optimize_nops() interrupt safe and synced
x86/mm/srat: Print non-volatile flag in SRAT
x86/cpufeatures: Enable cpuid for Intel SHA extensions
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In 2007, commit 07190a08eef36 ("Mark TSC on GeodeLX reliable")
bypassed verification of the TSC on Geode LX. However, this code
(now in the check_system_tsc_reliable() function in
arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was
set.
OpenWRT has recently started building its generic Geode target
for Geode GX, not LX, to include support for additional
platforms. This broke the timekeeping on LX-based devices,
because the TSC wasn't marked as reliable:
https://dev.openwrt.org/ticket/20531
By adding a runtime check on is_geode_lx(), we can also include
the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus
fixing the problem.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
guaranteed that the write to LVTT has reached the APIC before the
TSC_DEADLINE MSR is written. In such a case the write to the MSR is
ignored and as a consequence the local timer interrupt never fires.
The SDM decribes this issue for xAPIC and x2APIC modes. The
serialization methods recommended by the SDM differ.
xAPIC:
"1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
x2APIC:
"To allow for efficient access to the APIC registers in x2APIC mode,
the serializing semantics of WRMSR are relaxed when writing to the
APIC registers. Thus, system software should not use 'WRMSR to APIC
registers in x2APIC mode' as a serializing instruction. Read and write
accesses to the APIC registers will occur in program order. A WRMSR to
an APIC register may complete before all preceding stores are globally
visible; software can prevent this by inserting a serializing
instruction, an SFENCE, or an MFENCE before the WRMSR."
The xAPIC method is to just wait for the memory mapped write to hit
the LVTT by checking whether the MSR write has reached the hardware.
There is no reason why a proper MFENCE after the memory mapped write would
not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
xAPIC case as well.
Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
support.
[ tglx: Massaged the changelog ]
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: <Kernel-team@fb.com>
Cc: <lenb@kernel.org>
Cc: <fenghua.yu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org #v3.7+
Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The recent ioapic cleanups changed the affinity setting in
setup_ioapic_dest() from a direct write to the hardware to the delayed
affinity setup via irq_set_affinity().
That results in a warning from chained_irq_exit():
WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq
[<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
[<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
[<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
The reason is that irq_set_affinity() does not write directly to the
hardware. It marks the affinity setting as pending and executes it
from the next interrupt. The chained handler infrastructure does not
take the irq descriptor lock for performance reasons because such a
chained interrupt is not visible to any interfaces. So the delayed
affinity setting triggers the warning in irq_move_masked_irq().
Restore the old behaviour by calling the set_affinity function of the
ioapic chip in setup_ioapic_dest(). This is safe as none of the
interrupts can be on the fly at this point.
Fixes: aa5cb97f14a2 'x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces'
Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: jarkko.nikula@linux.intel.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It's not used anywhere.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1442227343-403-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
While the following commit:
37868fe113 ("x86/ldt: Make modify_ldt synchronous")
added a nice comment explaining that Xen needs page-aligned
whole page chunks for guest descriptor tables, it then
nevertheless used kzalloc() on the small size path.
As I'm unaware of guarantees for kmalloc(PAGE_SIZE, ) to return
page-aligned memory blocks, I believe this needs to be switched
back to __get_free_page() (or better get_zeroed_page()).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/55E735D6020000780009F1E6@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The CONFIG_VM86 Kconfig help text is actively misleading, so fix it:
- Don't mark it 'obsolete' in the text as we'll support the ABI as long as CPUs
support it.
- Qualify the part about software emulation and mention that for some apps you
want a real vm86 mode.
- Don't scare users away from the option, instead explain what it does.
Reported-by: Stas Sergeev <stsp@list.ru>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
924e101a7ab6 ("x86/debug: Dump family, model, stepping of the
boot CPU") had its good intentions to dump the exact F/M/S as an
aid during debugging sessions but its output can be ambiguous.
Fix that:
-smpboot: CPU0: Intel Core Processor (Broadwell) (fam: 06, model: 47, stepping: 02)
+smpboot: CPU0: Intel Core Processor (Broadwell) (family: 0x6, model: 0x47, stepping: 0x2)
Also, spell out "family".
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441914927-32037-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
vm86 exposes an interesting attack surface against the entry
code. Since vm86 is mostly useless anyway if mmap_min_addr != 0,
just turn it off in that case.
There are some reports that vbetool can work despite setting
mmap_min_addr to zero. This shouldn't break that use case,
as CAP_SYS_RAWIO already overrides mmap_min_addr.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |\ \
| | | |
| | | |
| | | | |
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Richard reported the following crash:
[ 0.036000] BUG: unable to handle kernel paging request at 55501e06
[ 0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38
[ 0.036000] Call Trace:
[ 0.036000] [<c0409c80>] ? add_nops+0x90/0xa0
[ 0.036000] [<c040a054>] apply_alternatives+0x274/0x630
Chuck decoded:
" 0: 8d 90 90 83 04 24 lea 0x24048390(%eax),%edx
6: 80 fc 0f cmp $0xf,%ah
9: a8 0f test $0xf,%al
>> b: a0 06 1e 50 55 mov 0x55501e06,%al
10: 57 push %edi
11: 56 push %esi
Interrupt 0x30 occurred while the alternatives code was replacing the
initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the
optimized version, 0x8d,0x76,0x00. Only the first byte has been
replaced so far, and it makes a mess out of the insn decoding."
optimize_nops() is buggy in two aspects:
- It's not disabling interrupts across the modification
- It's lacking a sync_core() call
Add both.
Fixes: 4fd4b6e5537c 'x86/alternatives: Use optimized NOPs for padding'
Reported-and-tested-by: "Richard W.M. Jones" <rjones@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard W.M. Jones <rjones@redhat.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509031232340.15006@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
With the addition of NVDIMM support, a question came up as to
whether NVDIMM ranges should be in the SRAT with this bit set.
I think the consensus was no because the ranges are in the NFIT
with proximity domain information there.
ACPI is not clear on the meaning of this bit in the SRAT.
If someone is setting it, we might want to ask them what they
expect to happen with it.
Right now this bit is only printed if all the ACPI debug
information is turned on.
Signed-off-by: Linda Knippers <linda.knippers@hp.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150901194154.GA4939@ljkz400
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |\ \ \
| | | | |
| | | | |
| | | | | |
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add Intel CPUID for Intel Secure Hash Algorithm Extensions. This feature
provides new instructions for accelerated computation of SHA-1 and SHA-256.
This allows the feature to be shown in the /proc/cpuinfo for cpus that
support it.
Refer to SHA extension programming guide in chapter 8.2 of the Intel
Architecture Instruction Set Extensions Programming reference
for definition of this feature's cpuid: CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29] = 1
https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf
Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Link: http://lkml.kernel.org/r/1440194206.3940.6.camel@schen9-mobl2
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|