| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
This patch fix kvm-unit-tests hanging and incorrect PT_ACCESSED_MASK
bit set in the case of SMEP fault. The code updated 'eperm' after
the variable was checked.
Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded.
On SVM, it is not possible to implement this, but on VMX this is possible
and was indeed implemented until nested SVM changed this to unconditionally
read PDPTEs dynamically. This has noticable impact when running PAE guests.
Fix by changing the MMU to read PDPTRs from the cache, falling back to
reading from memory for the nested MMU.
Signed-off-by: Avi Kivity <avi@redhat.com>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use BUG_ON(x) rather than if(x) BUG();
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@ identifier x; @@
-if (x) BUG();
+BUG_ON(x);
@@ identifier x; @@
-if (!x) BUG();
+BUG_ON(!x);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Windows Server 2008 SP2 checked build with smp > 1 BSOD's during
boot due to lack of microcode update:
*** Assertion failed: The system BIOS on this machine does not properly
support the processor. The system BIOS did not load any microcode update.
A BIOS containing the latest microcode update is needed for system reliability.
(CurrentUpdateRevision != 0)
*** Source File: d:\longhorn\base\hals\update\intelupd\update.c, line 440
Report a non-zero microcode update signature to make it happy.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Return EMULATION_OK/FAILED consistently. Also treat instruction fetch
errors, not restricted to X86EMUL_UNHANDLEABLE, as EMULATION_FAILED;
although this cannot happen in practice, the current logic will continue
the emulation even if the decoder fails to fetch the instruction.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
| |
Fetching the instruction which was to be executed by the guest cannot
fail normally. So compiler should always predict that it will succeed.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
| |
_type is enough to know the size.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of passing ctxt->_eip from insn_fetch() call sites, get it from
ctxt in do_insn_fetch_byte(). This is done by replacing the argument
_eip of insn_fetch() with _ctxt, which should be better than letting the
macro use ctxt silently in its body.
Though this changes the place where ctxt->_eip is incremented from
insn_fetch() to do_insn_fetch_byte(), this does not have any real
effect.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the method of dealing with an IO operation on a bus (PIO/MMIO)
is to call the read or write callback for each device registered
on the bus until we find a device which handles it.
Since the number of devices on a bus can be significant due to ioeventfds
and coalesced MMIO zones, this leads to a lot of overhead on each IO
operation.
Instead of registering devices, we now register ranges which points to
a device. Lookup is done using an efficient bsearch instead of a linear
search.
Performance test was conducted by comparing exit count per second with
200 ioeventfds created on one byte and the guest is trying to access a
different byte continuously (triggering usermode exits).
Before the patch the guest has achieved 259k exits per second, after the
patch the guest does 274k exits per second.
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The vmexit tracepoints format the exit_reason to make it human-readable.
Since the exit_reason depends on the instruction set (vmx or svm),
formatting is handled with ftrace_print_symbols_seq() by referring to
the appropriate exit reason table.
However, the ftrace_print_symbols_seq() function is not meant to be used
directly in tracepoints since it does not export the formatting table
which userspace tools like trace-cmd and perf use to format traces.
In practice perf dies when formatting vmexit-related events and
trace-cmd falls back to printing the numeric value (with extra
formatting code in the kvm plugin to paper over this limitation). Other
userspace consumers of vmexit-related tracepoints would be in similar
trouble.
To avoid significant changes to the kvm_exit tracepoint, this patch
moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and
selects the right table with __print_symbolic() depending on the
instruction set. Note that __print_symbolic() is designed for exporting
the formatting table to userspace and allows trace-cmd and perf to work.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kvm_exit tracepoint recently added the isa argument to aid decoding
exit_reason. The semantics of exit_reason depend on the instruction set
(vmx or svm) and the isa argument allows traces to be analyzed on other
machines.
Add the isa argument to kvm_nested_vmexit and kvm_nested_vmexit_inject
so these tracepoints can also be self-describing.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Commit 0945d4b228 tried to fix the get_msr path for the
HV_X64_MSR_APIC_ASSIST_PAGE msr, but was poorly tested. We should be
returning 0 if the read succeeded, and passing the value back to the
caller via the pdata out argument, not returning the value directly.
Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"get" support for the HV_X64_MSR_APIC_ASSIST_PAGE msr was missing, even
though it is explicitly enumerated as something the vmm should save in
msrs_to_save and reported to userland via the KVM_GET_MSR_INDEX_LIST
ioctl.
Add "get" support for HV_X64_MSR_APIC_ASSIST_PAGE. We simply return the
guest visible value of this register, which seems to be correct as a set
on the register is validated for us already.
Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The patch raises the hard limit of VCPU count to 254.
This will allow developers to easily work on scalability
and will allow users to test high VCPU setups easily without
patching the kernel.
To prevent possible issues with current setups, KVM_CAP_NR_VCPUS
now returns the recommended VCPU limit (which is still 64) - this
should be a safe value for everybody, while a new KVM_CAP_MAX_VCPUS
returns the hard limit which is now 254.
Cc: Avi Kivity <avi@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Suggested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
|
|
|
|
|
| |
Using the read/write operation to remove the same code
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
| |
The operations of read emulation and write emulation are very similar, so we
can abstract the operation of them, in larter patch, it is used to cleanup the
same code
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
If the range spans a page boundary, the mmio access can be broke, fix it as
write emulation.
And we already get the guest physical address, so use it to read guest data
directly to avoid walking guest page table again
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Src2CL decode (used for double width shifts) erronously decodes only bit 3
of %rcx, instead of bits 7:0.
Fix by decoding %cl in its entirety.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
__update_clear_spte_slow should return original spte while the
current code returns low half of original spte combined with high
half of new spte.
Signed-off-by: Zhao Jin <cronozhj@gmail.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|\
| |
| |
| |
| |
| |
| |
| | |
* 'stable/bug.fixes' of git://oss.oracle.com/git/kwilk/xen:
xen/i386: follow-up to "replace order-based range checking of M2P table by linear one"
xen/irq: Alter the locking to use a mutex instead of a spinlock.
xen/e820: if there is no dom0_mem=, don't tweak extra_pages.
xen: disable PV spinlocks on HVM
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
linear one"
The numbers obtained from the hypervisor really can't ever lead to an
overflow here, only the original calculation going through the order
of the range could have. This avoids the (as Jeremy points outs)
somewhat ugly NULL-based calculation here.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The patch "xen: use maximum reservation to limit amount of usable RAM"
(d312ae878b6aed3912e1acaaf5d0b2a9d08a4f11) breaks machines that
do not use 'dom0_mem=' argument with:
reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
(XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...
The reason being that the last E820 entry is created using the
'extra_pages' (which is based on how many pages have been freed).
The mentioned git commit sets the initial value of 'extra_pages'
using a hypercall which returns the number of pages (if dom0_mem
has been used) or -1 otherwise. If the later we return with
MAX_DOMAIN_PAGES as basis for calculation:
return min(max_pages, MAX_DOMAIN_PAGES);
and use it:
extra_limit = xen_get_max_pages();
if (extra_limit >= max_pfn)
extra_pages = extra_limit - max_pfn;
else
extra_pages = 0;
which means we end up with extra_pages = 128GB in PFNs (33554432)
- 8GB in PFNs (2097152, on this specific box, can be larger or smaller),
and then we add that value to the E820 making it:
Xen: 00000000ff000000 - 0000000100000000 (reserved)
Xen: 0000000100000000 - 000000133f2e2000 (usable)
which is clearly wrong. It should look as so:
Xen: 00000000ff000000 - 0000000100000000 (reserved)
Xen: 0000000100000000 - 000000027fbda000 (usable)
Naturally this problem does not present itself if dom0_mem=max:X
is used.
CC: stable@kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
PV spinlocks cannot possibly work with the current code because they are
enabled after pvops patching has already been done, and because PV
spinlocks use a different data structure than native spinlocks so we
cannot switch between them dynamically. A spinlock that has been taken
once by the native code (__ticket_spin_lock) cannot be taken by
__xen_spin_lock even after it has been released.
Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
On x86-64, they were just wasteful: with the explicitly added (now
unnecessary) padding, the size of the alternatives structure was 16
bytes, and an alignment of 8 bytes didn't hurt much.
However, it was still silly, since the natural size and alignment for
the structure is actually just 12 bytes, 4-byte aligned since commit
59e97e4d6fbc ("x86: Make alternative instruction pointers relative").
So removing the padding, and removing the extra alignment is just a good
idea.
On x86-32, the alignment of 4 bytes was correct, but was incorrectly
hardcoded as 8 bytes in <asm/alternative-asm.h>. That header file had
used to be an x86-64 only header file, but various unification efforts
have made it be used for x86-32 too (ie the unification of rwlock and
rwsem).
That in turn caused x86-32 boot failures, because the extra alignment
would result in random zero-filled words in the altinstructions section,
causing oopses early at boot when doing alternative instruction
replacement.
So just remove all the alignment noise entirely. It's wrong, and it's
unnecessary. The section itself is already properly aligned by the
linker scripts, and all additions to the section had better be of the
proper 12-byte format, keeping it aligned. So if the align directive
were to ever make a difference, that would be an indication of a serious
bug to begin with.
Reported-by: Werner Landgraf <w.landgraf@ru.r>
Acked-by: Andrew Lutomirski <luto@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | | |
* 'upstream/bugfix' of git://github.com/jsgf/linux-xen:
xen: use non-tracing preempt in xen_clocksource_read()
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The tracing code used sched_clock() to get tracing timestamps, which
ends up calling xen_clocksource_read(). xen_clocksource_read() must
disable preemption, but if preemption tracing is enabled, this results
in infinite recursion.
I've only noticed this when boot-time tracing tests are enabled, but it
seems like a generic bug. It looks like it would also affect
kvm_clocksource_read().
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit b03e7495a862 ("PCI: Set PCI-E Max Payload Size on fabric")
introduced a potential NULL pointer dereference in calls to
pcie_bus_configure_settings due to attempts to access pci_bus self
variables when the self pointer is NULL.
To correct this, verify that the self pointer in pci_bus is non-NULL
before dereferencing it.
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* 'perf-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
x86, perf: Check that current->mm is alive before getting user callchain
perf_event: Fix broken calc_timer_values()
perf events: Fix slow and broken cgroup context switch code
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
An event may occur when an mm is already released.
I added an event in dequeue_entity() and caught a panic with
the following backtrace:
[ 434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 434.421258] IP: [<ffffffff810464ac>] __get_user_pages_fast+0x9c/0x120
...
[ 434.421258] Call Trace:
[ 434.421258] [<ffffffff8101ae81>] copy_from_user_nmi+0x51/0xf0
[ 434.421258] [<ffffffff8109a0d5>] ? sched_clock_local+0x25/0x90
[ 434.421258] [<ffffffff8101b048>] perf_callchain_user+0x128/0x170
[ 434.421258] [<ffffffff811154cd>] ? __perf_event_header__init_id+0xed/0x100
[ 434.421258] [<ffffffff81116690>] perf_prepare_sample+0x200/0x280
[ 434.421258] [<ffffffff81118da8>] __perf_event_overflow+0x1b8/0x290
[ 434.421258] [<ffffffff81065240>] ? tg_shares_up+0x0/0x670
[ 434.421258] [<ffffffff8104fe1a>] ? walk_tg_tree+0x6a/0xb0
[ 434.421258] [<ffffffff81118f44>] perf_swevent_overflow+0xc4/0xf0
[ 434.421258] [<ffffffff81119150>] do_perf_sw_event+0x1e0/0x250
[ 434.421258] [<ffffffff81119204>] perf_tp_event+0x44/0x70
[ 434.421258] [<ffffffff8105701f>] ftrace_profile_sched_block+0xdf/0x110
[ 434.421258] [<ffffffff8106121d>] dequeue_entity+0x2ad/0x2d0
[ 434.421258] [<ffffffff810614ec>] dequeue_task_fair+0x1c/0x60
[ 434.421258] [<ffffffff8105818a>] dequeue_task+0x9a/0xb0
[ 434.421258] [<ffffffff810581e2>] deactivate_task+0x42/0xe0
[ 434.421258] [<ffffffff814bc019>] thread_return+0x191/0x808
[ 434.421258] [<ffffffff81098a44>] ? switch_task_namespaces+0x24/0x60
[ 434.421258] [<ffffffff8106f4c4>] do_exit+0x464/0x910
[ 434.421258] [<ffffffff8106f9c8>] do_group_exit+0x58/0xd0
[ 434.421258] [<ffffffff8106fa57>] sys_exit_group+0x17/0x20
[ 434.421258] [<ffffffff8100b202>] system_call_fastpath+0x16/0x1b
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \ \ \
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | | |
* 'stable/bug.fixes' of git://oss.oracle.com/git/kwilk/xen:
xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead.
xen: x86_32: do not enable iterrupts when returning from exception in interrupt context
xen: use maximum reservation to limit amount of usable RAM
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We have hit a couple of customer bugs where they would like to
use those parameters to run an UP kernel - but both of those
options turn of important sources of interrupt information so
we end up not being able to boot. The correct way is to
pass in 'dom0_max_vcpus=1' on the Xen hypervisor line and
the kernel will patch itself to be a UP kernel.
Fixes bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637308
CC: stable@kernel.org
Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
interrupt context
If vmalloc page_fault happens inside of interrupt handler with interrupts
disabled then on exit path from exception handler when there is no pending
interrupts, the following code (arch/x86/xen/xen-asm_32.S:112):
cmpw $0x0001, XEN_vcpu_info_pending(%eax)
sete XEN_vcpu_info_mask(%eax)
will enable interrupts even if they has been previously disabled according to
eflags from the bounce frame (arch/x86/xen/xen-asm_32.S:99)
testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp)
setz XEN_vcpu_info_mask(%eax)
Solution is in setting XEN_vcpu_info_mask only when it should be set
according to
cmpw $0x0001, XEN_vcpu_info_pending(%eax)
but not clearing it if there isn't any pending events.
Reproducer for bug is attached to RHBZ 707552
CC: stable@kernel.org
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Use the domain's maximum reservation to limit the amount of extra RAM
for the memory balloon. This reduces the size of the pages tables and
the amount of reserved low memory (which defaults to about 1/32 of the
total RAM).
On a system with 8 GiB of RAM with the domain limited to 1 GiB the
kernel reports:
Before:
Memory: 627792k/4472000k available
After:
Memory: 549740k/11132224k available
A increase of about 76 MiB (~1.5% of the unused 7 GiB). The reserved
low memory is also reduced from 253 MiB to 32 MiB. The total
additional usable RAM is 329 MiB.
For dom0, this requires at patch to Xen ('x86: use 'dom0_mem' to limit
the number of pages for dom0') (c/s 23790)
CC: stable@kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|\ \ \ \
| |_|/ /
|/| | |
| | | |
| | | | |
* 'kvm-updates/3.1' of git://github.com/avikivity/kvm:
KVM: Fix instruction size issue in pvclock scaling
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit de2d1a524e94 ("KVM: Fix register corruption in pvclock_scale_delta")
introduced a mul instruction that may have only a memory operand; the
assembler therefore cannot select the correct size:
pvclock.s:229: Error: no instruction mnemonic suffix given and no register
operands; can't size instruction
In this example the assembler is:
#APP
mul -48(%rbp) ; shrd $32, %rdx, %rax
#NO_APP
A simple solution is to use mulq.
Signed-off-by: Duncan Sands <baldrick@free.fr>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The nfsservctl system call is now gone, so we should remove all
linkage for it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|/ / /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
According to the SFI specification irq number 0xFF means device has no
interrupt or interrupt attached via GPIO.
Currently, we don't handle this special case and set irq field in
*_board_info structs to 255. It leads to confusion in some drivers.
Accelerometer driver tries to register interrupt 255, fails and prints
"Cannot get IRQ" to dmesg.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
entry_32.S contained a hardcoded alternative instruction entry, and the
format changed in commit 59e97e4d6fbc ("x86: Make alternative
instruction pointers relative").
Replace the hardcoded entry with the altinstruction_entry macro. This
fixes the 32-bit boot with CONFIG_X86_INVD_BUG=y.
Reported-and-tested-by: Arnaud Lacombe <lacombar@gmail.com>
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
While removing custom rendezvous code and switching to stop_machine,
commit 192d8857427d ("x86, mtrr: use stop_machine APIs for doing MTRR
rendezvous") completely dropped mtrr setting code on !CONFIG_SMP
breaking MTRR settting on UP.
Fix it by removing the incorrect CONFIG_SMP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Anders Eriksson <aeriksson@fastmail.fm>
Tested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| |_|/
|/| |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32, vdso: On system call restart after SYSENTER, use int $0x80
x86, UV: Remove UV delay in starting slave cpus
x86, olpc: Wait for last byte of EC command to be accepted
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When we enter a 32-bit system call via SYSENTER or SYSCALL, we shuffle
the arguments to match the int $0x80 calling convention. This was
probably a design mistake, but it's what it is now. This causes
errors if the system call as to be restarted.
For SYSENTER, we have to invoke the instruction from the vdso as the
return address is hardcoded. Accordingly, we can simply replace the
jump in the vdso with an int $0x80 instruction and use the slower
entry point for a post-restart.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFztZ=r5wa0x26KJQxvZOaQq8s2v3u50wCyJcA-Sc4g8gQ@mail.gmail.com
Cc: <stable@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Delete the 10 msec delay between the INIT and SIPI when starting
slave cpus. I can find no requirement for this delay. BIOS also
has similar code sequences without the delay.
Removing the delay reduces boot time by 40 sec. Every bit helps.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20110805140900.GA6774@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When executing EC commands, only waiting when there are still
more bytes to write is usually fine. However, if the system
suspends very quickly after a call to olpc_ec_cmd(), the last
data byte may not yet be transferred to the EC, and the command
will not complete.
This solves a bug where the SCI wakeup mask was not correctly
written when going into suspend.
It means that sometimes, on XO-1.5 (but not XO-1), the
devices that were marked as wakeup sources can't wake up
the system. e.g. you ask for wifi wakeups, suspend, but then
incoming wifi frames don't wake up the system as they should.
Signed-off-by: Paul Fox <pgf@laptop.org>
Signed-off-by: Daniel Drake <dsd@laptop.org>
Acked-by: Andres Salomon <dilinger@queued.net>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \ \
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/tracing: Fix tracing config option properly
xen: Do not enable PV IPIs when vector callback not present
xen/x86: replace order-based range checking of M2P table by linear one
xen: xen-selfballoon.c needs more header files
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Steven Rostedt says we should use CONFIG_EVENT_TRACING.
Cc:Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix regression for HVM case on older (<4.1.1) hypervisors caused by
commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date: Thu Dec 2 17:55:10 2010 +0000
xen: PV on HVM: support PV spinlocks and IPIs
This change replaced the SMP operations with event based handlers without
taking into account that this only works when the hypervisor supports
callback vectors. This causes unexplainable hangs early on boot for
HVM guests with more than one CPU.
BugLink: http://bugs.launchpad.net/bugs/791850
CC: stable@kernel.org
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-and-Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The order-based approach is not only less efficient (requiring a shift
and a compare, typical generated code looking like this
mov eax, [machine_to_phys_order]
mov ecx, eax
shr ebx, cl
test ebx, ebx
jnz ...
whereas a direct check requires just a compare, like in
cmp ebx, [machine_to_phys_nr]
jae ...
), but also slightly dangerous in the 32-on-64 case - the element
address calculation can wrap if the next power of two boundary is
sufficiently far away from the actual upper limit of the table, and
hence can result in user space addresses being accessed (with it being
unknown what may actually be mapped there).
Additionally, the elimination of the mistaken use of fls() here (should
have been __fls()) fixes a latent issue on x86-64 that would trigger
if the code was run on a system with memory extending beyond the 44-bit
boundary.
CC: stable@kernel.org
Signed-off-by: Jan Beulich <jbeulich@novell.com>
[v1: Based on Jeremy's feedback]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: OF: Don't crash when bridge parent is NULL.
PCI: export pcie_bus_configure_settings symbol
PCI: code and comments cleanup
PCI: make cardbus-bridge resources optional
PCI: make SRIOV resources optional
PCI : ability to relocate assigned pci-resources
PCI: honor child buses add_size in hot plug configuration
PCI: Set PCI-E Max Payload Size on fabric
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
On a given PCI-E fabric, each device, bridge, and root port can have a
different PCI-E maximum payload size. There is a sizable performance
boost for having the largest possible maximum payload size on each PCI-E
device. However, if improperly configured, fatal bus errors can occur.
Thus, it is important to ensure that PCI-E payloads sends by a device
are never larger than the MPS setting of all devices on the way to the
destination.
This can be achieved two ways:
- A conservative approach is to use the smallest common denominator of
the entire tree below a root complex for every device on that fabric.
This means for example that having a 128 bytes MPS USB controller on one
leg of a switch will dramatically reduce performances of a video card or
10GE adapter on another leg of that same switch.
It also means that any hierarchy supporting hotplug slots (including
expresscard or thunderbolt I suppose, dbl check that) will have to be
entirely clamped to 128 bytes since we cannot predict what will be
plugged into those slots, and we cannot change the MPS on a "live"
system.
- A more optimal way is possible, if it falls within a couple of
constraints:
* The top-level host bridge will never generate packets larger than the
smallest TLP (or if it can be controlled independently from its MPS at
least)
* The device will never generate packets larger than MPS (which can be
configured via MRRS)
* No support of direct PCI-E <-> PCI-E transfers between devices without
some additional code to specifically deal with that case
Then we can use an approach that basically ignores downstream requests
and focuses exclusively on upstream requests. In that case, all we need
to care about is that a device MPS is no larger than its parent MPS,
which allows us to keep all switches/bridges to the max MPS supported by
their parent and eventually the PHB.
In this case, your USB controller would no longer "starve" your 10GE
Ethernet and your hotplug slots won't affect your global MPS.
Additionally, the hotplugged devices themselves can be configured to a
larger MPS up to the value configured in the hotplug bridge.
To choose between the two available options, two PCI kernel boot args
have been added to the PCI calls. "pcie_bus_safe" will provide the
former behavior, while "pcie_bus_perf" will perform the latter behavior.
By default, the latter behavior is used.
NOTE: due to the location of the enablement, each arch will need to add
calls to this function. This patch only enables x86.
This patch includes a number of changes recommended by Benjamin
Herrenschmidt.
Tested-by: Jordan_Hargrave@dell.com
Signed-off-by: Jon Mason <mason@myri.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
* 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: uses TASKSTATS, depends on NET
KVM: fix TASK_DELAY_ACCT kconfig warning
|