summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm
Commit message (Collapse)AuthorAgeFilesLines
* KVM: Fix PDPTR reloading on CR4 writesAvi Kivity2009-05-251-1/+5
| | | | | | | | | | | | | | | | The processor is documented to reload the PDPTRs while in PAE mode if any of the CR4 bits PSE, PGE, or PAE change. Linux relies on this behaviour when zapping the low mappings of PAE kernels during boot. The code already handled changes to CR4.PAE; augment it to also notice changes to PSE and PGE. This triggered while booting an F11 PAE kernel; the futex initialization code runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem ended up uninitialized, killing PI futexes and pulseaudio which uses them. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Make paravirt tlb flush also reload the PAE PDPTRsAvi Kivity2009-05-251-2/+1
| | | | | | | | | | The paravirt tlb flush may be used not only to flush TLBs, but also to reload the four page-directory-pointer-table entries, as it is used as a replacement for reloading CR3. Change the code to do the entire CR3 reloading dance instead of simply flushing the TLB. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Remove port 80 passthroughAvi Kivity2009-05-111-1/+0
| | | | | | | | | | | | | KVM optimizes guest port 80 accesses by passthing them through to the host. Some AMD machines die on port 80 writes, allowing the guest to hard-lock the host. Remove the port passthrough to avoid the problem. Cc: stable@kernel.org Reported-by: Piotr Jaroszyński <p.jaroszynski@gmail.com> Tested-by: Piotr Jaroszyński <p.jaroszynski@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Make EFER reads safe when EFER does not existAvi Kivity2009-05-111-2/+2
| | | | | | | | Some processors don't have EFER; don't oops if userspace wants us to read EFER when we check NX. Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix NX support reportingAvi Kivity2009-05-111-1/+1
| | | | | | NX support is bit 20, not bit 1. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Fix cross vendor migration issue with unusable bitAndre Przywara2009-05-111-2/+5
| | | | | | | | | AMDs VMCB does not have an explicit unusable segment descriptor field, so we emulate it by using "not present". This has to be setup before the fixups, because this field is used there. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Unregister cpufreq notifier on unloadJan Kiszka2009-04-221-0/+3
| | | | | | | | Properly unregister cpufreq notifier on onload if it was registered during init. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: release time_page on vcpu destructionJoerg Roedel2009-04-221-0/+5
| | | | | | | | | Not releasing the time_page causes a leak of that page or the compound page it is situated in. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: disable global page optimizationMarcelo Tosatti2009-04-221-1/+1
| | | | | | | | | Complexity to fix it not worthwhile the gains, as discussed in http://article.gmane.org/gmane.comp.emulators.kvm.devel/28649. Cc: stable@kernel.org Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* Merge branch 'tracing/core-v2' into tracing-for-linusIngo Molnar2009-04-021-1/+2
|\ | | | | | | | | | | | | | | Conflicts: include/linux/slub_def.h lib/Kconfig.debug mm/slob.c mm/slub.c
| * Merge branch 'linus' into tracing/blktraceIngo Molnar2009-02-199-70/+33
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: block/blktrace.c Semantic merge: kernel/trace/blktrace.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * \ Merge branch 'linus' into tracing/kmemtrace2Ingo Molnar2009-01-0614-1072/+1059
| |\ \
| * | | tracing, kvm: change MARKERS to select instead of depends onIngo Molnar2008-12-301-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: build fix fix: kernel/trace/Kconfig:42:error: found recursive dependency: TRACING -> TRACEPOINTS -> MARKERS -> KVM_TRACE -> RELAY -> KMEMTRACE -> TRACING markers is a facility that should be selected - not depended on by an interactive Kconfig entry. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | KVM: VMX: Don't allow uninhibited access to EFER on i386Avi Kivity2009-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vmx_set_msr() does not allow i386 guests to touch EFER, but they can still do so through the default: label in the switch. If they set EFER_LME, they can oops the host. Fix by having EFER access through the normal channel (which will check for EFER_LME) even on i386. Reported-and-tested-by: Benjamin Gilbert <bgilbert@cs.cmu.edu> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Fix missing smp tlb flush in invlpgAndrea Arcangeli2009-03-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kvm emulates an invlpg instruction, it can drop a shadow pte, but leaves the guest tlbs intact. This can cause memory corruption when swapping out. Without this the other cpu can still write to a freed host physical page. tlb smp flush must happen if rmap_remove is called always before mmu_lock is released because the VM will take the mmu_lock before it can finally add the page to the freelist after swapout. mmu notifier makes it safe to flush the tlb after freeing the page (otherwise it would never be safe) so we can do a single flush for multiple sptes invalidated. Cc: stable@kernel.org Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: fix sparse warnings: Should it be static?Hannes Eder2009-03-242-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: Make symbols static. Fix this sparse warnings: arch/x86/kvm/mmu.c:992:5: warning: symbol 'mmu_pages_add' was not declared. Should it be static? arch/x86/kvm/mmu.c:1124:5: warning: symbol 'mmu_pages_next' was not declared. Should it be static? arch/x86/kvm/mmu.c:1144:6: warning: symbol 'mmu_pages_clear_parents' was not declared. Should it be static? arch/x86/kvm/x86.c:2037:5: warning: symbol 'kvm_read_guest_virt' was not declared. Should it be static? arch/x86/kvm/x86.c:2067:5: warning: symbol 'kvm_write_guest_virt' was not declared. Should it be static? virt/kvm/irq_comm.c:220:5: warning: symbol 'setup_routing_entry' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: fix sparse warnings: context imbalanceHannes Eder2009-03-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: Attribute function with __acquires(...) resp. __releases(...). Fix this sparse warnings: arch/x86/kvm/i8259.c:34:13: warning: context imbalance in 'pic_lock' - wrong count at exit arch/x86/kvm/i8259.c:39:13: warning: context imbalance in 'pic_unlock' - unexpected unlock Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: is_long_mode() should check for EFER.LMAAmit Shah2009-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | is_long_mode currently checks the LongModeEnable bit in EFER instead of the LongModeActive bit. This is wrong, but we survived this till now since it wasn't triggered. This breaks guests that go from long mode to compatibility mode. This is noticed on a solaris guest and fixes bug #1842160 Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* | | | KVM: VMX: Update necessary state when guest enters long modeAmit Shah2009-03-241-30/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setup_msrs() should be called when entering long mode to save the shadow state for the 64-bit guest state. Using vmx_set_efer() in enter_lmode() removes some duplicated code and also ensures we call setup_msrs(). We can safely pass the value of shadow_efer to vmx_set_efer() as no other bits in the efer change while enabling long mode (guest first sets EFER.LME, then sets CR0.PG which causes a vmexit where we activate long mode). With this fix, is_long_mode() can check for EFER.LMA set instead of EFER.LME and 5e23049e86dd298b72e206b420513dbc3a240cd9 can be reverted. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: MMU: Fix another largepage memory leakJoerg Roedel2009-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the paging_fetch function rmap_remove is called after setting a large pte to non-present. This causes rmap_remove to not drop the reference to the large page. The result is a memory leak of that page. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: SVM: set accessed bit for VMCB segment selectorsAndre Przywara2009-03-241-12/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the segment descriptor _cache_ the accessed bit is always set (although it can be cleared in the descriptor itself). Since Intel checks for this condition on a VMENTRY, set this bit in the AMD path to enable cross vendor migration. Cc: stable@kernel.org Signed-off-by: Andre Przywara <andre.przywara@amd.com> Acked-By: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Report IRQ injection status to userspace.Gleb Natapov2009-03-242-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IRQ injection status is either -1 (if there was no CPU found that should except the interrupt because IRQ was masked or ioapic was misconfigured or ...) or >= 0 in that case the number indicates to how many CPUs interrupt was injected. If the value is 0 it means that the interrupt was coalesced and probably should be reinjected. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: MMU: remove assertion in kvm_mmu_alloc_pageJoerg Roedel2009-03-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assertion no longer makes sense since we don't clear page tables on allocation; instead we clear them during prefetch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: MMU: remove redundant check in mmu_set_spteJoerg Roedel2009-03-241-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following code flow is unnecessary: if (largepage) was_rmapped = is_large_pte(*shadow_pte); else was_rmapped = 1; The is_large_pte() function will always evaluate to one here because the (largepage && !is_large_pte) case is already handled in the first if-clause. So we can remove this check and set was_rmapped to one always here. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Fix kvmclock on !constant_tsc boxesGerd Hoffmann2009-03-241-9/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvmclock currently falls apart on machines without constant tsc. This patch fixes it. Changes: * keep tsc frequency in a per-cpu variable. * handle kvmclock update using a new request flag, thus checking whenever we need an update each time we enter guest context. * use a cpufreq notifier to track frequency changes and force kvmclock updates. * send ipis to kick cpu out of guest context if needed to make sure the guest doesn't see stale values. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: VMX: Use kvm_mmu_page_fault() handle EPT violation mmioSheng Yang2009-03-241-29/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Removed duplicated code. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Drop unused evaluations from string pio handlersJan Kiszka2009-03-242-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Looks like neither the direction nor the rep prefix are used anymore. Drop related evaluations from SVM's and VMX's I/O exit handlers. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Add FFXSR supportAlexander Graf2009-03-242-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AMD K10 CPUs implement the FFXSR feature that gets enabled using EFER. Let's check if the virtual CPU description includes that CPUID feature bit and allow enabling it then. This is required for Windows Server 2008 in Hyper-V mode. v2 adds CPUID capability exposure Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: make irq ack notifications aware of routing tableMarcelo Tosatti2009-03-242-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IRQ ack notifications assume an identity mapping between pin->gsi, which might not be the case with, for example, HPET. Translate before acking. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Gleb Natapov <gleb@redhat.com>
* | | | KVM: Userspace controlled irq routingAvi Kivity2009-03-241-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently KVM has a static routing from GSI numbers to interrupts (namely, 0-15 are mapped 1:1 to both PIC and IOAPIC, and 16:23 are mapped 1:1 to the IOAPIC). This is insufficient for several reasons: - HPET requires non 1:1 mapping for the timer interrupt - MSIs need a new method to assign interrupt numbers and dispatch them - ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the ioapics This patch implements an interrupt routing table (as a linked list, but this can be easily changed) and a userspace interface to replace the table. The routing table is initialized according to the current hardwired mapping. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: x86: Fix typos and whitespace errorsAmit Shah2009-03-241-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some typos, comments, whitespace errors corrected in the cpuid code Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: MMU: Only enable cr4_pge role in shadow modeAvi Kivity2009-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Two dimensional paging is only confused by it. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: MMU: Rename "metaphysical" attribute to "direct"Avi Kivity2009-03-242-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This actually describes what is going on, rather than alerting the reader that something strange is going on. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: MMU: drop zeroing on mmu_memory_cache_allocMarcelo Tosatti2009-03-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Zeroing on mmu_memory_cache_alloc is unnecessary since: - Smaller areas are pre-allocated with kmem_cache_zalloc. - Page pointed by ->spt is overwritten with prefetch_page and entries in page pointed by ->gfns are initialized before reading. [avi: zeroing pages is unnecessary] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: SVM: Fix typo in has_svm()Joe Perches2009-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Reset PIT irq injection logic when the PIT IRQ is unmaskedAvi Kivity2009-03-242-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While the PIT is masked the guest cannot ack the irq, so the reinject logic will never allow the interrupt to be injected. Fix by resetting the reinjection counters on unmask. Unbreaks Xen. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Add CONFIG_HAVE_KVM_IRQCHIPAvi Kivity2009-03-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Two KVM archs support irqchips and two don't. Add a Kconfig item to make selecting between the two models easier. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: MMU: Optimize page unshadowingAvi Kivity2009-03-241-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Using kvm_mmu_lookup_page() will result in multiple scans of the hash chains; use hlist_for_each_entry_safe() to achieve a single scan instead. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: SVM: Add microcode patch level dummyAlexander Graf2009-03-241-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VMware ESX checks if the microcode level is correct when using a barcelona CPU, in order to see if it actually can use SVM. Let's tell it we're on the safe side... Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Properly lock PIT creationAvi Kivity2009-03-242-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | Otherwise, two threads can create a PIT in parallel and cause a memory leak. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: x86 emulator: implement 'ret far' instruction (opcode 0xcb)Avi Kivity2009-03-241-1/+25
| | | | | | | | | | | | | | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: VMX: When emulating on invalid vmx state, don't return to userspace ↵Avi Kivity2009-03-241-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unnecessarily If we aren't doing mmio there's no need to exit to userspace (which will just be confused). Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: x86 emulator: Make emulate_pop() a little more genericAvi Kivity2009-03-241-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow emulate_pop() to read into arbitrary memory rather than just the source operand. Needed for complicated instructions like far returns. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: VMX: Prevent exit handler from running if emulating due to invalid stateAvi Kivity2009-03-241-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we've just emulated an instruction, we won't have any valid exit reason and associated information. Fix by moving the clearing of the emulation_required flag to the exit handler. This way the exit handler can notice that we've been emulating and abort early. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: VMX: don't clobber segment AR if emulating invalid stateAvi Kivity2009-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The ususable bit is important for determining state validity; don't clobber it. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: VMX: Fix guest state validity checksAvi Kivity2009-03-241-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The vmx guest state validity checks are full of bugs. Make them conform to the manual. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: PIT: provide an option to disable interrupt reinjectionMarcelo Tosatti2009-03-243-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Certain clocks (such as TSC) in older 2.6 guests overaccount for lost ticks, causing severe time drift. Interrupt reinjection magnifies the problem. Provide an option to disable it. [avi: allow room for expansion in case we want to disable reinjection of other timers] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Fallback support for MSR_VM_HSAVE_PAAvi Kivity2009-03-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we advertise MSR_VM_HSAVE_PA, userspace will attempt to read it even on Intel. Implement fake support for this MSR to avoid the warnings. Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: remove the vmap usageIzik Eidus2009-03-241-49/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vmap() on guest pages hides those pages from the Linux mm for an extended (userspace determined) amount of time. Get rid of it. Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: introduce kvm_read_guest_virt, kvm_write_guest_virtIzik Eidus2009-03-241-14/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit change the name of emulator_read_std into kvm_read_guest_virt, and add new function name kvm_write_guest_virt that allow writing into a guest virtual address. Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
OpenPOWER on IntegriCloud