summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm
Commit message (Collapse)AuthorAgeFilesLines
* thp: mmu_notifier_test_youngAndrea Arcangeli2011-01-131-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For GRU and EPT, we need gup-fast to set referenced bit too (this is why it's correct to return 0 when shadow_access_mask is zero, it requires gup-fast to set the referenced bit). qemu-kvm access already sets the young bit in the pte if it isn't zero-copy, if it's zero copy or a shadow paging EPT minor fault we relay on gup-fast to signal the page is in use... We also need to check the young bits on the secondary pagetables for NPT and not nested shadow mmu as the data may never get accessed again by the primary pte. Without this closer accuracy, we'd have to remove the heuristic that avoids collapsing hugepages in hugepage virtual regions that have not even a single subpage in use. ->test_young is full backwards compatible with GRU and other usages that don't have young bits in pagetables set by the hardware and that should nuke the secondary mmu mappings when ->clear_flush_young runs just like EPT does. Removing the heuristic that checks the young bit in khugepaged/collapse_huge_page completely isn't so bad either probably but I thought it was worth it and this makes it reliable. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* thp: kvm mmu transparent hugepage supportAndrea Arcangeli2011-01-132-17/+83
| | | | | | | | | | | This should work for both hugetlbfs and transparent hugepages. [akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2011-01-1312-859/+1486
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits) KVM: Initialize fpu state in preemptible context KVM: VMX: when entering real mode align segment base to 16 bytes KVM: MMU: handle 'map_writable' in set_spte() function KVM: MMU: audit: allow audit more guests at the same time KVM: Fetch guest cr3 from hardware on demand KVM: Replace reads of vcpu->arch.cr3 by an accessor KVM: MMU: only write protect mappings at pagetable level KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear() KVM: MMU: Initialize base_role for tdp mmus KVM: VMX: Optimize atomic EFER load KVM: VMX: Add definitions for more vm entry/exit control bits KVM: SVM: copy instruction bytes from VMCB KVM: SVM: implement enhanced INVLPG intercept KVM: SVM: enhance mov DR intercept handler KVM: SVM: enhance MOV CR intercept handler KVM: SVM: add new SVM feature bit names KVM: cleanup emulate_instruction KVM: move complete_insn_gp() into x86.c KVM: x86: fix CR8 handling KVM guest: Fix kvm clock initialization when it's configured out ...
| * KVM: Initialize fpu state in preemptible contextAvi Kivity2011-01-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | init_fpu() (which is indirectly called by the fpu switching code) assumes it is in process context. Rather than makeing init_fpu() use an atomic allocation, which can cause a task to be killed, make sure the fpu is already initialized when we enter the run loop. KVM-Stable-Tag. Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org> Acked-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: VMX: when entering real mode align segment base to 16 bytesGleb Natapov2011-01-121-1/+5
| | | | | | | | | | | | | | | | VMX checks that base is equal segment shifted 4 bits left. Otherwise guest entry fails. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: handle 'map_writable' in set_spte() functionXiao Guangrong2011-01-122-11/+4
| | | | | | | | | | | | | | | | | | Move the operation of 'writable' to set_spte() to clean up code [avi: remove unneeded booleanification] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: audit: allow audit more guests at the same timeXiao Guangrong2011-01-122-30/+35
| | | | | | | | | | | | | | | | | | | | | | | | It only allows to audit one guest in the system since: - 'audit_point' is a glob variable - mmu_audit_disable() is called in kvm_mmu_destroy(), so audit is disabled after a guest exited this patch fix those issues then allow to audit more guests at the same time Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Fetch guest cr3 from hardware on demandAvi Kivity2011-01-124-6/+21
| | | | | | | | | | | | | | | | | | | | Instead of syncing the guest cr3 every exit, which is expensince on vmx with ept enabled, sync it only on demand. [sheng: fix incorrect cr3 seen by Windows XP] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Replace reads of vcpu->arch.cr3 by an accessorAvi Kivity2011-01-125-20/+27
| | | | | | | | | | | | This allows us to keep cr3 in the VMCS, later on. Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: only write protect mappings at pagetable levelMarcelo Tosatti2011-01-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | If a pagetable contains a writeable large spte, all of its sptes will be write protected, including non-leaf ones, leading to endless pagefaults. Do not write protect pages above PT_PAGE_TABLE_LEVEL, as the spte fault paths assume non-leaf sptes are writable. Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()Avi Kivity2011-01-121-2/+2
| | | | | | | | | | | | | | | | 'error' is byte sized, so use a byte register constraint. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: MMU: Initialize base_role for tdp mmusAvi Kivity2011-01-121-0/+1
| | | | | | | | | | Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: VMX: Optimize atomic EFER loadAvi Kivity2011-01-121-0/+30
| | | | | | | | | | | | | | | | | | When NX is enabled on the host but not on the guest, we use the entry/exit msr load facility, which is slow. Optimize it to use entry/exit efer load, which is ~1200 cycles faster. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: SVM: copy instruction bytes from VMCBAndre Przywara2011-01-125-9/+17
| | | | | | | | | | | | | | | | | | | | | | In case of a nested page fault or an intercepted #PF newer SVM implementations provide a copy of the faulting instruction bytes in the VMCB. Use these bytes to feed the instruction emulator and avoid the costly guest instruction fetch in this case. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: SVM: implement enhanced INVLPG interceptAndre Przywara2011-01-121-1/+6
| | | | | | | | | | | | | | | | | | | | When the DecodeAssist feature is available, the linear address is provided in the VMCB on INVLPG intercepts. Use it directly to avoid any decoding and emulation. This is only useful for shadow paging, though. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: SVM: enhance mov DR intercept handlerAndre Przywara2011-01-121-16/+40
| | | | | | | | | | | | | | | | | | | | Newer SVM implementations provide the GPR number in the VMCB, so that the emulation path is no longer necesarry to handle debug register access intercepts. Implement the handling in svm.c and use it when the info is provided. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: SVM: enhance MOV CR intercept handlerAndre Przywara2011-01-121-11/+79
| | | | | | | | | | | | | | | | | | | | Newer SVM implementations provide the GPR number in the VMCB, so that the emulation path is no longer necesarry to handle CR register access intercepts. Implement the handling in svm.c and use it when the info is provided. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: SVM: add new SVM feature bit namesAndre Przywara2011-01-121-0/+4
| | | | | | | | | | | | | | | | the recent APM Vol.2 and the recent AMD CPUID specification describe new CPUID features bits for SVM. Name them here for later usage. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: cleanup emulate_instructionAndre Przywara2011-01-124-20/+19
| | | | | | | | | | | | | | | | | | | | emulate_instruction had many callers, but only one used all parameters. One parameter was unused, another one is now hidden by a wrapper function (required for a future addition anyway), so most callers use now a shorter parameter list. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: move complete_insn_gp() into x86.cAndre Przywara2011-01-122-12/+13
| | | | | | | | | | | | | | | | move the complete_insn_gp() helper function out of the VMX part into the generic x86 part to make it usable by SVM. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: fix CR8 handlingAndre Przywara2011-01-123-15/+14
| | | | | | | | | | | | | | | | | | | | The handling of CR8 writes in KVM is currently somewhat cumbersome. This patch makes it look like the other CR register handlers and fixes a possible issue in VMX, where the RIP would be incremented despite an injected #GP. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Take missing slots_lock for kvm_io_bus_unregister_dev()Takuya Yoshikawa2011-01-121-0/+4
| | | | | | | | | | | | | | | | In KVM_CREATE_IRQCHIP, kvm_io_bus_unregister_dev() is called without taking slots_lock in the error handling path. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: return true when user space query KVM_CAP_USER_NMI extensionLai Jiangshan2011-01-121-0/+1
| | | | | | | | | | | | | | userspace may check this extension in runtime. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Correct kvm_pio tracepoint count fieldAvi Kivity2011-01-121-2/+2
| | | | | | | | | | | | Currently, we record '1' for count regardless of the real count. Fix. Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: Fix incorrect direct page write protection due to ro host pageAvi Kivity2011-01-121-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If KVM sees a read-only host page, it will map it as read-only to prevent breaking a COW. However, if the page was part of a large guest page, KVM incorrectly extends the write protection to the entire large page frame instead of limiting it to the normal host page. This results in the instantiation of a new shadow page with read-only access. If this happens for a MOVS instruction that moves memory between two normal pages, within a single large page frame, and mapped within the guest as a large page, and if, in addition, the source operand is not writeable in the host (perhaps due to KSM), then KVM will instantiate a read-only direct shadow page, instantiate an spte for the source operand, then instantiate a new read/write direct shadow page and instantiate an spte for the destination operand. Since these two sptes are in different shadow pages, MOVS will never see them at the same time and the guest will not make progress. Fix by mapping the direct shadow page read/write, and only marking the host page read-only. Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add xsetbv interceptJoerg Roedel2011-01-121-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the xsetbv intercept to the AMD part of KVM. This makes AVX usable in a save way for the guest on AVX capable AMD hardware. The patch is tested by using AVX in the guest and host in parallel and checking for data corruption. I also used the KVM xsave unit-tests and they all pass. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: Make the way of accessing lpage_info more genericTakuya Yoshikawa2011-01-121-29/+25
| | | | | | | | | | | | | | | | | | | | | | | | Large page information has two elements but one of them, write_count, alone is accessed by a helper function. This patch replaces this helper function with more generic one which returns newly named kvm_lpage_info structure and use it to access the other element rmap_pde. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: VMX: add module parameter to avoid trapping HLT instructions (v5)Anthony Liguori2011-01-121-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | In certain use-cases, we want to allocate guests fixed time slices where idle guest cycles leave the machine idling. There are many approaches to achieve this but the most direct is to simply avoid trapping the HLT instruction which lets the guest directly execute the instruction putting the processor to sleep. Introduce this as a module-level option for kvm-vmx.ko since if you do this for one guest, you probably want to do it for all. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Implement Flush-By-Asid featureJoerg Roedel2011-01-121-2/+8
| | | | | | | | | | | | | | | | This patch adds the new flush-by-asid of upcoming AMD processors to the KVM-AMD module. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Use svm_flush_tlb instead of force_new_asidJoerg Roedel2011-01-121-12/+7
| | | | | | | | | | | | | | | | | | | | This patch replaces all calls to force_new_asid which are intended to flush the guest-tlb by the more appropriate function svm_flush_tlb. As a side-effect the force_new_asid function is removed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Remove flush_guest_tlb functionJoerg Roedel2011-01-121-5/+0
| | | | | | | | | | | | | | | | This function is unused and there is svm_flush_tlb which does the same. So this function can be removed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: retry #PF for softmmuXiao Guangrong2011-01-123-6/+16
| | | | | | | | | | | | | | | | Retry #PF for softmmu only when the current vcpu has the same cr3 as the time when #PF occurs Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: fix accessed bit set on prefault pathXiao Guangrong2011-01-121-4/+6
| | | | | | | | | | | | | | Retry #PF is the speculative path, so don't set the accessed bit Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: rename 'no_apf' to 'prefault'Xiao Guangrong2011-01-122-11/+11
| | | | | | | | | | | | | | | | It's the speculative path if 'no_apf = 1' and we will specially handle this speculative path in the later patch, so 'prefault' is better to fit the sense. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for LBR stateJoerg Roedel2011-01-121-0/+2
| | | | | | | | | | | | | | | | | | This patch implements the clean-bit for all LBR related state. This includes the debugctl, br_from, br_to, last_excp_from, and last_excp_to msrs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for CR2 registerJoerg Roedel2011-01-121-2/+3
| | | | | | | | | | | | | | | | This patch implements the clean-bit for the cr2 register in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for Segements and CPLJoerg Roedel2011-01-121-0/+2
| | | | | | | | | | | | | | | | This patch implements the clean-bit defined for the cs, ds, ss, an es segemnts and the current cpl saved in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for GDT and IDTJoerg Roedel2011-01-121-0/+3
| | | | | | | | | | | | | | | | This patch implements the clean-bit for the base and limit of the gdt and idt in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for DR6 and DR7Joerg Roedel2011-01-121-0/+4
| | | | | | | | | | | | | | | | This patch implements the clean-bit for the dr6 and dr7 debug registers in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for control registersJoerg Roedel2011-01-121-0/+7
| | | | | | | | | | | | | | | | This patch implements the CRx clean-bit for the vmcb. This bit covers cr0, cr3, cr4, and efer. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for NPT stateJoerg Roedel2011-01-121-0/+3
| | | | | | | | | | | | | | | | This patch implements the clean-bit for all nested paging related state in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for interrupt stateJoerg Roedel2011-01-121-1/+7
| | | | | | | | | | | | | | | | | | This patch implements the clean-bit for all interrupt related state in the vmcb. This corresponds to vmcb offset 0x60-0x67. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for the ASIDJoerg Roedel2011-01-121-0/+3
| | | | | | | | | | | | | | | | This patch implements the clean-bit for the asid in the vmcb. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for IOPM_BASE and MSRPM_BASEJoerg Roedel2011-01-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | This patch adds the clean bit for the physical addresses of the MSRPM and the IOPM. It does not need to be set in the code because the only place where these values are changed is the nested-svm vmrun and vmexit path. These functions already mark the complete VMCB as dirty. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bit for intercetps, tsc-offset and pause filter countJoerg Roedel2011-01-121-0/+7
| | | | | | | | | | | | | | | | | | | | This patch adds the clean-bit for intercepts-vectors, the TSC offset and the pause-filter count to the appropriate places. The IO and MSR permission bitmaps are not subject to this bit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Add clean-bits infrastructure codeRoedel, Joerg2011-01-121-0/+31
| | | | | | | | | | | | | | | | This patch adds the infrastructure for the implementation of the individual clean-bits. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: Avoid dropping accessed bit while removing write accessTakuya Yoshikawa2011-01-121-1/+1
| | | | | | | | | | | | | | | | | | | | One more "KVM: MMU: Don't drop accessed bit while updating an spte." Sptes are accessed by both kvm and hardware. This patch uses update_spte() to fix the way of removing write access. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: VMX: Return 0 from a failed VMREADAvi Kivity2011-01-121-2/+2
| | | | | | | | | | | | | | | | | | If we execute VMREAD during reboot we'll just skip over it. Instead of returning garbage, return 0, which has a much smaller chance of confusing the code. Otherwise we risk a flood of debug printk()s which block the reboot process if a serial console or netconsole is enabled. Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: SVM: Use get_host_vmcb function in svm_get_msr for TSCJoerg Roedel2011-01-121-7/+2
| | | | | | | | | | | | | | | | | | This patch replaces the open-coded vmcb-selection for the TSC calculation with the new get_host_vmcb helper function introduced in this patchset. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: SVM: Add manipulation functions for misc interceptsJoerg Roedel2011-01-121-33/+51
| | | | | | | | | | | | | | | | | | This patch wraps changes to the misc intercepts of SVM into seperate functions to abstract nested-svm better and prepare the implementation of the vmcb-clean-bits feature. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
OpenPOWER on IntegriCloud