summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm
Commit message (Collapse)AuthorAgeFilesLines
* KVM: x86: use dynamic percpu allocations for shared msrs areaMarcelo Tosatti2013-01-081-6/+18
| | | | | | | | Use dynamic percpu allocations for the shared msrs structure, to avoid using the limited reserved percpu space. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2012-12-1311-253/+809
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Marcelo Tosatti: "Considerable KVM/PPC work, x86 kvmclock vsyscall support, IA32_TSC_ADJUST MSR emulation, amongst others." Fix up trivial conflict in kernel/sched/core.c due to cross-cpu migration notifier added next to rq migration call-back. * tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits) KVM: emulator: fix real mode segment checks in address linearization VMX: remove unneeded enable_unrestricted_guest check KVM: VMX: fix DPL during entry to protected mode x86/kexec: crash_vmclear_local_vmcss needs __rcu kvm: Fix irqfd resampler list walk KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary KVM: MMU: optimize for set_spte KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation KVM: PPC: bookehv: Add guest computation mode for irq delivery KVM: PPC: Make EPCR a valid field for booke64 and bookehv KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation KVM: PPC: e500: Add emulation helper for getting instruction ea KVM: PPC: bookehv64: Add support for interrupt handling KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler KVM: PPC: booke: Fix get_tb() compile error on 64-bit KVM: PPC: e500: Silence bogus GCC warning in tlb code ...
| * KVM: emulator: fix real mode segment checks in address linearizationGleb Natapov2012-12-111-2/+3
| | | | | | | | | | | | | | In real mode CS register is writable, so do not #GP on write. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * VMX: remove unneeded enable_unrestricted_guest checkGleb Natapov2012-12-111-1/+1
| | | | | | | | | | | | | | | | If enable_unrestricted_guest is true vmx->rmode.vm86_active will always be false. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: VMX: fix DPL during entry to protected modeGleb Natapov2012-12-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | On CPUs without support for unrestricted guests DPL cannot be smaller than RPL for data segments during guest entry, but this state can occurs if a data segment selector changes while vcpu is in real mode to a value with lowest two bits != 00. Fix that by forcing DPL == RPL on transition to protected mode. This is a regression introduced by c865c43de66dc97. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdumpZhang Yanfei2012-12-061-0/+67
| | | | | | | | | | | | | | | | | | | | | | The vmclear function will be assigned to the callback function pointer when loading kvm-intel module. And the bitmap indicates whether we should do VMCLEAR operation in kdump. The bits in the bitmap are set/unset according to different conditions. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: MMU: optimize for set_spteXiao Guangrong2012-12-061-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two cases we need to adjust page size in set_spte: 1): the one is other vcpu creates new sp in the window between mapping_level() and acquiring mmu-lock. 2): the another case is the new sp is created by itself (page-fault path) when guest uses the target gfn as its page table. In current code, set_spte drop the spte and emulate the access for these case, it works not good: - for the case 1, it may destroy the mapping established by other vcpu, and do expensive instruction emulation. - for the case 2, it may emulate the access even if the guest is accessing the page which not used as page table. There is a example, 0~2M is used as huge page in guest, in this huge page, only page 3 used as page table, then guest read/writes on other pages can cause instruction emulation. Both of these cases can be fixed by allowing guest to retry the access, it will refault, then we can establish the mapping by using small page Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: x86: Make register state after reset conform to specificationJulian Stecklina2012-12-054-16/+17
| | | | | | | | | | | | | | | | | | VMX behaves now as SVM wrt to FPU initialization. Code has been moved to generic code path. General-purpose registers are now cleared on reset and INIT. SVM code properly initializes EDX. Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * kvm: don't use bit24 for detecting address-specific invalidation capabilityZhang Xiantao2012-12-051-16/+0
| | | | | | | | | | | | | | | | Bit24 in VMX_EPT_VPID_CAP_MASI is not used for address-specific invalidation capability reporting, so remove it from KVM to avoid conflicts in future. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * kvm: remove unnecessary bit checking for ept violationZhang Xiantao2012-12-051-5/+0
| | | | | | | | | | | | | | Bit 6 in EPT vmexit's exit qualification is not defined in SDM, so remove it. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: x86: Fix uninitialized return codeJan Kiszka2012-12-021-0/+1
| | | | | | | | | | | | | | This is a regression caused by 18595411a7. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: x86: Emulate IA32_TSC_ADJUST MSRWill Auld2012-11-305-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control. However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process performs a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: Add code to track call origin for msr assignmentWill Auld2012-11-304-18/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to track who initiated the call (host or guest) to modify an msr value I have changed function call parameters along the call path. The specific change is to add a struct pointer parameter that points to (index, data, caller) information rather than having this information passed as individual parameters. The initial use for this capability is for updating the IA32_TSC_ADJUST msr while setting the tsc value. It is anticipated that this capability is useful for other tasks. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: VMX: fix memory order between loading vmcs and clearing vmcsXiao Guangrong2012-11-291-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | vmcs->cpu indicates whether it exists on the target cpu, -1 means the vmcs does not exist on any vcpu If vcpu load vmcs with vmcs.cpu = -1, it can be directly added to cpu's percpu list. The list can be corrupted if the cpu prefetch the vmcs's list before reading vmcs->cpu. Meanwhile, we should remove vmcs from the list before making vmcs->vcpu == -1 be visible Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: VMX: fix invalid cpu passed to smp_call_function_singleXiao Guangrong2012-11-281-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | In loaded_vmcs_clear, loaded_vmcs->cpu is the fist parameter passed to smp_call_function_single, if the target cpu is downing (doing cpu hot remove), loaded_vmcs->cpu can become -1 then -1 is passed to smp_call_function_single It can be triggered when vcpu is being destroyed, loaded_vmcs_clear is called in the preemptionable context Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: update pvclock area conditionally, on cpu migrationMarcelo Tosatti2012-11-271-1/+6
| | | | | | | | | | | | | | | | | | | | As requested by Glauber, do not update kvmclock area on vcpu->pcpu migration, in case the host has stable TSC. This is to reduce cacheline bouncing. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: require matched TSC offsets for master clockMarcelo Tosatti2012-11-272-8/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With master clock, a pvclock clock read calculates: ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ] Where 'rdtsc' is the host TSC. system_timestamp and tsc_timestamp are unique, one tuple per VM: the "master clock". Given a host with synchronized TSCs, its obvious that guest TSC must be matched for the above to guarantee monotonicity. Allow master clock usage only if guest TSCs are synchronized. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initializationMarcelo Tosatti2012-11-273-3/+13
| | | | | | | | | | | | TSC initialization will soon make use of online_vcpus. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flagMarcelo Tosatti2012-11-272-8/+257
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM added a global variable to guarantee monotonicity in the guest. One of the reasons for that is that the time between 1. ktime_get_ts(&timespec); 2. rdtscll(tsc); Is variable. That is, given a host with stable TSC, suppose that two VCPUs read the same time via ktime_get_ts() above. The time required to execute 2. is not the same on those two instances executing in different VCPUS (cache misses, interrupts...). If the TSC value that is used by the host to interpolate when calculating the monotonic time is the same value used to calculate the tsc_timestamp value stored in the pvclock data structure, and a single <system_timestamp, tsc_timestamp> tuple is visible to all vcpus simultaneously, this problem disappears. See comment on top of pvclock_update_vm_gtod_copy for details. Monotonicity is then guaranteed by synchronicity of the host TSCs and guest TSCs. Set TSC stable pvclock flag in that case, allowing the guest to read clock from userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: notifier for clocksource changesMarcelo Tosatti2012-11-271-1/+93
| | | | | | | | | | | | | | | | Register a notifier for clocksource change event. In case the host switches to clock other than TSC, disable master clock usage. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: pass host_tsc to read_l1_tscMarcelo Tosatti2012-11-274-8/+8
| | | | | | | | | | | | Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc(). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: retain pvclock guest stopped bit in guest memoryMarcelo Tosatti2012-11-271-7/+13
| | | | | | | | | | | | | | | | | | | | | | Otherwise its possible for an unrelated KVM_REQ_UPDATE_CLOCK (such as due to CPU migration) to clear the bit. Noticed by Paolo Bonzini. Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: remove unnecessary return value checkGuo Chao2012-11-131-32/+0
| | | | | | | | | | | | | | No need to check return value before breaking switch. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: x86: fix return value of kvm_vm_ioctl_set_tss_addr()Guo Chao2012-11-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return value of this function will be that of ioctl(). #include <stdio.h> #include <linux/kvm.h> int main () { int fd; fd = open ("/dev/kvm", 0); fd = ioctl (fd, KVM_CREATE_VM, 0); ioctl (fd, KVM_SET_TSS_ADDR, 0xfffff000); perror (""); return 0; } Output is "Operation not permitted". That's not what we want. Return -EINVAL in this case. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: do not kfree error pointerGuo Chao2012-11-131-13/+6
| | | | | | | | | | | | | | | | We should avoid kfree()ing error pointer in kvm_vcpu_ioctl() and kvm_arch_vcpu_ioctl(). Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: do not treat noslot pfn as a error pfnXiao Guangrong2012-10-293-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch filters noslot pfn out from error pfns based on Marcelo comment: noslot pfn is not a error pfn After this patch, - is_noslot_pfn indicates that the gfn is not in slot - is_error_pfn indicates that the gfn is in slot but the error is occurred when translate the gfn to pfn - is_error_noslot_pfn indicates that the pfn either it is error pfns or it is noslot pfn And is_invalid_pfn can be removed, it makes the code more clean Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * Merge remote-tracking branch 'master' into queueMarcelo Tosatti2012-10-292-3/+2
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: development work has dependency on kvm patches merged upstream. Conflicts: arch/powerpc/include/asm/Kbuild arch/powerpc/include/asm/kvm_para.h Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | KVM: Take kvm instead of vcpu to mmu_notifier_retryChristoffer Dall2012-10-232-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mmu_notifier_retry is not specific to any vcpu (and never will be) so only take struct kvm as a parameter. The motivation is the ARM mmu code that needs to call this from somewhere where we long let go of the vcpu pointer. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: SVM: Cleanup error statementsBorislav Petkov2012-10-221-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use __func__ instead of the function name in svm_hardware_enable since those things tend to get out of sync. This also slims down printk line length in conjunction with using pr_err. No functionality change. Cc: Joerg Roedel <joro@8bytes.org> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: VMX: report internal error for MMIO #PF due to delivery eventXiao Guangrong2012-10-181-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | The #PF with PFEC.RSV = 1 indicates that the guest is accessing MMIO, we can not fix it if it is caused by delivery event. Reporting internal error for this case Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: VMX: report internal error for the unhandleable eventXiao Guangrong2012-10-181-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | VM exits during Event Delivery is really unexpected if it is not caused by Exceptions/EPT-VIOLATION/TASK_SWITCH, we'd better to report an internal and freeze the guest, the VMM has the chance to check the guest Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: do not de-cache cr4 bits needlesslyGleb Natapov2012-10-181-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: MMU: introduce FNAME(prefetch_gpte)Xiao Guangrong2012-10-171-31/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | The only difference between FNAME(update_pte) and FNAME(pte_prefetch) is that the former is allowed to prefetch gfn from dirty logged slot, so introduce a common function to prefetch spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: MMU: move prefetch_invalid_gpte out of pagaing_tmp.hXiao Guangrong2012-10-172-31/+31
| | | | | | | | | | | | | | | | | | | | | | | | The function does not depend on guest mmu mode, move it out from paging_tmpl.h Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: MMU: cleanup FNAME(page_fault)Xiao Guangrong2012-10-171-19/+13
| | | | | | | | | | | | | | | | | | | | | Let it return emulate state instead of spte like __direct_map Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: MMU: remove mmu_is_invalidXiao Guangrong2012-10-172-7/+2
| | | | | | | | | | | | | | | | | | | | | Remove mmu_is_invalid and use is_invalid_pfn instead Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: x86: Make emulator_fix_hypercall staticJan Kiszka2012-10-081-2/+2
| | | | | | | | | | | | | | | | | | | | | No users outside of kvm/x86.c. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | KVM: x86: Convert kvm_arch_vcpu_reset into private kvm_vcpu_resetJan Kiszka2012-10-081-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | There are no external callers of this function as there is no concept of resetting a vcpu from generic code. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | | x86, kvm: Remove incorrect redundant assembly constraintH. Peter Anvin2012-11-261-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In __emulate_1op_rax_rdx, we use "+a" and "+d" which are input/output constraints, and *then* use "a" and "d" as input constraints. This is incorrect, but happens to work on some versions of gcc. However, it breaks gcc with -O0 and icc, and may break on future versions of gcc. Reported-and-tested-by: Melanie Blower <melanie.blower@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/B3584E72CFEBED439A3ECA9BCE67A4EF1B17AF90@FMSMSX107.amr.corp.intel.com Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
* | | KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update()Takashi Iwai2012-11-161-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit [ad756a16: KVM: VMX: Implement PCID/INVPCID for guests with EPT] introduced the unconditional access to SECONDARY_VM_EXEC_CONTROL, and this triggers kernel warnings like below on old CPUs: vmwrite error: reg 401e value a0568000 (err 12) Pid: 13649, comm: qemu-kvm Not tainted 3.7.0-rc4-test2+ #154 Call Trace: [<ffffffffa0558d86>] vmwrite_error+0x27/0x29 [kvm_intel] [<ffffffffa054e8cb>] vmcs_writel+0x1b/0x20 [kvm_intel] [<ffffffffa054f114>] vmx_cpuid_update+0x74/0x170 [kvm_intel] [<ffffffffa03629b6>] kvm_vcpu_ioctl_set_cpuid2+0x76/0x90 [kvm] [<ffffffffa0341c67>] kvm_arch_vcpu_ioctl+0xc37/0xed0 [kvm] [<ffffffff81143f7c>] ? __vunmap+0x9c/0x110 [<ffffffffa0551489>] ? vmx_vcpu_load+0x39/0x1a0 [kvm_intel] [<ffffffffa0340ee2>] ? kvm_arch_vcpu_load+0x52/0x1a0 [kvm] [<ffffffffa032dcd4>] ? vcpu_load+0x74/0xd0 [kvm] [<ffffffffa032deb0>] kvm_vcpu_ioctl+0x110/0x5e0 [kvm] [<ffffffffa032e93d>] ? kvm_dev_ioctl+0x4d/0x4a0 [kvm] [<ffffffff8117dc6f>] do_vfs_ioctl+0x8f/0x530 [<ffffffff81139d76>] ? remove_vma+0x56/0x60 [<ffffffff8113b708>] ? do_munmap+0x328/0x400 [<ffffffff81187c8c>] ? fget_light+0x4c/0x100 [<ffffffff8117e1a1>] sys_ioctl+0x91/0xb0 [<ffffffff815a942d>] system_call_fastpath+0x1a/0x1f This patch adds a check for the availability of secondary exec control to avoid these warnings. Cc: <stable@vger.kernel.org> [v3.6+] Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | | KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461)Petr Matousek2012-11-122-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On hosts without the XSAVE support unprivileged local user can trigger oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN ioctl. invalid opcode: 0000 [#2] SMP Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables ... Pid: 24935, comm: zoog_kvm_monito Tainted: G D 3.2.0-3-686-pae EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0 EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0 task.ti=d7c62000) Stack: 00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000 ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0 c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80 Call Trace: [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm] ... [<c12bfb44>] ? syscall_call+0x7/0xb Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74 1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01 d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89 EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP 0068:d7c63e70 QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID and then sets them later. So guest's X86_FEATURE_XSAVE should be masked out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with X86_FEATURE_XSAVE even on hosts that do not support it, might be susceptible to this attack from inside the guest as well. Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support. Signed-off-by: Petr Matousek <pmatouse@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | | KVM: x86: fix vcpu->mmio_fragments overflowXiao Guangrong2012-10-311-26/+34
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit b3356bf0dbb349 (KVM: emulator: optimize "rep ins" handling), the pieces of io data can be collected and write them to the guest memory or MMIO together Unfortunately, kvm splits the mmio access into 8 bytes and store them to vcpu->mmio_fragments. If the guest uses "rep ins" to move large data, it will cause vcpu->mmio_fragments overflow The bug can be exposed by isapc (-M isapc): [23154.818733] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC [ ......] [23154.858083] Call Trace: [23154.859874] [<ffffffffa04f0e17>] kvm_get_cr8+0x1d/0x28 [kvm] [23154.861677] [<ffffffffa04fa6d4>] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm] [23154.863604] [<ffffffffa04f5a1a>] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm] Actually, we can use one mmio_fragment to store a large mmio access then split it when we pass the mmio-exit-info to userspace. After that, we only need two entries to store mmio info for the cross-mmio pages access Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: apic: fix LDR calculation in x2apic modeGleb Natapov2012-10-221-1/+1
| | | | | | | | | | | | | | Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: MMU: fix release noslot pfnXiao Guangrong2012-10-221-2/+1
|/ | | | | | | | | | We can not directly call kvm_release_pfn_clean to release the pfn since we can meet noslot pfn which is used to cache mmio info into spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2012-10-0421-1074/+1408
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Avi Kivity: "Highlights of the changes for this release include support for vfio level triggered interrupts, improved big real mode support on older Intels, a streamlines guest page table walker, guest APIC speedups, PIO optimizations, better overcommit handling, and read-only memory." * tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits) KVM: s390: Fix vcpu_load handling in interrupt code KVM: x86: Fix guest debug across vcpu INIT reset KVM: Add resampling irqfds for level triggered interrupts KVM: optimize apic interrupt delivery KVM: MMU: Eliminate pointless temporary 'ac' KVM: MMU: Avoid access/dirty update loop if all is well KVM: MMU: Eliminate eperm temporary KVM: MMU: Optimize is_last_gpte() KVM: MMU: Simplify walk_addr_generic() loop KVM: MMU: Optimize pte permission checks KVM: MMU: Update accessed and dirty bits after guest pagetable walk KVM: MMU: Move gpte_access() out of paging_tmpl.h KVM: MMU: Optimize gpte_access() slightly KVM: MMU: Push clean gpte write protection out of gpte_access() KVM: clarify kvmclock documentation KVM: make processes waiting on vcpu mutex killable KVM: SVM: Make use of asm.h KVM: VMX: Make use of asm.h KVM: VMX: Make lto-friendly KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() ... Conflicts: arch/s390/include/asm/processor.h arch/x86/kvm/i8259.c
| * KVM: x86: Fix guest debug across vcpu INIT resetJan Kiszka2012-09-233-41/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we reset a vcpu on INIT, we so far overwrote dr7 as provided by KVM_SET_GUEST_DEBUG, and we also cleared switch_db_regs unconditionally. Fix this by saving the dr7 used for guest debugging and calculating the effective register value as well as switch_db_regs on any potential change. This will change to focus of the set_guest_debug vendor op to update_dp_bp_intercept. Found while trying to stop on start_secondary. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Add resampling irqfds for level triggered interruptsAlex Williamson2012-09-231-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To emulate level triggered interrupts, add a resample option to KVM_IRQFD. When specified, a new resamplefd is provided that notifies the user when the irqchip has been resampled by the VM. This may, for instance, indicate an EOI. Also in this mode, posting of an interrupt through an irqfd only asserts the interrupt. On resampling, the interrupt is automatically de-asserted prior to user notification. This enables level triggered interrupts to be posted and re-enabled from vfio with no userspace intervention. All resampling irqfds can make use of a single irq source ID, so we reserve a new one for this interface. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: optimize apic interrupt deliveryGleb Natapov2012-09-203-12/+181
| | | | | | | | | | | | | | | | | | | | | | Most interrupt are delivered to only one vcpu. Use pre-build tables to find interrupt destination instead of looping through all vcpus. In case of logical mode loop only through vcpus in a logical cluster irq is sent to. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: Eliminate pointless temporary 'ac'Avi Kivity2012-09-201-4/+1
| | | | | | | | | | | | | | | | | | 'ac' essentially reconstructs the 'access' variable we already have, except for the PFERR_PRESENT_MASK and PFERR_RSVD_MASK. As these are not used by callees, just use 'access' directly. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: Avoid access/dirty update loop if all is wellAvi Kivity2012-09-201-6/+20
| | | | | | | | | | | | | | | | Keep track of accessed/dirty bits; if they are all set, do not enter the accessed/dirty update loop. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
OpenPOWER on IntegriCloud