op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge tag 'pm+acpi-3.19-rc1-2' of ↵	Linus Torvalds	2014-12-18	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI and power management updates from Rafael Wysocki: "These are regression fixes (leds-gpio, ACPI backlight driver, operating performance points library, ACPI device enumeration messages, cpupower tool), other bug fixes (ACPI EC driver, ACPI device PM), some cleanups in the operating performance points (OPP) framework, continuation of CONFIG_PM_RUNTIME elimination, a couple of minor intel_pstate driver changes, a new MAINTAINERS entry for it and an ACPI fan driver change needed for better support of thermal management in user space. Specifics: - Fix a regression in leds-gpio introduced by a recent commit that inadvertently changed the name of one of the properties used by the driver (Fabio Estevam). - Fix a regression in the ACPI backlight driver introduced by a recent fix that missed one special case that had to be taken into account (Aaron Lu). - Drop the level of some new kernel messages from the ACPI core introduced by a recent commit to KERN_DEBUG which they should have used from the start and drop some other unuseful KERN_ERR messages printed by ACPI (Rafael J Wysocki). - Revert an incorrect commit modifying the cpupower tool (Prarit Bhargava). - Fix two regressions introduced by recent commits in the OPP library and clean up some existing minor issues in that code (Viresh Kumar). - Continue to replace CONFIG_PM_RUNTIME with CONFIG_PM throughout the tree (or drop it where that can be done) in order to make it possible to eliminate CONFIG_PM_RUNTIME (Rafael J Wysocki, Ulf Hansson, Ludovic Desroches). There will be one more "CONFIG_PM_RUNTIME removal" batch after this one, because some new uses of it have been introduced during the current merge window, but that should be sufficient to finally get rid of it. - Make the ACPI EC driver more robust against race conditions related to GPE handler installation failures (Lv Zheng). - Prevent the ACPI device PM core code from attempting to disable GPEs that it has not enabled which confuses ACPICA and makes it report errors unnecessarily (Rafael J Wysocki). - Add a "force" command line switch to the intel_pstate driver to make it possible to override the blacklisting of some systems in that driver if needed (Ethan Zhao). - Improve intel_pstate code documentation and add a MAINTAINERS entry for it (Kristen Carlson Accardi). - Make the ACPI fan driver create cooling device interfaces witn names that reflect the IDs of the ACPI device objects they are associated with, except for "generic" ACPI fans (PNP ID "PNP0C0B"). That's necessary for user space thermal management tools to be able to connect the fans with the parts of the system they are supposed to be cooling properly. From Srinivas Pandruvada" * tag 'pm+acpi-3.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (32 commits) MAINTAINERS: add entry for intel_pstate ACPI / video: update the skip case for acpi_video_device_in_dod() power / PM: Eliminate CONFIG_PM_RUNTIME NFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM ACPI / EC: Fix unexpected ec_remove_handlers() invocations Revert "tools: cpupower: fix return checks for sysfs_get_idlestate_count()" tracing / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.c PM: Remove the SET_PM_RUNTIME_PM_OPS() macro mmc: atmel-mci: use SET_RUNTIME_PM_OPS() macro PM / Kconfig: Replace PM_RUNTIME with PM in dependencies ARM / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM sound / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM phy / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM video / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM tty / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM spi: Replace CONFIG_PM_RUNTIME with CONFIG_PM ACPI / PM: Do not disable wakeup GPEs that have not been enabled ACPI / utils: Drop error messages from acpi_evaluate_reference() ...
\| *	x86 / PM: Replace CONFIG_PM_RUNTIME in io_apic.c	Rafael J. Wysocki	2014-12-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks depending on CONFIG_PM_RUNTIME may now be changed to depend on CONFIG_PM. Replace CONFIG_PM_RUNTIME with CONFIG_PM in x86/kernel/apic/io_apic.c. Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \|	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2014-12-18	24	-641/+3600
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware- assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...
\| * \|	KVM: move APIC types to arch/x86/	Paolo Bonzini	2014-12-18	2	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	They are not used anymore by IA64, move them away. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \|	Merge tag 'kvm-arm-for-3.19-take2' of ↵	Paolo Bonzini	2014-12-15	1	-3/+3
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot problems, clarifies VCPU init, and fixes a regression concerning the VGIC init flow. Conflicts: arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]
\| \| * \|	kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()	Ard Biesheuvel	2014-11-25	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. The problem being addressed by the patch above was that some ARM code based the memory mapping attributes of a pfn on the return value of kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should be mapped as device memory. However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, and the existing non-ARM users were already using it in a way which suggests that its name should probably have been 'kvm_is_reserved_pfn' from the beginning, e.g., whether or not to call get_page/put_page on it etc. This means that returning false for the zero page is a mistake and the patch above should be reverted. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
\| \| * \|	KVM: nVMX: Disable preemption while reading from shadow VMCS	Jan Kiszka	2014-10-29	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to access the shadow VMCS, we need to load it. At this point, vmx->loaded_vmcs->vmcs and the actually loaded one start to differ. If we now get preempted by Linux, vmx_vcpu_put and, on return, the vmx_vcpu_load will work against the wrong vmcs. That can cause copy_shadow_to_vmcs12 to corrupt the vmcs12 state. Fix the issue by disabling preemption during the copy operation. copy_vmcs12_to_shadow is safe from this issue as it is executed by vmx_vcpu_run when preemption is already disabled before vmentry. This bug is exposed by running Jailhouse within KVM on CPUs with shadow VMCS support. Jailhouse never expects an interrupt pending vmexit, but the bug can cause it if, after copy_shadow_to_vmcs12 is preempted, the active VMCS happens to have the virtual interrupt pending flag set in the CPU-based execution controls. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| \| * \|	KVM: x86: Fix far-jump to non-canonical check	Nadav Amit	2014-10-29	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps") introduced a bug that caused the fix to be incomplete. Due to incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may not trigger #GP. As we know, this imposes a security problem. In addition, the condition for two warnings was incorrect. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: em_ret_far overrides cpl	Nadav Amit	2014-12-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit d50eaa18039b ("KVM: x86: Perform limit checks when assigning EIP") mistakenly used zero as cpl on em_ret_far. Use the actual one. Fixes: d50eaa18039b8b848c2285478d0775335ad5e930 Cc: stable@vger.kernel.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: nVMX: Disable unrestricted mode if ept=0	Bandan Das	2014-12-11	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If L0 has disabled EPT, don't advertise unrestricted mode at all since it depends on EPT to run real mode code. Fixes: 92fbc7b195b824e201d9f06f2b93105f72384d65 Cc: stable@vger.kernel.org Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Emulate should check #UD before #GP	Nadav Amit	2014-12-10	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Intel SDM table 6-2 ("Priority Among Simultaneous Exceptions and Interrupts") shows that faults from decoding the next instruction got higher priority than general protection. Moving the protected-mode check before the CPL check to avoid wrong exception on vm86 mode. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Do not push eflags.vm on pushf	Nadav Amit	2014-12-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The pushf instruction does not push eflags.VM, so emulation should not do so as well. Although eflags.RF should not be pushed as well, it is already cleared by the time pushf is executed. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Remove prefix flag when GP macro is used	Nadav Amit	2014-12-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The macro GP already sets the flag Prefix. Remove the redundant flag for 0f_38_f0 and 0f_38_f1 opcodes. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit	Andy Lutomirski	2014-12-10	2	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	paravirt_enabled has the following effects: - Disables the F00F bug workaround warning. There is no F00F bug workaround any more because Linux's standard IDT handling already works around the F00F bug, but the warning still exists. This is only cosmetic, and, in any event, there is no such thing as KVM on a CPU with the F00F bug. - Disables 32-bit APM BIOS detection. On a KVM paravirt system, there should be no APM BIOS anyway. - Disables tboot. I think that the tboot code should check the CPUID hypervisor bit directly if it matters. - paravirt_enabled disables espfix32. espfix32 should not be disabled under KVM paravirt. The last point is the purpose of this patch. It fixes a leak of the high 16 bits of the kernel stack address on 32-bit KVM paravirt guests. Fixes CVE-2014-8134. Cc: stable@vger.kernel.org Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: cpuid: recompute CPUID 0xD.0:EBX,ECX	Radim Krčmář	2014-12-05	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We reused host EBX and ECX, but KVM might not support all features; emulated XSAVE size should be smaller. EBX depends on unknown XCR0, so we default to ECX. SDM CPUID (EAX = 0DH, ECX = 0): EBX Bits 31-00: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR save area) required by enabled features in XCR0. May be different than ECX if some features at the end of the XSAVE save area are not enabled. ECX Bit 31-00: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR save area) of the XSAVE/XRSTOR save area required by all supported features in the processor, i.e all the valid bit fields in XCR0. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: vmx: add nested virtualization support for xsaves	Wanpeng Li	2014-12-05	1	-1/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add nested virtualization support for xsaves. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: vmx: add MSR logic for XSAVES	Wanpeng Li	2014-12-05	2	-1/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add logic to get/set the XSS model-specific register. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: x86: handle XSAVES vmcs and vmexit	Wanpeng Li	2014-12-05	3	-1/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Initialize the XSS exit bitmap. It is zero so there should be no XSAVES or XRSTORS exits. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: cpuid: mask more bits in leaf 0xd and subleaves	Paolo Bonzini	2014-12-05	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- EAX=0Dh, ECX=1: output registers EBX/ECX/EDX are reserved. - EAX=0Dh, ECX>1: output register ECX bit 0 is clear for all the CPUID leaves we support, because variable "supported" comes from XCR0 and not XSS. Bits above 0 are reserved, so ECX is overall zero. Output register EDX is reserved. Source: Intel Architecture Instruction Set Extensions Programming Reference, ref. number 319433-022 Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: cpuid: set CPUID(EAX=0xd,ECX=1).EBX correctly	Paolo Bonzini	2014-12-05	1	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the size of the XSAVES area. This starts providing guest support for XSAVES (with no support yet for supervisor states, i.e. XSS == 0 always in guests for now). Wanpeng Li suggested testing XSAVEC as well as XSAVES, since in practice no real processor exists that only has one of them, and there is no other way for userspace programs to compute the area of the XSAVEC save area. CPUID(EAX=0xd,ECX=1).EBX provides an upper bound. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest	Wanpeng Li	2014-12-05	5	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Expose the XSAVES feature to the guest if the kvm_x86_ops say it is available. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: use F() macro throughout cpuid.c	Paolo Bonzini	2014-12-05	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For code that deals with cpuid, this makes things a bit more readable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: support XSAVES usage in the host	Paolo Bonzini	2014-12-05	1	-7/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Userspace is expecting non-compacted format for KVM_GET_XSAVE, but struct xsave_struct might be using the compacted format. Convert in order to preserve userspace ABI. Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE but the kernel will pass it to XRSTORS, and we need to convert back. Fixes: f31a9f7c71691569359fa7fb8b0acaa44bce0324 Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org Cc: H. Peter Anvin <hpa@linux.intel.com> Tested-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	x86: export get_xsave_addr	Paolo Bonzini	2014-12-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	get_xsave_addr is the API to access XSAVE states, and KVM would like to use it. Export it. Cc: stable@vger.kernel.org Cc: x86@kernel.org Cc: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: allow 256 logical x2APICs again	Radim Krčmář	2014-12-04	2	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While fixing an x2apic bug, 17d68b7 KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) we've made only one cluster available. This means that the amount of logically addressible x2APICs was reduced to 16 and VCPUs kept overwriting themselves in that region, so even the first cluster wasn't set up correctly. This patch extends x2APIC support back to the logical_map's limit, and keeps the CVE fixed as messages for non-present APICs are dropped. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: check bounds of APIC maps	Radim Krčmář	2014-12-04	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	They can't be violated now, but play it safe for the future. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: fix APIC physical destination wrapping	Radim Krčmář	2014-12-04	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	x2apic allows destinations > 0xff and we don't want them delivered to lower APICs. They are correctly handled by doing nothing. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: deliver phys lowest-prio	Radim Krčmář	2014-12-04	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Physical mode can't address more than one APIC, but lowest-prio is allowed, so we just reuse our paths. SDM 10.6.2.1 Physical Destination: Also, for any non-broadcast IPI or I/O subsystem initiated interrupt with lowest priority delivery mode, software must ensure that APICs defined in the interrupt address are present and enabled to receive interrupts. We could warn on top of that. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: don't retry hopeless APIC delivery	Radim Krčmář	2014-12-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	False from kvm_irq_delivery_to_apic_fast() means that we don't handle it in the fast path, but we still return false in cases that were perfectly handled, fix that. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: use MSR_ICR instead of a number	Radim Krčmář	2014-12-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	0x830 MSR is 0x300 xAPIC MMIO, which is MSR_ICR. Signed-off-by: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Fix reserved x2apic registers	Nadav Amit	2014-12-04	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	x2APIC has no registers for DFR and ICR2 (see Intel SDM 10.12.1.2 "x2APIC Register Address Space"). KVM needs to cause #GP on such accesses. Fix it (DFR and ICR2 on read, ICR2 on write, DFR already handled on writes). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Generate #UD when memory operand is required	Nadav Amit	2014-12-04	1	-4/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Certain x86 instructions that use modrm operands only allow memory operand (i.e., mod012), and cause a #UD exception otherwise. KVM ignores this fact. Currently, the instructions that are such and are emulated by KVM are MOVBE, MOVNTPS, MOVNTPD and MOVNTI. MOVBE is the most blunt example, since it may be emulated by the host regardless of MMIO. The fix introduces a new group for handling such instructions, marking mod3 as illegal instruction. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: x86: avoid warning about potential shift wrapping bug	Paolo Bonzini	2014-11-24	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cs.base is declared as a __u64 variable and vector is a u32 so this causes a static checker warning. The user indeed can set "sipi_vector" to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the value should really have 8-bit precision only. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: move device assignment out of kvm_host.h	Paolo Bonzini	2014-11-24	5	-33/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Create a new header, and hide the device assignment functions there. Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying arch/x86/kvm/iommu.c to take a PCI device struct. Based on a patch by Radim Krcmar <rkrcmark@redhat.com>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: x86: mask out XSAVES	Paolo Bonzini	2014-11-23	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This feature is not supported inside KVM guests yet, because we do not emulate MSR_IA32_XSS. Mask it out. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: x86: move assigned-dev.c and iommu.c to arch/x86/	Radim Krčmář	2014-11-23	5	-2/+1409
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that ia64 is gone, we can hide deprecated device assignment in x86. Notable changes: - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl() The easy parts were removed from generic kvm code, remaining - kvm_iommu_(un)map_pages() would require new code to be moved - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: remove CONFIG_X86 #ifdefs from files formerly shared with ia64	Radim Krcmar	2014-11-21	2	-24/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/	Paolo Bonzini	2014-11-21	6	-3/+1150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ia64 does not need them anymore. Ack notifiers become x86-specific too. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Remove FIXMEs in emulate.c	Nicholas Krause	2014-11-19	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove FIXME comments about needing fault addresses to be returned. These are propaagated from walk_addr_generic to gva_to_gpa and from there to ops->read_std and ops->write_std. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: emulator: remove duplicated limit check	Paolo Bonzini	2014-11-19	1	-9/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The check on the higher limit of the segment, and the check on the maximum accessible size, is the same for both expand-up and expand-down segments. Only the computation of "lim" varies. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: emulator: remove code duplication in register_address{,_increment}	Paolo Bonzini	2014-11-19	1	-12/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	register_address has been a duplicate of address_mask ever since the ancestor of __linearize was born in 90de84f50b42 (KVM: x86 emulator: preserve an operand's segment identity, 2010-11-17). However, we can put it to a better use by including the call to reg_read in register_address. Similarly, the call to reg_rmw can be moved to register_address_increment. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Move __linearize masking of la into switch	Nadav Amit	2014-11-19	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In __linearize there is check of the condition whether to check if masking of the linear address is needed. It occurs immediately after switch that evaluates the same condition. Merge them. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Non-canonical access using SS should cause #SS	Nadav Amit	2014-11-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When SS is used using a non-canonical address, an #SS exception is generated on real hardware. KVM emulator causes a #GP instead. Fix it to behave as real x86 CPU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Perform limit checks when assigning EIP	Nadav Amit	2014-11-19	1	-41/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If branch (e.g., jmp, ret) causes limit violations, since the target IP > limit, the #GP exception occurs before the branch. In other words, the RIP pushed on the stack should be that of the branch and not that of the target. To do so, we can call __linearize, with new EIP, which also saves us the code which performs the canonical address checks. On the case of assigning an EIP >= 2^32 (when switching cs.l), we also safe, as __linearize will check the new EIP does not exceed the limit and would trigger #GP(0) otherwise. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Emulator performs privilege checks on __linearize	Nadav Amit	2014-11-19	1	-15/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When segment is accessed, real hardware does not perform any privilege level checks. In contrast, KVM emulator does. This causes some discrepencies from real hardware. For instance, reading from readable code segment may fail due to incorrect segment checks. In addition, it introduces unnecassary overhead. To reference Intel SDM 5.5 ("Privilege Levels"): "Privilege levels are checked when the segment selector of a segment descriptor is loaded into a segment register." The SDM never mentions privilege level checks during memory access, except for loading far pointers in section 5.10 ("Pointer Validation"). Those are actually segment selector loads and are emulated in the similarily (i.e., regardless to __linearize checks). This behavior was also checked using sysexit. A data-segment whose DPL=0 was loaded, and after sysexit (CPL=3) it is still accessible. Therefore, all the privilege level checks in __linearize are removed. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Stack size is overridden by __linearize	Nadav Amit	2014-11-19	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When performing segmented-read/write in the emulator for stack operations, it ignores the stack size, and uses the ad_bytes as indication for the pointer size. As a result, a wrong address may be accessed. To fix this behavior, we can remove the masking of address in __linearize and perform it beforehand. It is already done for the operands (so currently it is inefficiently done twice). It is missing in two cases: 1. When using rip_relative 2. On fetch_bit_operand that changes the address. This patch masks the address on these two occassions, and removes the masking from __linearize. Note that it does not mask EIP during fetch. In protected/legacy mode code fetch when RIP >= 2^32 should result in #GP and not wrap-around. Since we make limit checks within __linearize, this is the expected behavior. Partial revert of commit 518547b32ab4 (KVM: x86: Emulator does not calculate address correctly, 2014-09-30). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Revert NoBigReal patch in the emulator	Nadav Amit	2014-11-19	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 10e38fc7cab6 ("KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode") introduced NoBigReal for instructions such as MONITOR. Apparetnly, the Intel SDM description that led to this patch is misleading. Since no instruction is using NoBigReal, it is safe to remove it, we fully understand what the SDM means. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: x86: vmx: remove MMIO_MAX_GEN	Tiejun Chen	2014-11-18	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MMIO_MAX_GEN is the same as MMIO_GEN_MASK. Use only one. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	kvm: x86: vmx: cleanup handle_ept_violation	Tiejun Chen	2014-11-18	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead, just use PFERR_{FETCH, PRESENT, WRITE}_MASK inside handle_ept_violation() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
\| * \| \|	KVM: x86: Fix lost interrupt on irr_pending race	Nadav Amit	2014-11-17	1	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is set. If this assumption is broken and apicv is disabled, the injection of interrupts may be deferred until another interrupt is delivered to the guest. Ultimately, if no other interrupt should be injected to that vCPU, the pending interrupt may be lost. commit 56cc2406d68c ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use") changed the behavior of apic_clear_irr so irr_pending is cleared after setting APIC_IRR vector. After this commit, if apic_set_irr and apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR vector set, and irr_pending cleared. In the following example, assume a single vector is set in IRR prior to calling apic_clear_irr: apic_set_irr apic_clear_irr ------------ -------------- apic->irr_pending = true; apic_clear_vector(...); vec = apic_search_irr(apic); // => vec == -1 apic_set_vector(...); apic->irr_pending = (vec != -1); // => apic->irr_pending == false Nonetheless, it appears the race might even occur prior to this commit: apic_set_irr apic_clear_irr ------------ -------------- apic->irr_pending = true; apic->irr_pending = false; apic_clear_vector(...); if (apic_search_irr(apic) != -1) apic->irr_pending = true; // => apic->irr_pending == false apic_set_vector(...); Fixing this issue by: 1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending. 2. On apic_set_irr: first call apic_set_vector, then set irr_pending. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>