Commit message (Author, Date, Files changed, Lines -/+)

* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds, 2015-04-13, 99 files, -2057/+5401)

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.1

  The most interesting bit here is irqfd/ioeventfd support for ARM and
  ARM64.

  Summary:

  ARM/ARM64: fixes for live migration, irqfd and ioeventfd support
  (enabling vhost, too), page aging

  s390: interrupt handling rework, allowing to inject all local
  interrupts via new ioctl and to get/set the full local irq state for
  migration and introspection. New ioctls to access memory by virtual
  address, and to get/set the guest storage keys. SIMD support.

  MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some
  patches from Ralf Baechle's MIPS tree.

  x86: bugfixes (notably for pvclock, the others are small) and cleanups.
  Another small latency improvement for the TSC deadline timer"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits)
  KVM: use slowpath for cross page cached accesses
  kvm: mmu: lazy collapse small sptes into large sptes
  KVM: x86: Clear CR2 on VCPU reset
  KVM: x86: DR0-DR3 are not clear on reset
  KVM: x86: BSP in MSR_IA32_APICBASE is writable
  KVM: x86: simplify kvm_apic_map
  KVM: x86: avoid logical_map when it is invalid
  KVM: x86: fix mixed APIC mode broadcast
  KVM: x86: use MDA for interrupt matching
  kvm/ppc/mpic: drop unused IRQ_testbit
  KVM: nVMX: remove unnecessary double caching of MAXPHYADDR
  KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry
  KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu
  KVM: vmx: pass error code with internal error #2
  x86: vdso: fix pvclock races with task migration
  KVM: remove kvm_read_hva and kvm_read_hva_atomic
  KVM: x86: optimize delivery of TSC deadline timer interrupt
  KVM: x86: extract blocking logic from __vcpu_run
  kvm: x86: fix x86 eflags fixed bit
  KVM: s390: migrate vcpu interrupt state
  ...

| * KVM: use slowpath for cross page cached accesses (Radim Krčmář, 2015-04-10, 1 file, -2/+2)

kvm_write_guest_cached() does not mark all written pages as dirty, and the
code comments in kvm_gfn_to_hva_cache_init() talk about a NULL memslot with
cross page accesses. Fix it the easy way.

The check is '<= 1' to have the same result for a 'len = 0' cache anywhere
in the page. (nr_pages_needed is 0 on a page boundary.)

Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <20150408121648.GA3519@potion.brq.redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

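Illustrative sketch of the check described above (function name and body
simplified; this is not the exact upstream hunk): the cache stays on the
fast path only when the requested range fits in a single page, otherwise
the memslot is cleared so every later access falls back to the slow path.

    /* Hedged, kernel-style sketch -- not the real kvm_gfn_to_hva_cache_init(). */
    static void demo_cache_init(struct gfn_to_hva_cache *ghc, gpa_t gpa,
                                unsigned long len)
    {
            gfn_t start_gfn = gpa >> PAGE_SHIFT;
            gfn_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT;
            int nr_pages_needed = end_gfn - start_gfn + 1;

            /*
             * '<= 1' (not '== 1') keeps the behaviour identical for a
             * len = 0 cache anywhere in the page: nr_pages_needed is 0
             * when gpa sits exactly on a page boundary.
             */
            if (nr_pages_needed <= 1) {
                    /* fast path: single page, hva can be cached directly */
            } else {
                    /* cross-page access: force the slow path on every use */
                    ghc->memslot = NULL;
            }
    }
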
| * kvm: mmu: lazy collapse small sptes into large sptes (Wanpeng Li, 2015-04-08, 3 files, -0/+92)

Dirty logging tracks sptes at 4k granularity, meaning that large sptes have
to be split. If live migration succeeds, the guest on the source machine is
destroyed and large sptes are created on the destination. If, however, the
guest continues to run on the source machine (for example because live
migration failed), the small sptes remain around and cause bad performance.

This patch introduces lazy collapsing of small sptes into large sptes. The
rmap is scanned in ioctl context when dirty logging is stopped, dropping
those sptes which can be collapsed into a single large-page spte. Later
page faults will create the large-page sptes.

Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Message-Id: <1428046825-6905-1-git-send-email-wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: x86: Clear CR2 on VCPU reset (Nadav Amit, 2015-04-08, 1 file, -0/+2)

CR2 is not cleared as it should be after reset. See the Intel SDM table
named "IA-32 Processor States Following Power-up, Reset, or INIT".

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427933438-12782-5-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: x86: DR0-DR3 are not clear on reset (Nadav Amit, 2015-04-08, 2 files, -0/+15)

DR0-DR3 are not cleared as they should be during reset and when they are
set from userspace. This appears to be caused by c77fb5fe6f03 ("KVM: x86:
Allow the guest to run with dirty debug registers"). Force their reload in
these situations.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427933438-12782-4-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: x86: BSP in MSR_IA32_APICBASE is writable (Nadav Amit, 2015-04-08, 5 files, -6/+9)

After reset, the CPU can change the BSP, which will be used upon INIT.
Reset should return the BSP which QEMU asked for, so it is handled
accordingly. To quote: "If the MP protocol has completed and a BSP is
chosen, subsequent INITs (either to a specific processor or system wide)
do not cause the MP protocol to be repeated." [Intel SDM 8.4.2: MP
Initialization Protocol Requirements and Restrictions]

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Message-Id: <1427933438-12782-3-git-send-email-namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: x86: simplify kvm_apic_map (Radim Krčmář, 2015-04-08, 3 files, -68/+28)

recalculate_apic_map() uses two passes over all VCPUs. This is a relic from
the time when we selected a global mode in the first pass and set up the
optimized table in the second pass (to have a consistent mode). Recent
changes made mixed mode unoptimized, so we can do it in one pass.

The format of the logical MDA is a function of the mode, so we encode it in
apic_logical_id() and drop the obsoleted variables from the struct.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <1423766494-26150-5-git-send-email-rkrcmar@redhat.com>
[Add lid_bits temporary in apic_logical_id. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: x86: avoid logical_map when it is invalid (Radim Krčmář, 2015-04-08, 2 files, -1/+35)

We want to support mixed modes and the easiest solution is to avoid
optimizing those weird and unlikely scenarios.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <1423766494-26150-4-git-send-email-rkrcmar@redhat.com>
[Add comment above KVM_APIC_MODE_* defines. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: x86: fix mixed APIC mode broadcast (Radim Krčmář, 2015-04-08, 2 files, -6/+5)

Broadcast allowed only one global APIC mode, but mixed modes are
theoretically possible. An x2APIC IPI doesn't treat 0xff as broadcast; the
rest does. x2APIC broadcasts are accepted by xAPIC. If we take the SDM to
be logical, even addresses beginning with 0xff should be accepted, but real
hardware disagrees. This patch aims for simple code by considering most of
the real behavior as undefined.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <1423766494-26150-3-git-send-email-rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: x86: use MDA for interrupt matching (Radim Krčmář, 2015-04-08, 1 file, -7/+33)

In mixed modes, we mustn't deliver xAPIC IPIs like x2APIC ones, and vice
versa. Instead of preserving the information in apic_send_ipi(), we regain
it by converting all destinations into the correct MDA in the slow path.
This allows easier reasoning about subsequent matching.

Our kvm_apic_broadcast() had an interesting design decision: it didn't
consider IOxAPIC 0xff as broadcast in x2APIC mode ... everything worked
because the IOxAPIC can't set that in physical mode, and logical mode
considered it a message for the first 8 VCPUs. This patch interprets
IOxAPIC 0xff as an x2APIC broadcast.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <1423766494-26150-2-git-send-email-rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * kvm/ppc/mpic: drop unused IRQ_testbit (Arseny Solokha, 2015-04-08, 1 file, -5/+0)

Drop an unused static procedure which doesn't have callers within its
translation unit. It had already been removed independently in QEMU[1] from
the OpenPIC implementation borrowed from the kernel.

[1] https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg01812.html

Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Cc: Alexander Graf <agraf@suse.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1424768706-23150-3-git-send-email-asolokha@kb.kras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: nVMX: remove unnecessary double caching of MAXPHYADDR (Eugene Korenevsky, 2015-04-08, 1 file, -8/+6)

After the speed-up of cpuid_maxphyaddr() it can be called frequently:
instead of a heavyweight enumeration of CPUID entries it returns a cached
pre-computed value. It is also inlined now. So caching its result became
unnecessary and can be removed.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Message-Id: <20150329205644.GA1258@gnote>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry (Eugene Korenevsky, 2015-04-08, 1 file, -6/+7)

On each VM-entry the CPU should check the following VMCS fields for zero
bits beyond the physical address width:
 - APIC-access address
 - virtual-APIC address
 - posted-interrupt descriptor address
This patch adds these checks, as required by the Intel SDM.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Message-Id: <20150329205627.GA1244@gnote>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

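The kind of check involved, as a hedged sketch (the helper name and call
site are assumptions for illustration, not the exact upstream code):

    /* Hedged sketch: reject a VMCS physical address with any bit set at or
     * above the guest's MAXPHYADDR. */
    static bool demo_nested_addr_valid(struct kvm_vcpu *vcpu, u64 addr)
    {
            return !(addr >> cpuid_maxphyaddr(vcpu));
    }
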
| * KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu (Eugene Korenevsky, 2015-04-08, 4 files, -16/+29)

cpuid_maxphyaddr(), which performs a lot of memory accesses, is called
extensively across KVM, especially in nVMX code. This patch adds a cached
value of maxphyaddr to vcpu.arch to reduce the pressure on the CPU cache
and to simplify the code of cpuid_maxphyaddr() callers. The cached value is
initialized in kvm_arch_vcpu_init() and reloaded every time CPUID is
updated by usermode, which obviously happens infrequently.

Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Message-Id: <20150329205612.GA1223@gnote>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

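The shape of the change, as a hedged sketch (the helper names follow the
commit message; treat the exact code, including the cpuid_query_maxphyaddr()
name, as assumptions):

    /* Hedged sketch of the caching scheme described above. */

    /* slow path: recomputed only when userspace updates CPUID */
    static void demo_update_maxphyaddr(struct kvm_vcpu *vcpu)
    {
            vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
    }

    /* hot path: a trivial, inlinable read instead of a CPUID-entry walk */
    static inline int demo_cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
    {
            return vcpu->arch.maxphyaddr;
    }
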
| * KVM: vmx: pass error code with internal error #2 (Radim Krčmář, 2015-04-08, 1 file, -1/+2)

Exposing the on-stack error code with internal error is cheap and
potentially useful.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <1428001865-32280-1-git-send-email-rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * x86: vdso: fix pvclock races with task migration (Radim Krčmář, 2015-04-08, 1 file, -8/+12)

If we were migrated right after __getcpu, but before reading the
migration_count, we wouldn't notice that we read the TSC of a different
VCPU, nor that KVM's bug made pvti invalid, as only migration_count on the
source VCPU is increased.

Change the vdso instead of updating migration_count on the destination.

Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"")
Message-Id: <1428000263-11892-1-git-send-email-rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: remove kvm_read_hva and kvm_read_hva_atomic (Paolo Bonzini, 2015-04-08, 1 file, -12/+2)

The corresponding write functions just use __copy_to_user. Do the same on
the read side.

This reverts what's left of commit 86ab8cffb498 ("KVM: introduce
gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic", 2012-08-21).

Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1427976500-28533-1-git-send-email-pbonzini@redhat.com>

| * KVM: x86: optimize delivery of TSC deadline timer interrupt (Paolo Bonzini, 2015-04-08, 1 file, -6/+7)

The newly-added tracepoint shows the following results on the
tscdeadline_latency test:

    qemu-kvm-8387 [002] 6425.558974: kvm_vcpu_wakeup: poll time 10407 ns
    qemu-kvm-8387 [002] 6425.558984: kvm_vcpu_wakeup: poll time 0 ns
    qemu-kvm-8387 [002] 6425.561242: kvm_vcpu_wakeup: poll time 10477 ns
    qemu-kvm-8387 [002] 6425.561251: kvm_vcpu_wakeup: poll time 0 ns

and so on. This is because we need to go through kvm_vcpu_block again
after the timer IRQ is injected. Avoid it by polling once before entering
kvm_vcpu_block.

On my machine (Xeon E5 Sandy Bridge) this removes about 500 cycles (7%)
from the latency of the TSC deadline timer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * KVM: x86: extract blocking logic from __vcpu_run (Paolo Bonzini, 2015-04-08, 1 file, -28/+34)

Rename the old __vcpu_run to vcpu_run, and extract part of it into a new
function vcpu_block. The next patch will add a new condition in vcpu_block;
this avoids extra indentation.

Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * kvm: x86: fix x86 eflags fixed bit (Wanpeng Li, 2015-04-08, 1 file, -3/+3)

A guest can't be booted with ept=0; the message below is dumped:

    If you're running a guest on an Intel machine without unrestricted
    mode support, the failure can be most likely due to the guest entering
    an invalid state for Intel VT. For example, the guest maybe running in
    big real mode which is not supported on less recent Intel processors.

    EAX=00000011 EBX=f000d2f6 ECX=00006cac EDX=000f8956
    ESI=bffbdf62 EDI=00000000 EBP=00006c68 ESP=00006c68
    EIP=0000d187 EFL=00000004 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =e000 000e0000 ffffffff 00809300 DPL=0 DS16 [-WA]
    CS =f000 000f0000 ffffffff 00809b00 DPL=0 CS16 [-RA]
    SS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
    DS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
    FS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
    GS =0000 00000000 ffffffff 00809300 DPL=0 DS16 [-WA]
    LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
    TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
    GDT=     000f6a80 00000037
    IDT=     000f6abe 00000000
    CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
    DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000000
    Code=01 1e b8 6a 2e 0f 01 16 74 6a 0f 20 c0 66 83 c8 01 0f 22 c0 <66>
    ea 8f d1 0f 00 08 00 b8 10 00 00 00 8e d8 8e c0 8e d0 8e e0 8e e8 89
    c8 ff e2 89 c1 b8X

x86 EFLAGS bit 1 is a fixed, always-set bit, which means the value must be
1 << 1 rather than 1. This patch fixes it.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Message-Id: <1428473294-6633-1-git-send-email-wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

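For reference, EFLAGS bit 1 is architecturally reserved and always reads as
1, so the constant has to be 0x2 (1 << 1), not 0x1. A hedged illustration
(the macro mirrors the kernel's X86_EFLAGS_FIXED; the helper around it is
made up):

    #define DEMO_EFLAGS_FIXED  (1UL << 1)   /* EFLAGS bit 1: reserved, always set */

    /* Sketch: any value loaded into EFLAGS must keep the fixed bit set. */
    static unsigned long demo_sanitize_eflags(unsigned long eflags)
    {
            return eflags | DEMO_EFLAGS_FIXED;   /* 0x2, not 0x1 */
    }
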
| * Merge tag 'kvm-s390-next-20150331' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD (Paolo Bonzini, 2015-04-07, 9 files, -402/+906)

Features and fixes for 4.1 (kvm/next)

1. Assorted changes
   1.1 allow more feature bits for the guest
   1.2 Store breaking event address on program interrupts

2. Interrupt handling rework
   2.1 Fix copy_to_user while holding a spinlock (cc stable)
   2.2 Rework floating interrupts to follow the priorities
   2.3 Allow to inject all local interrupts via new ioctl
   2.4 allow to get/set the full local irq state, e.g. for migration
       and introspection

| | * KVM: s390: migrate vcpu interrupt state (Jens Freimann, 2015-03-31, 5 files, -0/+252)

This patch adds support for migrating vcpu interrupts. Two new vcpu ioctls
are added which get/set the complete state of pending interrupts in one
go. The ioctls are marked as available with the new capability
KVM_CAP_S390_IRQ_STATE. We cannot use a ONEREG, as the number of pending
local interrupts is not constant and depends on the number of CPUs.

To retrieve the interrupt state we add an ioctl KVM_S390_GET_IRQ_STATE.
Its input parameter is a pointer to a struct kvm_s390_irq_state which has
a buffer and length. For all currently pending interrupts, we copy a
struct kvm_s390_irq into the buffer and pass it to userspace.

To store interrupt state into a buffer provided by userspace, we add an
ioctl KVM_S390_SET_IRQ_STATE. It passes a struct kvm_s390_irq_state into
the kernel and injects all interrupts contained in the buffer.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

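A hedged userspace sketch of how a VMM might drive the two ioctls (buffer
sizing and error handling are simplified assumptions; check the KVM API
documentation for the exact return-value semantics):

    /* Hedged sketch: save/restore s390 vcpu interrupt state via the new ioctls. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    static int save_irq_state(int vcpu_fd, void *buf, int len)
    {
            struct kvm_s390_irq_state irq_state = {
                    .buf = (__u64)(unsigned long)buf, /* filled with struct kvm_s390_irq entries */
                    .len = len,
            };
            /* negative on error; see the KVM API documentation for success semantics */
            return ioctl(vcpu_fd, KVM_S390_GET_IRQ_STATE, &irq_state);
    }

    static int restore_irq_state(int vcpu_fd, void *buf, int len)
    {
            struct kvm_s390_irq_state irq_state = {
                    .buf = (__u64)(unsigned long)buf,
                    .len = len,              /* bytes of valid kvm_s390_irq data */
            };
            return ioctl(vcpu_fd, KVM_S390_SET_IRQ_STATE, &irq_state);
    }
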
| | * KVM: s390: refactor vcpu injection function (Jens Freimann, 2015-03-31, 1 file, -3/+12)

Let's provide a version of kvm_s390_inject_vcpu() that does not acquire
the local-interrupt lock and skips waking up the vcpu. To be used in a
later patch for vcpu-local interrupt migration, where we are already
holding the lock.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

| | * KVM: s390: add ioctl to inject local interrupts (Jens Freimann, 2015-03-31, 4 files, -1/+70)

We introduced struct kvm_s390_irq a while ago; it allows injecting all
kinds of interrupts as defined in the Principles of Operation. Add an
ioctl to inject interrupts with the extended struct kvm_s390_irq.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

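A hedged sketch of injecting a single vcpu-local interrupt from userspace
with the extended struct (the chosen interrupt type and field usage are
illustrative assumptions):

    /* Hedged sketch: inject one local interrupt via the KVM_S390_IRQ vcpu ioctl. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <string.h>

    static int inject_emergency_signal(int vcpu_fd, unsigned short src_cpu_addr)
    {
            struct kvm_s390_irq irq;

            memset(&irq, 0, sizeof(irq));
            irq.type = KVM_S390_INT_EMERGENCY;  /* example interrupt type */
            irq.u.emerg.code = src_cpu_addr;    /* address of the signalling CPU */

            return ioctl(vcpu_fd, KVM_S390_IRQ, &irq);
    }
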
| | * KVM: s390: cpu timer irq priority (David Hildenbrand, 2015-03-31, 1 file, -7/+27)

We now have a mechanism for delivering interrupts according to their
priority. Let's inject them using our new infrastructure (instead of
letting only hardware handle them), so we can be sure that the irq
priorities are satisfied.

For s390, the cpu timer and the clock comparator are to be checked for
common code kvm_cpu_has_pending_timer(), although the cpu timer is only
stepped when the guest is being executed.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

| | * KVM: s390: deliver floating interrupts in order of priority (Jens Freimann, 2015-03-31, 5 files, -367/+510)

This patch makes interrupt handling compliant with the z/Architecture
Principles of Operation with regard to interrupt priorities.

Add a bitmap for pending floating interrupts. Each bit relates to an
interrupt type and its list. A set bit indicates that a list contains
items (interrupts) which need to be delivered. When delivering interrupts
on a cpu we can merge the existing bitmap for cpu-local interrupts and
floating interrupts and have a single mechanism for delivery.

Currently we have one list for all kinds of floating interrupts and a
corresponding spin lock. This patch adds a separate list per interrupt
type. An exception to this are service signal and machine check
interrupts, as there can be only one pending interrupt at a time.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

| | * KVM: s390: fix get_all_floating_irqs (Jens Freimann, 2015-03-31, 2 files, -26/+35)

This fixes a bug introduced with commit c05c4186bbe4 ("KVM: s390: add
floating irq controller"): get_all_floating_irqs() does copy_to_user()
while holding a spin lock. Fix this by filling a temporary buffer first
and copying it to userspace after giving up the lock.

Cc: <stable@vger.kernel.org> # 3.18+: 69a8d4562638 KVM: s390: no need to hold...
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

| | * KVM: s390: enable more features that need no hypervisor changes (Christian Borntraeger, 2015-03-31, 1 file, -2/+2)

After some review of what these facilities do, the following facilities
will work under KVM and can therefore be reported to the guest if the cpu
model and the host cpu provide the bit. There are plans underway to make
the whole bit handling more readable, but it is not yet finished. So here
are some last bit changes; we enhance the KVM mask with:

  9  The sense-running-status facility is installed in the z/Architecture
     architectural mode. ---> handled by SIE or KVM
 10  The conditional-SSKE facility is installed in the z/Architecture
     architectural mode. ---> handled by SIE. KVM will retry SIE
 13  The IPTE-range facility is installed in the z/Architecture
     architectural mode. ---> handled by SIE. KVM will retry SIE
 36  The enhanced-monitor facility is installed in the z/Architecture
     architectural mode. ---> handled by SIE
 47  The CMPSC-enhancement facility is installed in the z/Architecture
     architectural mode. ---> handled by SIE
 48  The decimal-floating-point zoned-conversion facility is installed in
     the z/Architecture architectural mode. ---> handled by SIE
 49  The execution-hint, load-and-trap, miscellaneous-instruction-extensions
     and processor-assist facilities ---> handled by SIE
 51  The local-TLB-clearing facility is installed in the z/Architecture
     architectural mode. ---> handled by SIE
 52  The interlocked-access facility 2 is installed. ---> handled by SIE
 53  The load/store-on-condition facility 2 and load-and-zero-rightmost-byte
     facility are installed in the z/Architecture architectural mode.
     ---> handled by SIE
 57  The message-security-assist-extension-5 facility is installed in the
     z/Architecture architectural mode. ---> handled by SIE
 66  The reset-reference-bits-multiple facility is installed in the
     z/Architecture architectural mode. ---> handled by SIE. KVM will retry SIE
 80  The decimal-floating-point packed-conversion facility is installed in
     the z/Architecture architectural mode. ---> handled by SIE

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

| | * KVM: s390: store the breaking-event address on pgm interrupts (David Hildenbrand, 2015-03-31, 1 file, -0/+2)

If the PER-3 facility is installed, the breaking-event address is to be
stored in the low core. There is no facility bit for PER-3 in stfl(e) and
Linux always uses the value at address 272, no matter if PER-3 is
available or not. We can't hide its existence from the guest.

All program interrupts injected via the SIE automatically store this
information if the PER-3 facility is available in the hypervisor. Also the
itdb contains the address automatically. As there is no switch to turn
this mechanism off, let's simply make it consistent and also store the
breaking-event address in case of manual program interrupt injection.

Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

| * | Merge tag 'kvm-arm-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' (Paolo Bonzini, 2015-04-07, 47 files, -703/+1012)

KVM/ARM changes for v4.1:

- fixes for live migration
- irqfd support
- kvm-io-bus & vgic rework to enable ioeventfd
- page ageing for stage-2 translation
- various cleanups

| | * | KVM: arm/arm64: enable KVM_CAP_IOEVENTFD (Nikolay Nikolaev, 2015-03-30, 1 file, -0/+1)

As the infrastructure for eventfd has now been merged, report the
ioeventfd capability as being supported.

Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
[maz: grouped the case entry with the others, fixed commit log]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

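For context, a hedged userspace sketch of wiring an eventfd to a guest MMIO
address via KVM_IOEVENTFD (the doorbell address and length are
placeholders, not values taken from this series):

    /* Hedged sketch: attach an eventfd to a 4-byte MMIO doorbell register. */
    #include <linux/kvm.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>

    static int add_mmio_ioeventfd(int vm_fd, __u64 doorbell_gpa)
    {
            int efd = eventfd(0, 0);
            struct kvm_ioeventfd ioevent = {
                    .addr  = doorbell_gpa, /* guest-physical MMIO address (placeholder) */
                    .len   = 4,            /* width of the guest write to match */
                    .fd    = efd,
                    .flags = 0,            /* plain MMIO, no datamatch */
            };

            if (efd < 0 || ioctl(vm_fd, KVM_IOEVENTFD, &ioevent) < 0)
                    return -1;
            return efd;                    /* vhost or the VMM can now poll this fd */
    }
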
| | * | KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO bus (Andre Przywara, 2015-03-30, 8 files, -221/+55)

Currently we have struct kvm_exit_mmio for encapsulating MMIO abort data
to be passed on from syndrome decoding all the way down to the VGIC
register handlers. Now that we switch the MMIO handling to be routed
through the KVM MMIO bus, it does not make sense anymore to use that
structure from the very beginning. So we keep the data in local variables
until we put it into the kvm_io_bus framework. Then we fill kvm_exit_mmio
in the VGIC only, making it a VGIC-private structure. On the way we
replace the data buffer in that structure with a pointer to a single
location in a local variable, so we get rid of some copying.

With all of the virtual GIC emulation code now being registered with the
kvm_io_bus, we can remove all of the old MMIO handling code and its
dispatching functionality.

I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
because that touches a lot of code lines without any good reason.

This is based on an original patch by Nikolay.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

| | * | KVM: arm/arm64: prepare GICv3 emulation to use kvm_io_bus MMIO handling (Andre Przywara, 2015-03-30, 2 files, -1/+39)

Using the framework provided by the recent vgic.c changes, we register a
kvm_io_bus device when mapping the virtual GICv3 resources. The
distributor mapping is pretty straightforward, but the redistributors need
some more love, since they have to be tagged with the respective
redistributor (read: VCPU) they are connected with. We use the kvm_io_bus
framework to register one device per VCPU.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

| | * | KVM: arm/arm64: merge GICv3 RD_base and SGI_base register frames (Andre Przywara, 2015-03-30, 1 file, -91/+83)

Currently we handle the redistributor registers in two separate MMIO
regions, one for the overall behaviour and SPIs and one for the SGIs/PPIs.
The latter forces the creation of _two_ KVM I/O bus devices for each
redistributor. Since the spec mandates those two pages to be contiguous,
we can just as well merge them and save the churn of the second KVM I/O
bus device.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

| | * | KVM: arm/arm64: prepare GICv2 emulation to be handled by kvm_io_bus (Andre Przywara, 2015-03-26, 2 files, -6/+17)

Using the framework provided by the recent vgic.c changes we register a
kvm_io_bus device when initializing the virtual GICv2.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

| | * | KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC (Andre Przywara, 2015-03-26, 3 files, -0/+145)

Currently we use a lot of VGIC-specific code to do the MMIO dispatching.
Use the previous reworks to add kvm_io_bus-style MMIO handlers. Those are
not yet called by the MMIO abort handler, and the actual VGIC emulator
functions do not make use of them yet either, but they will be enabled
with the following patches.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

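To illustrate the kvm_io_bus pattern this series moves towards, here is a
hedged, kernel-style sketch; the device, ops and handler names are made up
for illustration, and the real VGIC wiring differs:

    /* Hedged sketch of a kvm_io_bus MMIO device; names are illustrative. */
    #include <linux/kvm_host.h>
    #include <kvm/iodev.h>

    struct demo_mmio_dev {
            struct kvm_io_device dev;
            /* ... emulated register state ... */
    };

    static int demo_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
                              gpa_t addr, int len, void *val)
    {
            /* decode 'addr' and fill 'val' with up to 'len' bytes of state */
            return 0;
    }

    static int demo_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
                               gpa_t addr, int len, const void *val)
    {
            /* decode 'addr' and apply the guest's write */
            return 0;
    }

    static const struct kvm_io_device_ops demo_mmio_ops = {
            .read  = demo_mmio_read,
            .write = demo_mmio_write,
    };

    static int demo_register(struct kvm *kvm, struct demo_mmio_dev *d,
                             gpa_t base, int len)
    {
            int ret;

            kvm_iodevice_init(&d->dev, &demo_mmio_ops);
            mutex_lock(&kvm->slots_lock);
            ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, base, len, &d->dev);
            mutex_unlock(&kvm->slots_lock);
            return ret;
    }
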
| | * | KVM: arm/arm64: simplify vgic_find_range() and callers (Andre Przywara, 2015-03-26, 3 files, -17/+10)

The vgic_find_range() function in vgic.c takes a struct kvm_exit_mmio
argument, but actually only uses the length field in there. Since we need
to get rid of that structure in this part of the code anyway, let's rework
the function (and its callers) to pass the length argument to the function
directly.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

| | * | KVM: arm/arm64: rename struct kvm_mmio_range to vgic_io_range (Andre Przywara, 2015-03-26, 4 files, -22/+22)

The name "kvm_mmio_range" is a bit bold, given that it only covers the
VGIC's MMIO ranges. To avoid confusion with kvm_io_range, rename it to
vgic_io_range.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

| | * | KVM: x86: remove now unneeded include directory from Makefile (Andre Przywara, 2015-03-26, 1 file, -1/+1)

virt/kvm was never really a good include directory for anything other than
locally included headers. With the move of iodev.h there is no need
anymore to add this directory to the compiler's include path, so remove it
from the x86 KVM Makefile.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

| | * | KVM: arm/arm64: remove now unneeded include directory from Makefile (Andre Przywara, 2015-03-26, 2 files, -2/+2)

virt/kvm was never really a good include directory for anything other than
locally included headers. With the move of iodev.h there is no need
anymore to add this directory to the compiler's include path, so remove it
from the arm and arm64 KVM Makefiles.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

| | * | KVM: move iodev.h from virt/kvm/ to include/kvm (Andre Przywara, 2015-03-26, 9 files, -11/+10)

iodev.h contains definitions for the kvm_io_bus framework. This is needed
both by the generic KVM code in virt/kvm as well as by architecture
specific code under arch/. Putting the header file in virt/kvm and using
local includes in the architecture part seems at least dodgy to me, so
let's move the file into include/kvm, so that a more natural
"#include <kvm/iodev.h>" can be used by all of the code.

This also solves a problem later when using struct kvm_io_device in
arm_vgic.h.

Fixing up the FSF address in the GPL header and a wrong include path on
the way.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

| | * | KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks. (Nikolay Nikolaev, 2015-03-26, 14 files, -64/+79)

This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends
on the VCPU that does the access.

Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

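The gist of the interface change, shown as a hedged sketch of the read
callback's shape before and after the redesign (consult include/kvm/iodev.h
in the tree for the authoritative declarations):

    /* Hedged sketch: the read callback's shape before and after the redesign. */
    #include <linux/kvm_host.h>
    #include <kvm/iodev.h>

    /* before: the handler cannot tell which VCPU performed the access */
    typedef int (*demo_io_read_old_t)(struct kvm_io_device *this,
                                      gpa_t addr, int len, void *val);

    /* after: the VCPU is passed through, so per-VCPU devices (e.g. GICv3
     * redistributors) can tell accesses apart */
    typedef int (*demo_io_read_new_t)(struct kvm_vcpu *vcpu,
                                      struct kvm_io_device *this,
                                      gpa_t addr, int len, void *val);
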
| | * | arm/arm64: KVM: Fix migration race in the arch timer (Christoffer Dall, 2015-03-14, 3 files, -10/+39)

When a VCPU is no longer running, we currently check whether it has a
timer scheduled in the future, and if it does, we schedule a host hrtimer
to notify us in case the timer expires while the VCPU is still not
running. When the hrtimer fires, we mask the guest's timer and inject the
timer IRQ (still relying on the guest unmasking the timer when it receives
the IRQ).

This is all good and fine, but when migrating a VM (checkpoint/restore)
this introduces a race. It is unlikely, but possible, for the following
sequence of events to happen:

 1. Userspace stops the VM
 2. Hrtimer for VCPU is scheduled
 3. Userspace checkpoints the VGIC state (no pending timer interrupts)
 4. The hrtimer fires, schedules work in a workqueue
 5. Workqueue function runs, masks the timer and injects timer interrupt
 6. Userspace checkpoints the timer state (timer masked)

At restore time, you end up with a masked timer without any timer
interrupts and your guest halts, never receiving timer interrupts.

Fix this by only kicking the VCPU in the workqueue function, sampling the
expired state of the timer when entering the guest again, and injecting
the interrupt and masking the timer only then.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

| | * | arm/arm64: KVM: support for un-queuing active IRQs (Christoffer Dall, 2015-03-14, 4 files, -38/+212)

Migrating active interrupts causes the active state to be lost completely.
This implements some additional bitmaps to track the active state on the
distributor and export this to user space.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

| | * | arm/arm64: KVM: add a common vgic_queue_irq_to_lr fn (Alex Bennée, 2015-03-14, 1 file, -7/+17)

This helps re-factor away some of the repetitive code and makes the code
flow more nicely.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

| | * | arm/arm64: KVM: export VCPU power state via MP_STATE ioctl (Alex Bennée, 2015-03-14, 2 files, -6/+31)

To cleanly restore an SMP VM we need to ensure that the current pause
state of each vcpu is correctly recorded. Things could get confused if the
CPU starts running after migration restore completes when it was paused
before its state was captured.

We use the existing KVM_GET/SET_MP_STATE ioctl to do this. The arm/arm64
interface is a lot simpler, as the only valid states are
KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_STOPPED.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

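A hedged userspace sketch of the save/restore flow for the pause state
(error handling trimmed; only the two states named above are meaningful on
arm/arm64):

    /* Hedged sketch: record and restore a vcpu's pause state across migration. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    static int save_mp_state(int vcpu_fd, __u32 *state)
    {
            struct kvm_mp_state mp;

            if (ioctl(vcpu_fd, KVM_GET_MP_STATE, &mp) < 0)
                    return -1;
            *state = mp.mp_state;  /* KVM_MP_STATE_RUNNABLE or KVM_MP_STATE_STOPPED */
            return 0;
    }

    static int restore_mp_state(int vcpu_fd, __u32 state)
    {
            struct kvm_mp_state mp = { .mp_state = state };

            return ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
    }
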
| | * | arm/arm64: KVM: Optimize handling of Access Flag faults (Marc Zyngier, 2015-03-12, 2 files, -0/+61)

Now that we have page aging in Stage-2, it becomes obvious that we're
doing way too much work handling the fault. The page is not going anywhere
(it is still mapped), the page tables are already allocated, and all we
want is to flip a bit in the PMD or PTE. Also, we can avoid any form of
TLB invalidation, since a page with the AF bit off is not allowed to be
cached.

An obvious solution is to have a separate handler for FSC_ACCESS, where we
pride ourselves to only do the very minimum amount of work.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

| | * | arm/arm64: KVM: Implement Stage-2 page aging (Marc Zyngier, 2015-03-12, 7 files, -23/+104)

Until now, KVM/arm didn't care much for page aging (who was swapping
anyway?), and simply provided empty hooks to the core KVM code. With
server-type systems now being available, things are quite different.

This patch implements very simple support for page aging, by clearing the
Access flag in the Stage-2 page tables. On access fault, the current fault
handling will write the PTE or PMD again, putting the Access flag back on.

It should be possible to implement a much faster handling for Access
faults, but that's left for a later patch.

With this in place, performance in VMs is degraded much more gracefully.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

| | * | arm/arm64: KVM: Allow handle_hva_to_gpa to return a value (Marc Zyngier, 2015-03-12, 1 file, -9/+14)

So far, handle_hva_to_gpa was never required to return a value. As we
prepare to age pages at Stage-2, we need to be able to return a value from
the iterator (kvm_test_age_hva). Adapt the code to handle this situation.
No semantic change.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

| | * | KVM: arm/arm64: add irqfd support (Eric Auger, 2015-03-12, 9 files, -3/+66)

This patch enables irqfd on arm/arm64. Both irqfd and resamplefd are
supported. Injection is implemented in vgic.c without routing.

This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.
KVM_CAP_IRQFD is now advertised, and the KVM_CAP_IRQFD_RESAMPLE capability
is automatically advertised as soon as CONFIG_HAVE_KVM_IRQFD is set.

Irqfd injection is restricted to SPIs. The rationale behind not supporting
PPI irqfd injection is that any device using a PPI would be a
private-to-the-CPU device (a timer, for instance), so its state would have
to be context-switched along with the VCPU and would require in-kernel
wiring anyhow. It is not a relevant use case for irqfds.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

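A hedged userspace sketch of attaching an irqfd (optionally with a
resamplefd) to a guest interrupt; the mapping from SPI number to GSI is an
assumption here, not something taken from this series:

    /* Hedged sketch: route an eventfd to a guest interrupt via KVM_IRQFD. */
    #include <linux/kvm.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>

    static int add_irqfd(int vm_fd, unsigned int gsi, int resample_fd)
    {
            int efd = eventfd(0, 0);
            struct kvm_irqfd irqfd = {
                    .fd  = efd,
                    .gsi = gsi,   /* interrupt number in KVM's GSI space (assumption) */
            };

            if (resample_fd >= 0) {
                    /* level-triggered use: resample_fd is notified on guest EOI */
                    irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
                    irqfd.resamplefd = resample_fd;
            }

            if (efd < 0 || ioctl(vm_fd, KVM_IRQFD, &irqfd) < 0)
                    return -1;
            return efd;           /* writing to this fd injects the interrupt */
    }
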