summaryrefslogtreecommitdiffstats
path: root/drivers/kvm/kvm_main.c
Commit message (Collapse)AuthorAgeFilesLines
* sched: guest CPU accounting: maintain guest state in KVMLaurent Vivier2007-10-151-0/+2
| | | | | | | | | | Modify KVM to update guest time accounting. [ mingo@elte.hu: ported to 2.6.24 KVM. ] Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Acked-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* KVM: Skip pio instruction when it is emulated, not executedAvi Kivity2007-10-131-2/+5
| | | | | | | | | | | If we defer updating rip until pio instructions are executed, we have a problem with reset: a pio reset updates rip, and when the instruction completes we skip the emulated instruction, pointing rip somewhere completely unrelated. Fix by updating rip when we see decode the instruction, not after emulation. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Improve emulation failure reportingAvi Kivity2007-10-131-8/+8
| | | | | | Report failed opcodes from all locations. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Move main vcpu loop into subarch independent codeAvi Kivity2007-10-131-1/+123
| | | | | | This simplifies adding new code as well as reducing overall code size. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Rename kvm_arch_ops to kvm_x86_opsChristian Ehrhardt2007-10-131-80/+80
| | | | | | | | This patch just renames the current (misnamed) _arch namings to _x86 to ensure better readability when a real arch layer takes place. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Simplify memory allocationLaurent Vivier2007-10-131-35/+3
| | | | | | | | | | The mutex->splinlock convertion alllows us to make some code simplifications. As we can keep the lock longer, we don't have to release it and then have to check if the environment has not been modified before re-taking it. We can remove kvm->busy and kvm->memory_config_version. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Hoist SVM's get_cs_db_l_bits into core code.Rusty Russell2007-10-131-0/+10
| | | | | | | | SVM gets the DB and L bits for the cs by decoding the segment. This is in fact the completely generic code, so hoist it for kvm-lite to use. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Keep control regs in syncRusty Russell2007-10-131-4/+4
| | | | | | | | We don't update the vcpu control registers in various places. We should do so. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Set the ET flag in CR0 after initializing FXAmit Shah2007-10-131-0/+1
| | | | | | | | | This was missed when moving stuff around in fbc4f2e Fixes Solaris guests and bug #1773613 Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: enable in-kernel APIC INIT/SIPI handlingHe, Qing2007-10-131-6/+20
| | | | | | | | | | | This patch enables INIT/SIPI handling using in-kernel APIC by introducing a ->mp_state field to emulate the SMP state transition. [avi: remove smp_processor_id() warning] Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Xin Li <xin.b.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: disable tpr/cr8 sync when in-kernel APIC is usedHe, Qing2007-10-131-1/+2
| | | | | Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Keep track of missed timer irq injectionsEddie Dong2007-10-131-0/+2
| | | | | | | | | | | | | APIC timer IRQ is set every time when a certain period expires at host time, but the guest may be descheduled at that time and thus the irq be overwritten by later fire. This patch keep track of firing irq numbers and decrease only when the IRQ is injected to guest or buffered in APIC. Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: pending irq save/restoreEddie Dong2007-10-131-3/+17
| | | | | | | | | | | Add in kernel irqchip save/restore support for pending vectors. [avi: fix compile warning on i386] [avi: remove printk] Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: in-kernel LAPIC save and restore supportEddie Dong2007-10-131-0/+46
| | | | | | | | | | | | This patch adds a new vcpu-based IOCTL to save and restore the local apic registers for a single vcpu. The kernel only copies the apic page as a whole, extraction of registers is left to userspace side. On restore, the APIC timer is restarted from the initial count, this introduces a little delay, but works fine. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: in-kernel IOAPIC save and restore supportHe, Qing2007-10-131-0/+10
| | | | | | | | | | This patch adds support for in-kernel ioapic save and restore (to and from userspace). It uses the same get/set_irqchip ioctl as in-kernel PIC. Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Bypass irq_pending get/set when using in kernel irqchipHe, Qing2007-10-131-8/+14
| | | | | | | | vcpu->irq_pending is saved in get/set_sreg IOCTL, but when in-kernel local APIC is used, doing this may occasionally overwrite vcpu->apic to an invalid value, as in the vm restore path. Signed-off-by: Qing He <qing.he@intel.com>
* KVM: Add get/set irqchip ioctls for in-kernel PIC live migration supportHe, Qing2007-10-131-0/+82
| | | | | | | | | | This patch adds two new ioctls to dump and write kernel irqchips for save/restore and live migration. PIC s/r and l/m is implemented in this patch. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Protect in-kernel pio using kvm->lockEddie Dong2007-10-131-0/+6
| | | | | | | | pio operation and IRQ_LINE kvm_vm_ioctl is not kvm->lock protected. Add lock to same with IOAPIC MMIO operations. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Emulate hlt in the kernelEddie Dong2007-10-131-6/+35
| | | | | | | | | By sleeping in the kernel when hlt is executed, we simplify the in-kernel guest interrupt path considerably. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: In-kernel I/O APIC modelEddie Dong2007-10-131-3/+12
| | | | | | | | | | | | | | | This allows in-kernel host-side device drivers to raise guest interrupts without going to userspace. [avi: fix level-triggered interrupt redelivery on eoi] [avi: add missing #include] [avi: avoid redelivery of edge-triggered interrupt] [avi: implement polarity] [avi: don't deliver edge-triggered interrupts when unmasking] [avi: fix host oops on invalid guest access] Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Emulate local APIC in kernelEddie Dong2007-10-131-10/+42
| | | | | | | | | | | | | | | | Because lightweight exits (exits which don't involve userspace) are many times faster than heavyweight exits, it makes sense to emulate high usage devices in the kernel. The local APIC is one such device, especially for Windows and for SMP, so we add an APIC model to kvm. It also allows in-kernel host-side drivers to inject interrupts without going through userspace. [compile fix on i386 from Jindrich Makovicka] Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Define and use cr8 access functionsEddie Dong2007-10-131-7/+25
| | | | | | | | | This patch is to wrap APIC base register and CR8 operation which can provide a unique API for user level irqchip and kernel irqchip. This is a preparation of merging lapic/ioapic patch. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add support for in-kernel PIC emulationEddie Dong2007-10-131-6/+40
| | | | | Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Clean up kvm_setup_pio()Laurent Vivier2007-10-131-24/+39
| | | | | | | | Split kvm_setup_pio() into two functions, one to setup in/out pio (kvm_emulate_pio()) and one to setup ins/outs pio (kvm_emulate_pio_string()). Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Cleanup string I/O instruction emulationLaurent Vivier2007-10-131-0/+3
| | | | | | | | | | | | | | | | | Both vmx and svm decode the I/O instructions, and both botch the job, requiring the instruction prefixes to be fetched in order to completely decode the instruction. So, if we see a string I/O instruction, use the x86 emulator to decode it, as it already has all the prefix decoding machinery. This patch defines ins/outs opcodes in x86_emulate.c and calls emulate_instruction() from io_interception() (svm.c) and from handle_io() (vmx.c). It removes all vmx/svm prefix instruction decoders (get_addr_size(), io_get_override(), io_address(), get_io_count()) Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove useless assignmentLaurent Vivier2007-10-131-2/+0
| | | | | | | | | | | | | | | | Line 1809 of kvm_main.c is useless, value is overwritten in line 1815: 1809 now = min(count, PAGE_SIZE / size); 1810 1811 if (!down) 1812 in_page = PAGE_SIZE - offset_in_page(address); 1813 else 1814 in_page = offset_in_page(address) + size; 1815 now = min(count, (unsigned long)in_page / size); 1816 if (!now) { Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add and use pr_unimpl for standard formatting of unimplemented featuresRusty Russell2007-10-131-10/+8
| | | | | | | | All guest-invokable printks should be ratelimited to prevent malicious guests from flooding logs. This is a start. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove unneeded kvm_dev_open and kvm_dev_release functions.Rusty Russell2007-10-131-12/+0
| | | | | | | Devices don't need open or release functions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove stat_set from debugfsRusty Russell2007-10-131-5/+1
| | | | | | | | | We shouldn't define stat_set on the debug attributes, since that will cause silent failure on writing: without a set argument, userspace will get -EACCESS. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Cleanup mark_page_dirtyRusty Russell2007-10-131-17/+7
| | | | | | | For some reason, mark_page_dirty open-codes __gfn_to_memslot(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Don't assign vcpu->cr3 if it's invalid: check first, set lastRusty Russell2007-10-131-2/+3
| | | | | sSigned-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Add cpu consistency checkYang, Sheng2007-10-131-0/+10
| | | | | | | | | All the physical CPUs on the board should support the same VMX feature set. Add check_processor_compatibility to kvm_arch_ops for the consistency check. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: kvm_vm_ioctl_get_dirty_log restore "nothing dirty" optimizationRusty Russell2007-10-131-5/+8
| | | | | | | | | | | | | | | | | | | kvm_vm_ioctl_get_dirty_log scans bitmap to see it it's all zero, but doesn't use that information. Avi says: Looks like it was used to guard kvm_mmu_slot_remove_write_access(); optimizing the case where the guest just leaves the screen alone (which it usually does, especially in benchmarks). I'd rather reinstate that optimization. See 90cb0529dd230548a7f0d6b315997be854caea1b where the damage was done. It's pretty simple: if the bitmap is all zero, we don't need to do anything to clean it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use alignment properties of vcpu to simplify FPU opsRusty Russell2007-10-131-28/+17
| | | | | | | | | Now we use a kmem cache for allocating vcpus, we can get the 16-byte alignment required by fxsave & fxrstor instructions, and avoid manually aligning the buffer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use kmem cache for allocating vcpusRusty Russell2007-10-131-1/+15
| | | | | | | | | Avi wants the allocations of vcpus centralized again. The easiest way is to add a "size" arg to kvm_init_arch, and expose the thus-prepared cache to the modules. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove kvm_{read,write}_guest()Laurent Vivier2007-10-131-70/+4
| | | | | | | ... in favor of the more general emulator_{read,write}_*. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Change the emulator_{read,write,cmpxchg}_* functions to take a vcpuLaurent Vivier2007-10-131-14/+11
| | | | | | | ... instead of a x86_emulate_ctxt, so that other callers can use it easily. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove three magic numbersRusty Russell2007-10-131-1/+1
| | | | | | | | There are several places where hardcoded numbers are used in place of the easily-available constant, which is poor form. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: fx_init() needs preemption disabled while it plays with the FPU stateRusty Russell2007-10-131-0/+3
| | | | | | | | Now that kvm generally runs with preemption enabled, we need to protect the fpu intialization sequence. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Convert vm lock to a mutexShaohua Li2007-10-131-36/+33
| | | | | | | | This allows the kvm mmu to perform sleepy operations, such as memory allocation. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use the scheduler preemption notifiers to make kvm preemptibleAvi Kivity2007-10-131-6/+37
| | | | | | | | | | | | | | Current kvm disables preemption while the new virtualization registers are in use. This of course is not very good for latency sensitive workloads (one use of virtualization is to offload user interface and other latency insensitive stuff to a container, so that it is easier to analyze the remaining workload). This patch re-enables preemption for kvm; preemption is now only disabled when switching the registers in and out, and during the switch to guest mode and back. Contains fixes from Shaohua Li <shaohua.li@intel.com>. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: add hypercall nr to kvm_runJeff Dike2007-10-131-0/+1
| | | | | | | | Add the hypercall number to kvm_run and initialize it. This changes the ABI, but as this particular ABI was unusable before this no users are affected. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Dynamically allocate vcpusRusty Russell2007-10-131-95/+103
| | | | | | | | | | This patch converts the vcpus array in "struct kvm" to a pointer array, and changes the "vcpu_create" and "vcpu_setup" hooks into one "vcpu_create" call which does the allocation and initialization of the vcpu (calling back into the kvm_vcpu_init core helper). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove arch specific components from the general codeGregory Haskins2007-10-131-21/+5
| | | | | | | | struct kvm_vcpu has vmx-specific members; remove them to a private structure. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: load_pdptrs() cleanupsRusty Russell2007-10-131-10/+12
| | | | | | | | | | | | | | | | | | load_pdptrs can be handed an invalid cr3, and it should not oops. This can happen because we injected #gp in set_cr3() after we set vcpu->cr3 to the invalid value, or from kvm_vcpu_ioctl_set_sregs(), or memory configuration changes after the guest did set_cr3(). We should also copy the pdpte array once, before checking and assigning, otherwise an SMP guest can potentially alter the values between the check and the set. Finally one nitpick: ret = 1 should be done as late as possible: this allows GCC to check for unset "ret" should the function change in future. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Move gfn_to_page out of kmap/unmap pairsShaohua Li2007-10-131-4/+3
| | | | | | | gfn_to_page might sleep with swap support. Move it out of the kmap calls. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Return if the pdptrs are invalid when the guest turns on PAE.Rusty Russell2007-10-131-0/+1
| | | | | | | Don't fall through and turn on PAE in this case. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use standard CR8 flags, and fix TPR definitionRusty Russell2007-10-131-2/+2
| | | | | | | | | Intel manual (and KVM definition) say the TPR is 4 bits wide. Also fix CR8_RESEVED_BITS typo. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Set exit_reason to KVM_EXIT_MMIO where run->mmio is initialized.Jeff Dike2007-10-131-1/+1
| | | | | Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Trivial: Use standard BITMAP macros, open-code userspace-exposed headerRusty Russell2007-10-131-1/+1
| | | | | | | | | Creating one's own BITMAP macro seems suboptimal: if we use manual arithmetic in the one place exposed to userspace, we can use standard macros elsewhere. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
OpenPOWER on IntegriCloud