op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2010-10-24	1	-3/+19
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits) KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages KVM: Fix signature of kvm_iommu_map_pages stub KVM: MCE: Send SRAR SIGBUS directly KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED KVM: fix typo in copyright notice KVM: Disable interrupts around get_kernel_ns() KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address KVM: MMU: move access code parsing to FNAME(walk_addr) function KVM: MMU: audit: check whether have unsync sps after root sync KVM: MMU: audit: introduce audit_printk to cleanup audit code KVM: MMU: audit: unregister audit tracepoints before module unloaded KVM: MMU: audit: fix vcpu's spte walking KVM: MMU: set access bit for direct mapping KVM: MMU: cleanup for error mask set while walk guest page table KVM: MMU: update 'root_hpa' out of loop in PAE shadow path KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn() KVM: x86: Fix constant type in kvm_get_time_scale KVM: VMX: Add AX to list of registers clobbered by guest switch KVM guest: Move a printk that's using the clock before it's ready KVM: x86: TSC catchup mode ...
\| *	KVM: Fix signature of kvm_iommu_map_pages stub	Jan Kiszka	2010-10-24	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Breaks otherwise if CONFIG_IOMMU_API is not set. KVM-Stable-Tag. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
\| *	KVM: x86: Rename timer function	Zachary Amsden	2010-10-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This just changes some names to better reflect the usage they will be given. Separated out to keep confusion to a minimum. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
\| *	KVM: Check for pending events before attempting injection	Avi Kivity	2010-10-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of blindly attempting to inject an event before each guest entry, check for a possible event first in vcpu->requests. Sites that can trigger event injection are modified to set KVM_REQ_EVENT: - interrupt, nmi window opening - ppr updates - i8259 output changes - local apic irr changes - rflags updates - gif flag set - event set on exit This improves non-injecting entry performance, and sets the stage for non-atomic injection. Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: MMU: Add infrastructure for two-level page walker	Joerg Roedel	2010-10-24	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces a mmu-callback to translate gpa addresses in the walk_addr code. This is later used to translate l2_gpa addresses into l1_gpa addresses. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: MMU: rewrite audit_mappings_page() function	Xiao Guangrong	2010-10-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a bugs in this function, we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in atomic context(kvm_mmu_audit() is called under the spinlock(mmu_lock)'s protection). This patch fix it by: - introduce gfn_to_pfn_atomic instead of gfn_to_pfn - get the mapping gfn from kvm_mmu_page_get_gfn() And it adds 'notrap' ptes check in unsync/direct sps Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: MMU: introduce gfn_to_page_many_atomic() function	Xiao Guangrong	2010-10-24	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce this function to get consecutive gfn's pages, it can reduce gup's overload, used by later patch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
\| *	KVM: MMU: introduce hva_to_pfn_atomic function	Xiao Guangrong	2010-10-24	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce hva_to_pfn_atomic(), it's the fast path and can used in atomic context, the later patch will use it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* \|	kvm: add __rcu annotations	Arnd Bergmann	2010-08-19	1	-1/+1
\|/ \| \| \| \| \| \| \|	Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
*	KVM: Convert mask notifiers to use irqchip/pin instead of gsi	Gleb Natapov	2010-08-02	1	-1/+2
\| \| \| \| \| \| \| \| \|	Devices register mask notifier using gsi, but irqchip knows about irqchip/pin, so conversion from irqchip/pin to gsi should be done before looking for mask notifier to call. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: Return EFAULT from kvm ioctl when guest accesses bad area	Gleb Natapov	2010-08-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Currently if guest access address that belongs to memory slot but is not backed up by page or page is read only KVM treats it like MMIO access. Remove that capability. It was never part of the interface and should not be relied upon. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Keep slot ID in memory slot structure	Avi Kivity	2010-08-01	1	-0/+1
\| \| \| \| \| \| \|	May be used for distinguishing between internal and user slots, or for sorting slots in size order. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Reduce atomic operations on vcpu->requests	Avi Kivity	2010-08-01	1	-1/+6
\| \| \| \| \| \| \| \| \| \|	Usually the vcpu->requests bitmap is sparse, so a test_and_clear_bit() for each request generates a large number of unneeded atomics if a bit is set. Replace with a separate test/clear sequence. This is safe since there is no clear_bit() outside the vcpu thread. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Add mini-API for vcpu->requests	Avi Kivity	2010-08-01	1	-0/+15
\| \| \| \| \| \|	Makes it a little more readable and hackable. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Remove memory alias support	Avi Kivity	2010-08-01	1	-6/+0
\| \| \| \| \| \| \|	As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: VMX: Enable XSAVE/XRSTOR for guest	Dexuan Cui	2010-08-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enable guest to use XSAVE/XRSTOR instructions. We assume that host_xcr0 would use all possible bits that OS supported. And we loaded xcr0 in the same way we handled fpu - do it as late as we can. Signed-off-by: Dexuan Cui <dexuan.cui@intel.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Get rid of KVM_REQ_KICK	Avi Kivity	2010-08-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal operation. This causes the fast path check for a clear vcpu->requests to fail all the time, triggering tons of atomic operations. Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages	Huang Ying	2010-08-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In common cases, guest SRAO MCE will cause corresponding poisoned page be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay the MCE to guest OS. But it is reported that if the poisoned page is accessed in guest after unmapping and before MCE is relayed to guest OS, userspace will be killed. The reason is as follows. Because poisoned page has been un-mapped, guest access will cause guest exit and kvm_mmu_page_fault will be called. kvm_mmu_page_fault can not get the poisoned page for fault address, so kernel and user space MMIO processing is tried in turn. In user MMIO processing, poisoned page is accessed again, then userspace is killed by force_sig_info. To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM and do not try kernel and user space MMIO processing for poisoned page. [xiao: fix warning introduced by avi] Reported-by: Max Asbock <masbock@linux.vnet.ibm.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Let vcpu structure alignment be determined at runtime	Avi Kivity	2010-05-19	1	-1/+1
\| \| \| \| \| \| \|	vmx and svm vcpus have different contents and therefore may have different alignmment requirements. Let each specify its required alignment. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Get rid of dead function gva_to_page()	Gui Jianfeng	2010-05-17	1	-1/+0
\| \| \| \| \| \| \|	Nobody use gva_to_page() anymore, get rid of it. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: use the correct RCU API for PROVE_RCU=y	Lai Jiangshan	2010-05-17	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The RCU/SRCU API have already changed for proving RCU usage. I got the following dmesg when PROVE_RCU=y because we used incorrect API. This patch coverts rcu_deference() to srcu_dereference() or family API. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by qemu-system-x86/8550: #0: (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm] #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm] stack backtrace: Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27 Call Trace: [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm] [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm] [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm] [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm] [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel] [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm] [<ffffffff810a8692>] ? unlock_page+0x27/0x2c [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm] [<ffffffff81060cfa>] ? up_read+0x23/0x3d [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a [<ffffffff810021db>] system_call_fastpath+0x16/0x1b Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: limit the number of pages per memory slot	Takuya Yoshikawa	2010-05-17	1	-0/+6
\| \| \| \| \| \| \| \|	This patch limits the number of pages per memory slot to make us free from extra care about type issues. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: Increase NR_IOBUS_DEVS limit to 200	Sridhar Samudrala	2010-04-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This patch increases the current hardcoded limit of NR_IOBUS_DEVS from 6 to 200. We are hitting this limit when creating a guest with more than 1 virtio-net device using vhost-net backend. Each virtio-net device requires 2 such devices to service notifications from rx/tx queues. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: fix the handling of dirty bitmaps to avoid overflows	Takuya Yoshikawa	2010-04-20	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	Int is not long enough to store the size of a dirty bitmap. This patch fixes this problem with the introduction of a wrapper function to calculate the sizes of dirty bitmaps. Note: in mark_page_dirty(), we have to consider the fact that __set_bit() takes the offset as int, not long. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: Convert kvm->requests_lock to raw_spinlock_t	Avi Kivity	2010-03-01	1	-1/+1
\| \| \| \| \| \| \| \|	The code relies on kvm->requests_lock inhibiting preemption. Noted by Jan Kiszka. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Introduce kvm_host_page_size	Joerg Roedel	2010-03-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	This patch introduces a generic function to find out the host page size for a given gfn. This function is needed by the kvm iommu code. This patch also simplifies the x86 host_mapping_level function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: enable PCI multiple-segments for pass-through device	Zhai, Edwin	2010-03-01	1	-0/+1
\| \| \| \| \| \| \| \| \|	Enable optional parameter (default 0) - PCI segment (or domain) besides BDF, when assigning PCI device to guest. Signed-off-by: Zhai Edwin <edwin.zhai@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: Lazify fpu activation and deactivation	Avi Kivity	2010-03-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep it loaded until the next heavyweight exit (where we are forced to unload it). This reduces unnecessary exits. We also defer fpu activation on clts; while clts signals the intent to use the fpu, we can't be sure the guest will actually use it. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: convert slots_lock to a mutex	Marcelo Tosatti	2010-03-01	1	-1/+1
\| \| \| \|	Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: switch vcpu context to use SRCU	Marcelo Tosatti	2010-03-01	1	-0/+2
\| \| \| \|	Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: convert io_bus to SRCU	Marcelo Tosatti	2010-03-01	1	-14/+13
\| \| \| \|	Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: x86: switch kvm_set_memory_alias to SRCU update	Marcelo Tosatti	2010-03-01	1	-0/+6
\| \| \| \| \| \|	Using a similar two-step procedure as for memslots. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update	Marcelo Tosatti	2010-03-01	1	-5/+2
\| \| \| \| \| \| \| \| \| \|	Use two steps for memslot deletion: mark the slot invalid (which stops instantiation of new shadow pages for that slot, but allows destruction), then instantiate the new empty slot. Also simplifies kvm_handle_hva locking. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: use gfn_to_pfn_memslot in kvm_iommu_map_pages	Marcelo Tosatti	2010-03-01	1	-2/+1
\| \| \| \| \| \| \|	So its possible to iommu map a memslot before making it visible to kvm. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: introduce gfn_to_pfn_memslot	Marcelo Tosatti	2010-03-01	1	-0/+2
\| \| \| \| \| \| \| \|	Which takes a memslot pointer instead of using kvm->memslots. To be used by SRCU convertion later. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: split kvm_arch_set_memory_region into prepare and commit	Marcelo Tosatti	2010-03-01	1	-1/+6
\| \| \| \| \| \|	Required for SRCU convertion later. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: modify memslots layout in struct kvm	Marcelo Tosatti	2010-03-01	1	-4/+8
\| \| \| \| \| \| \| \| \|	Have a pointer to an allocated region inside struct kvm. [alex: fix ppc book 3s] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: introduce kvm_vcpu_on_spin	Zhai, Edwin	2009-12-03	1	-0/+1
\| \| \| \| \| \| \| \|	Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing once the cpu detects pause-based looping. Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: Activate Virtualization On Demand	Alexander Graf	2009-12-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	X86 CPUs need to have some magic happening to enable the virtualization extensions on them. This magic can result in unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. To circumvent this problem at least a bit, this patch introduces on demand activation of virtualization. This means, that instead virtualization is enabled on creation of the first virtual machine and disabled on destruction of the last one. So using this, KVM can be easily autoloaded, while keeping other hypervisors usable. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Move assigned device code to own file	Avi Kivity	2009-12-03	1	-0/+17
\| \| \| \|	Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Move irq ack notifier list to arch independent code	Gleb Natapov	2009-12-03	1	-0/+1
\| \| \| \| \| \| \|	Mask irq notifier list is already there. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Maintain back mapping from irqchip/pin to gsi	Gleb Natapov	2009-12-03	1	-0/+9
\| \| \| \| \| \| \| \| \| \|	Maintain back mapping from irqchip/pin to gsi to speedup interrupt acknowledgment notifications. [avi: build fix on non-x86/ia64] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Change irq routing table to use gsi indexed array	Gleb Natapov	2009-12-03	1	-3/+18
\| \| \| \| \| \| \| \|	Use gsi indexed array instead of scanning all entries on each interrupt injection. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Move irq sharing information to irqchip level	Gleb Natapov	2009-12-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This removes assumptions that max GSIs is smaller than number of pins. Sharing is tracked on pin level not GSI level. [avi: no PIC on ia64] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	tracing: Remove markers	Christoph Hellwig	2009-09-18	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Now that the last users of markers have migrated to the event tracer we can kill off the (now orphan) support code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20090917173527.GA1699@lst.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	KVM: Reduce runnability interface with arch support code	Gleb Natapov	2009-09-10	1	-2/+0
\| \| \| \| \| \| \| \| \|	Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from interface between general code and arch code. kvm_arch_vcpu_runnable() checks for interrupts instead. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Move kvm_cpu_get_interrupt() declaration to x86 code	Gleb Natapov	2009-09-10	1	-1/+0
\| \| \| \| \| \| \|	It is implemented only by x86. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: add ioeventfd support	Gregory Haskins	2009-09-10	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asychronously and return as quickly as possible. All the sychronous infrastructure to ensure proper data-dependencies are met in the normal IO case are just unecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path. Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. To test this theory, we built a test-harness called "doorbell". This module has a function called "doorbell_ring()" which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via ioeventfd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 110000 iops, 9.09us rtt ioeventfd-mmio: 200100 iops, 5.00us rtt ioeventfd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt ioeventfd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. -------------------- Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: make io_bus interface more robust	Gregory Haskins	2009-09-10	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Today kvm_io_bus_regsiter_dev() returns void and will internally BUG_ON if it fails. We want to create dynamic MMIO/PIO entries driven from userspace later in the series, so we need to enhance the code to be more robust with the following changes: 1) Add a return value to the registration function 2) Fix up all the callsites to check the return code, handle any failures, and percolate the error up to the caller. 3) Add an unregister function that collapses holes in the array Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: remove in_range from io devices	Michael S. Tsirkin	2009-09-10	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \|	This changes bus accesses to use high-level kvm_io_bus_read/kvm_io_bus_write functions. in_range now becomes unused so it is removed from device ops in favor of read/write callbacks performing range checks internally. This allows aliasing (mostly for in-kernel virtio), as well as better error handling by making it possible to pass errors up to userspace. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>