op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	KVM: MMU: mmio page fault support	Xiao Guangrong	2011-07-24	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The idea is from Avi: \| We could cache the result of a miss in an spte by using a reserved bit, and \| checking the page fault error code (or seeing if we get an ept violation or \| ept misconfiguration), so if we get repeated mmio on a page, we don't need to \| search the slot list/tree. \| (https://lkml.org/lkml/2011/2/22/221) When the page fault is caused by mmio, we cache the info in the shadow page table, and also set the reserved bits in the shadow page table, so if the mmio is caused again, we can quickly identify it and emulate it directly Searching mmio gfn in memslots is heavy since we need to walk all memeslots, it can be reduced by this feature, and also avoid walking guest page table for soft mmu. [jan: fix operator precedence issue] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: filter out the mmio pfn from the fault pfn	Xiao Guangrong	2011-07-24	1	-2/+14
\| \| \| \| \| \| \| \| \| \| \|	If the page fault is caused by mmio, the gfn can not be found in memslots, and 'bad_pfn' is returned on gfn_to_hva path, so we can use 'bad_pfn' to identify the mmio page fault. And, to clarify the meaning of mmio pfn, we return fault page instead of bad page when the gfn is not allowd to prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: introduce kvm_read_guest_cached	Gleb Natapov	2011-07-12	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce kvm_read_guest_cached() function in addition to write one we already have. [ by glauber: export function signature in kvm header ] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric Munson <emunson@mgebm.net> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK	Alexander Graf	2011-07-12	1	-1/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	KVM has an ioctl to define which signal mask should be used while running inside VCPU_RUN. At least for big endian systems, this mask is different on 32-bit and 64-bit systems (though the size is identical). Add a compat wrapper that converts the mask to whatever the kernel accepts, allowing 32-bit kvm user space to set signal masks. This patch fixes qemu with --enable-io-thread on ppc64 hosts when running 32-bit user land. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Clean up error handling during VCPU creation	Jan Kiszka	2011-07-12	1	-5/+6
\| \| \| \| \| \| \| \| \| \|	So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if it fails. Move this confusing resonsibility back into the hands of kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected, all other archs cannot fail. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: use __copy_to_user/__clear_user to write guest page	Xiao Guangrong	2011-07-12	1	-2/+2
\| \| \| \| \| \| \| \|	Simply use __copy_to_user/__clear_user to write guest page since we have already verified the user address when the memslot is set Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: Initialize kvm before registering the mmu notifier	Mike Waychison	2011-06-06	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It doesn't make sense to ever see a half-initialized kvm structure on mmu notifier callbacks. Previously, 85722cda changed the ordering to ensure that the mmu_lock was initialized before mmu notifier registration, but there is still a race where the mmu notifier could come in and try accessing other portions of struct kvm before they are intialized. Solve this by moving the mmu notifier registration to occur after the structure is completely initialized. Google-Bug-Id: 452199 Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: add missing void __user * cast to access_ok() call	Heiko Carstens	2011-05-26	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fa3d315a "KVM: Validate userspace_addr of memslot when registered" introduced this new warning onn s390: kvm_main.c: In function '__kvm_set_memory_region': kvm_main.c:654:7: warning: passing argument 1 of '__access_ok' makes pointer from integer without a cast arch/s390/include/asm/uaccess.h:53:19: note: expected 'const void *' but argument is of type '__u64' Add the missing cast to get rid of it again... Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Fix kvm mmu_notifier initialization order	OGAWA Hirofumi	2011-05-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Like the following, mmu_notifier can be called after registering immediately. So, kvm have to initialize kvm->mmu_lock before it. BUG: spinlock bad magic on CPU#0, kswapd0/342 lock: ffff8800af8c4000, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 Pid: 342, comm: kswapd0 Not tainted 2.6.39-rc5+ #1 Call Trace: [<ffffffff8118ce61>] spin_bug+0x9c/0xa3 [<ffffffff8118ce91>] do_raw_spin_lock+0x29/0x13c [<ffffffff81024923>] ? flush_tlb_others_ipi+0xaf/0xfd [<ffffffff812e22f3>] _raw_spin_lock+0x9/0xb [<ffffffffa0582325>] kvm_mmu_notifier_clear_flush_young+0x2c/0x66 [kvm] [<ffffffff810d3ff3>] __mmu_notifier_clear_flush_young+0x2b/0x57 [<ffffffff810c8761>] page_referenced_one+0x88/0xea [<ffffffff810c89bf>] page_referenced+0x1fc/0x256 [<ffffffff810b2771>] shrink_page_list+0x187/0x53a [<ffffffff810b2ed7>] shrink_inactive_list+0x1e0/0x33d [<ffffffff810acf95>] ? determine_dirtyable_memory+0x15/0x27 [<ffffffff812e90ee>] ? call_function_single_interrupt+0xe/0x20 [<ffffffff810b3356>] shrink_zone+0x322/0x3de [<ffffffff810a9587>] ? zone_watermark_ok_safe+0xe2/0xf1 [<ffffffff810b3928>] kswapd+0x516/0x818 [<ffffffff810b3412>] ? shrink_zone+0x3de/0x3de [<ffffffff81053d17>] kthread+0x7d/0x85 [<ffffffff812e9394>] kernel_thread_helper+0x4/0x10 [<ffffffff81053c9a>] ? __init_kthread_worker+0x37/0x37 [<ffffffff812e9390>] ? gs_change+0xb/0xb Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Validate userspace_addr of memslot when registered	Takuya Yoshikawa	2011-05-22	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This way, we can avoid checking the user space address many times when we read the guest memory. Although we can do the same for write if we check which slots are writable, we do not care write now: reading the guest memory happens more often than writing. [avi: change VERIFY_READ to VERIFY_WRITE] Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: cleanup memslot_id function	Xiao Guangrong	2011-05-11	1	-17/+0
\| \| \| \| \| \| \|	We can get memslot id from memslot->id directly Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Enable async page fault processing	Gleb Natapov	2011-04-06	1	-2/+21
\| \| \| \| \| \| \| \| \| \|	If asynchronous hva_to_pfn() is requested call GUP with FOLL_NOWAIT to avoid sleeping on IO. Check for hwpoison is done at the same time, otherwise check_user_page_hwpoison() will call GUP again and will put vcpu to sleep. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	Merge branch 'syscore' of ↵	Linus Torvalds	2011-03-25	1	-26/+8
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: Introduce ARCH_NO_SYSDEV_OPS config option (v2) cpufreq: Use syscore_ops for boot CPU suspend/resume (v2) KVM: Use syscore_ops instead of sysdev class and sysdev PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev timekeeping: Use syscore_ops instead of sysdev class and sysdev x86: Use syscore_ops instead of sysdev classes and sysdevs
\| *	KVM: Use syscore_ops instead of sysdev class and sysdev	Rafael J. Wysocki	2011-03-23	1	-26/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	KVM uses a sysdev class and a sysdev for executing kvm_suspend() after interrupts have been turned off on the boot CPU (during system suspend) and for executing kvm_resume() before turning on interrupts on the boot CPU (during system resume). However, since both of these functions ignore their arguments, the entire mechanism may be replaced with a struct syscore_ops object which is simpler. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Avi Kivity <avi@redhat.com>
* \|	kvm: use little-endian bitops	Akinobu Mita	2011-03-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As a preparation for removing ext2 non-atomic bit operations from asm/bitops.h. This converts ext2 non-atomic bit operations to little-endian bit operations. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	kvm: stop including asm-generic/bitops/le.h directly	Akinobu Mita	2011-03-23	1	-2/+1
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	asm-generic/bitops/le.h is only intended to be included directly from asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h which implements generic ext2 or minix bit operations. This stops including asm-generic/bitops/le.h directly and use ext2 non-atomic bit operations instead. It seems odd to use ext2_set_bit() on kvm, but it will replaced with __set_bit_le() after introducing little endian bit operations for all architectures. This indirect step is necessary to maintain bisectability for some architectures which have their own little-endian bit operations. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	KVM: Convert kvm_lock to raw_spinlock	Jan Kiszka	2011-03-17	1	-18/+18
\| \| \| \| \| \| \| \|	Code under this lock requires non-preemptibility. Ensure this also over -rt by converting it to raw spinlock. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: use yield_to instead of sleep in kvm_vcpu_on_spin	Rik van Riel	2011-03-17	1	-10/+47
\| \| \| \| \| \| \| \| \| \| \| \|	Instead of sleeping in kvm_vcpu_on_spin, which can cause gigantic slowdowns of certain workloads, we instead use yield_to to get another VCPU in the same KVM guest to run sooner. This seems to give a 10-15% speedup in certain workloads. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: keep track of which task is running a KVM vcpu	Rik van Riel	2011-03-17	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Keep track of which task is running a KVM vcpu. This helps us figure out later what task to wake up if we want to boost a vcpu that got preempted. Unfortunately there are no guarantees that the same task always keeps the same vcpu, so we can only track the task across a single "run" of the vcpu. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Replace is_hwpoison_address with __get_user_pages	Huang Ying	2011-03-17	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	is_hwpoison_address only checks whether the page table entry is hwpoisoned, regardless the memory page mapped. While __get_user_pages will check both. QEMU will clear the poisoned page table entry (via unmap/map) to make it possible to allocate a new memory page for the virtual address across guest rebooting. But it is also possible that the underlying memory page is kept poisoned even after the corresponding page table entry is cleared, that is, a new memory page can not be allocated. __get_user_pages can catch these situations. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: make make_all_cpus_request() lockless	Xiao Guangrong	2011-03-17	1	-6/+3
\| \| \| \| \| \| \| \| \|	Now, we have 'vcpu->mode' to judge whether need to send ipi to other cpus, this way is very exact, so checking request bit is needless, then we can drop the spinlock let it's collateral Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Add "exiting guest mode" state	Xiao Guangrong	2011-03-17	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we keep track of only two states: guest mode and host mode. This patch adds an "exiting guest mode" state that tells us that an IPI will happen soon, so unless we need to wait for the IPI, we can avoid it completely. Also 1: No need atomically to read/write ->mode in vcpu's thread 2: reorganize struct kvm_vcpu to make ->mode and ->requests in the same cache line explicitly Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: fix build warning within __kvm_set_memory_region() on s390	Heiko Carstens	2011-03-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Get rid of this warning: CC arch/s390/kvm/../../../virt/kvm/kvm_main.o arch/s390/kvm/../../../virt/kvm/kvm_main.c:596:12: warning: 'kvm_create_dirty_bitmap' defined but not used The only caller of the function is within a !CONFIG_S390 section, so add the same ifdef around kvm_create_dirty_bitmap() as well. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: MMU: Don't flush shadow when enabling dirty tracking	Avi Kivity	2011-03-17	1	-6/+1
\| \| \| \| \| \| \|	Instead, drop large mappings, which were the reason we dropped shadow. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	thp: add compound_trans_head() helper	Andrea Arcangeli	2011-01-13	1	-24/+14
\| \| \| \| \| \| \| \| \| \| \| \|	Cleanup some code with common compound_trans_head helper. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	thp: mmu_notifier_test_young	Andrea Arcangeli	2011-01-13	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For GRU and EPT, we need gup-fast to set referenced bit too (this is why it's correct to return 0 when shadow_access_mask is zero, it requires gup-fast to set the referenced bit). qemu-kvm access already sets the young bit in the pte if it isn't zero-copy, if it's zero copy or a shadow paging EPT minor fault we relay on gup-fast to signal the page is in use... We also need to check the young bits on the secondary pagetables for NPT and not nested shadow mmu as the data may never get accessed again by the primary pte. Without this closer accuracy, we'd have to remove the heuristic that avoids collapsing hugepages in hugepage virtual regions that have not even a single subpage in use. ->test_young is full backwards compatible with GRU and other usages that don't have young bits in pagetables set by the hardware and that should nuke the secondary mmu mappings when ->clear_flush_young runs just like EPT does. Removing the heuristic that checks the young bit in khugepaged/collapse_huge_page completely isn't so bad either probably but I thought it was worth it and this makes it reliable. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	thp: kvm mmu transparent hugepage support	Andrea Arcangeli	2011-01-13	1	-2/+30
\| \| \| \| \| \| \| \| \| \| \|	This should work for both hugetlbfs and transparent hugepages. [akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	KVM: Don't spin on virt instruction faults during reboot	Avi Kivity	2011-01-12	1	-9/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	Since vmx blocks INIT signals, we disable virtualization extensions during reboot. This leads to virtualization instructions faulting; we trap these faults and spin while the reboot continues. Unfortunately spinning on a non-preemptible kernel may block a task that reboot depends on; this causes the reboot to hang. Fix by skipping over the instruction and hoping for the best. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: delay flush all tlbs on sync_page path	Xiao Guangrong	2011-01-12	1	-1/+6
\| \| \| \| \| \| \| \| \| \|	Quote from Avi: \| I don't think we need to flush immediately; set a "tlb dirty" bit somewhere \| that is cleareded when we flush the tlb. kvm_mmu_notifier_invalidate_page() \| can consult the bit and force a flush if set. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: rename hardware_[dis\|en]able() to *_nolock() and add locking wrappers	Takuya Yoshikawa	2011-01-12	1	-12/+22
\| \| \| \| \| \| \| \| \| \| \| \|	The naming convension of hardware_[dis\|en]able family is little bit confusing because only hardware_[dis\|en]able_all are using _nolock suffix. Renaming current hardware_[dis\|en]able() to *_nolock() and using hardware_[dis\|en]able() as wrapper functions which take kvm_lock for them reduces extra confusion. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: take kvm_lock for hardware_disable() during cpu hotplug	Takuya Yoshikawa	2011-01-12	1	-0/+2
\| \| \| \| \| \| \| \|	In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock. This patch adds missing protection for CPU_DYING case. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: Clean up vm creation and release	Jan Kiszka	2011-01-12	1	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \|	IA64 support forces us to abstract the allocation of the kvm structure. But instead of mixing this up with arch-specific initialization and doing the same on destruction, split both steps. This allows to move generic destruction calls into generic code. It also fixes error clean-up on failures of kvm_create_vm for IA64. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Refactor srcu struct release on early errors	Jan Kiszka	2011-01-12	1	-8/+6
\| \| \| \| \|	Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: replace vmalloc and memset with vzalloc	Takuya Yoshikawa	2011-01-12	1	-7/+2
\| \| \| \| \| \| \| \|	Let's use newly introduced vzalloc(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: get rid of warning within kvm_dev_ioctl_create_vm	Heiko Carstens	2011-01-12	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \|	Fixes this: CC arch/s390/kvm/../../../virt/kvm/kvm_main.o arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_dev_ioctl_create_vm': arch/s390/kvm/../../../virt/kvm/kvm_main.c:1828:10: warning: unused variable 'r' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: add cast within kvm_clear_guest_page to fix warning	Heiko Carstens	2011-01-12	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes this: CC arch/s390/kvm/../../../virt/kvm/kvm_main.o arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_clear_guest_page': arch/s390/kvm/../../../virt/kvm/kvm_main.c:1224:2: warning: passing argument 3 of 'kvm_write_guest_page' makes pointer from integer without a cast arch/s390/kvm/../../../virt/kvm/kvm_main.c:1185:5: note: expected 'const void *' but argument is of type 'long unsigned int' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: use kmalloc() for small dirty bitmaps	Takuya Yoshikawa	2011-01-12	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we are using vmalloc() for all dirty bitmaps even if they are small enough, say less than K bytes. We use kmalloc() if dirty bitmap size is less than or equal to PAGE_SIZE so that we can avoid vmalloc area usage for VGA. This will also make the logging start/stop faster. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: pre-allocate one more dirty bitmap to avoid vmalloc()	Takuya Yoshikawa	2011-01-12	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by vmalloc() which will be used in the next logging and this has been causing bad effect to VGA and live-migration: vmalloc() consumes extra systime, triggers tlb flush, etc. This patch resolves this issue by pre-allocating one more bitmap and switching between two bitmaps during dirty logging. Performance improvement: I measured performance for the case of VGA update by trace-cmd. The result was 1.5 times faster than the original one. In the case of live migration, the improvement ratio depends on the workload and the guest memory size. In general, the larger the memory size is the more benefits we get. Note: This does not change other architectures's logic but the allocation size becomes twice. This will increase the actual memory consumption only when the new size changes the number of pages allocated by vmalloc(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: introduce wrapper functions for creating/destroying dirty bitmaps	Takuya Yoshikawa	2011-01-12	1	-7/+23
\| \| \| \| \| \| \| \|	This makes it easy to change the way of allocating/freeing dirty bitmaps. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: x86: trace "exit to userspace" event	Gleb Natapov	2011-01-12	1	-0/+1
\| \| \| \| \| \| \|	Add tracepoint for userspace exit. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: propagate fault r/w information to gup(), allow read-only memory	Marcelo Tosatti	2011-01-12	1	-10/+41
\| \| \| \| \| \| \| \|	As suggested by Andrea, pass r/w error code to gup(), upgrading read fault to writable if host pte allows it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: improve hva_to_pfn() readability	Gleb Natapov	2011-01-12	1	-13/+17
\| \| \| \| \| \| \| \|	Improve vma handling code readability in hva_to_pfn() and fix async pf handling code to properly check vma returned by find_vma(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: Add memory slot versioning and use it to provide fast guest write interface	Gleb Natapov	2011-01-12	1	-12/+63
\| \| \| \| \| \| \| \| \| \| \|	Keep track of memslots changes by keeping generation number in memslots structure. Provide kvm_write_guest_cached() function that skips gfn_to_hva() translation if memslots was not changed since previous invocation. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	KVM: Halt vcpu if page it tries to access is swapped out	Gleb Natapov	2011-01-12	1	-11/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. [avi: remove call to get_user_pages_noio(), nacked by Linus; this makes everything synchrnous again] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
*	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2010-10-24	1	-18/+66
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits) KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages KVM: Fix signature of kvm_iommu_map_pages stub KVM: MCE: Send SRAR SIGBUS directly KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED KVM: fix typo in copyright notice KVM: Disable interrupts around get_kernel_ns() KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address KVM: MMU: move access code parsing to FNAME(walk_addr) function KVM: MMU: audit: check whether have unsync sps after root sync KVM: MMU: audit: introduce audit_printk to cleanup audit code KVM: MMU: audit: unregister audit tracepoints before module unloaded KVM: MMU: audit: fix vcpu's spte walking KVM: MMU: set access bit for direct mapping KVM: MMU: cleanup for error mask set while walk guest page table KVM: MMU: update 'root_hpa' out of loop in PAE shadow path KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn() KVM: x86: Fix constant type in kvm_get_time_scale KVM: VMX: Add AX to list of registers clobbered by guest switch KVM guest: Move a printk that's using the clock before it's ready KVM: x86: TSC catchup mode ...
\| *	KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages	Jan Kiszka	2010-10-24	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We also have to call kvm_iommu_map_pages for CONFIG_AMD_IOMMU. So drop the dependency on Intel IOMMU, kvm_iommu_map_pages will be a nop anyway if CONFIG_IOMMU_API is not defined. KVM-Stable-Tag. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
\| *	KVM: fix typo in copyright notice	Nicolas Kaiser	2010-10-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix typo in copyright notice. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
\| *	KVM: cpu_relax() during spin waiting for reboot	Avi Kivity	2010-10-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It doesn't really matter, but if we spin, we should spin in a more relaxed manner. This way, if something goes wrong at least it won't contribute to global warming. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
\| *	KVM: MMU: rewrite audit_mappings_page() function	Xiao Guangrong	2010-10-24	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a bugs in this function, we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in atomic context(kvm_mmu_audit() is called under the spinlock(mmu_lock)'s protection). This patch fix it by: - introduce gfn_to_pfn_atomic instead of gfn_to_pfn - get the mapping gfn from kvm_mmu_page_get_gfn() And it adds 'notrap' ptes check in unsync/direct sps Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: MMU: introduce gfn_to_page_many_atomic() function	Xiao Guangrong	2010-10-24	1	-1/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce this function to get consecutive gfn's pages, it can reduce gup's overload, used by later patch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>