path: root/arch/x86/kvm/paging_tmpl.h
* KVM: x86 emulator: fix memory access during x86 emulation (Gleb Natapov, 2010-03-01, 1 file, -3/+8)
  Currently, when the x86 emulator needs to access memory, the page walk is done with the broadest permissions possible, so if the emulated instruction was executed by a userspace process it can still access kernel memory. Fix that by providing the correct memory access to the page walker during emulation.
  Signed-off-by: Gleb Natapov <gleb@redhat.com>
  Cc: stable@kernel.org
  Signed-off-by: Avi Kivity <avi@redhat.com>
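A minimal sketch of the idea, not the kernel code: the walker receives the access bits of the emulated instruction and checks them against each guest pte, instead of walking with full permissions. The PFERR_* values and pte permission bits below are architectural; check_access() is a hypothetical stand-in for the real permission checks.

    #include <stdint.h>
    #include <stdio.h>

    /* Architectural x86 #PF error-code bits, which KVM also uses as
     * walker access flags. */
    #define PFERR_WRITE_MASK (1U << 1)
    #define PFERR_USER_MASK  (1U << 2)

    #define PT_WRITABLE_MASK (1ULL << 1)   /* R/W bit of an x86 pte */
    #define PT_USER_MASK     (1ULL << 2)   /* U/S bit of an x86 pte */

    /* Deny a user-mode access to a supervisor-only pte instead of
     * walking with the broadest possible permissions. */
    static int check_access(uint64_t gpte, uint32_t access)
    {
            if ((access & PFERR_USER_MASK) && !(gpte & PT_USER_MASK))
                    return -1;   /* user access to a kernel page */
            if ((access & PFERR_WRITE_MASK) && !(gpte & PT_WRITABLE_MASK))
                    return -1;   /* write to a read-only page */
            return 0;
    }

    int main(void)
    {
            uint64_t kernel_pte = 0x1;   /* present, supervisor-only */
            printf("%d\n", check_access(kernel_pte, PFERR_USER_MASK));
            return 0;
    }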
* KVM: rename is_writeble_pte() to is_writable_pte() (Takuya Yoshikawa, 2010-03-01, 1 file, -1/+1)
  There are two spellings of "writable" in arch/x86/kvm/mmu.c and paging_tmpl.h. This patch renames is_writeble_pte() to is_writable_pte() and makes grepping easy. The new name is consistent with the definition of the function itself: return pte & PT_WRITABLE_MASK;
  Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
  Signed-off-by: Avi Kivity <avi@redhat.com>
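The quoted body is the whole helper; a self-contained sketch (the mask value matches the architectural R/W bit, though the kernel defines it via its own macros):

    #include <stdint.h>
    #include <stdbool.h>

    #define PT_WRITABLE_MASK (1ULL << 1)   /* R/W bit of an x86 pte */

    /* Renamed from is_writeble_pte(); behavior unchanged. */
    static bool is_writable_pte(uint64_t pte)
    {
            return pte & PT_WRITABLE_MASK;
    }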
* KVM: MMU: bail out pagewalk on kvm_read_guest error (Marcelo Tosatti, 2010-01-25, 1 file, -1/+3)
  Exit the guest pagetable walk loop if reading the gpte failed. Otherwise it's possible to enter an endless loop processing the previous present pte.
  Cc: stable@kernel.org
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
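A toy model of the failure mode and fix, assuming a 4-level, 9-bits-per-level layout; read_guest() is a stub standing in for kvm_read_guest():

    #include <stdint.h>
    #include <stdio.h>

    #define PT_PRESENT_MASK 1ULL

    /* Stub: simulate an unreadable guest pte. */
    static int read_guest(uint64_t gpa, uint64_t *data)
    {
            (void)gpa;
            (void)data;
            return -1;
    }

    /* Model of the fix: a failed gpte read must terminate the walk
     * loop; before the fix the stale previous pte kept it spinning. */
    static int walk(uint64_t addr)
    {
            uint64_t table = 0, pte;
            for (int level = 4; level >= 1; --level) {
                    uint64_t idx = (addr >> (12 + 9 * (level - 1))) & 511;
                    if (read_guest(table + idx * 8, &pte))
                            return -1;          /* the fix: bail out here */
                    if (!(pte & PT_PRESENT_MASK))
                            return 0;           /* not present: page fault */
                    table = pte & ~0xfffULL;
            }
            return 1;
    }

    int main(void)
    {
            printf("walk: %d\n", walk(0x400000));
            return 0;
    }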
* KVM: MMU: remove prefault from invlpg handler (Marcelo Tosatti, 2009-12-27, 1 file, -18/+0)
  The invlpg prefault optimization breaks Windows 2008 R2 occasionally. The visible effect is that the invlpg handler instantiates a pte which is, microseconds later, written with a different gfn by another vcpu. The OS could have other mechanisms to prevent a present translation from being used, which the hypervisor is unaware of. While the documentation states that the cpu is at liberty to prefetch tlb entries, it looks like this is not heeded, so remove tlb prefetch from invlpg.
  Cc: stable@kernel.org
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: update invlpg handler comment (Marcelo Tosatti, 2009-12-03, 1 file, -1/+0)
  Large page translations are always synchronized (either in level 3 or level 2), so it's not necessary to deal with them in the invlpg handler.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: add SPTE_HOST_WRITEABLE flag to the shadow ptes (Izik Eidus, 2009-10-04, 1 file, -3/+15)
  This flag notes whether the host physical page the spte points to is write-protected, and therefore whether we can't change its access to writable unless we run get_user_pages(write = 1). (This is needed for change_pte support in kvm.)
  Signed-off-by: Izik Eidus <ieidus@redhat.com>
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
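A sketch of the mechanism under stated assumptions: the bit position here is purely illustrative (the real kernel picks an spte bit ignored by hardware), and the flag's sense follows its name, recording whether the host mapping was writable when the spte was built.

    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative bit choice, not the kernel's. */
    #define SPTE_HOST_WRITEABLE (1ULL << 62)
    #define PT_WRITABLE_MASK    (1ULL << 1)

    /* A write-protected spte may be made writable on a write fault
     * only if the host mapping was writable; otherwise we must go
     * back through get_user_pages(write = 1) first. */
    static bool can_fix_writable(uint64_t spte)
    {
            return spte & SPTE_HOST_WRITEABLE;
    }

    static uint64_t make_writable(uint64_t spte)
    {
            return spte | PT_WRITABLE_MASK;
    }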
* KVM: MMU: shadow support for 1gb pages (Joerg Roedel, 2009-09-10, 1 file, -23/+20)
  This patch adds support for shadow paging to the 1gb page table code in KVM. With this code the guest can use 1gb pages even if the host does not support them.
  [ Marcelo: fix shadow page collision on pmd level if a guest 1gb page is mapped with 4kb ptes on host level ]
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: make page walker aware of mapping levels (Joerg Roedel, 2009-09-10, 1 file, -24/+28)
  The page walker may be used with nested paging too when accessing mmio areas. Make it support the additional page level too.
  [ Marcelo: fix reserved bit check for 1gb pte ]
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: make direct mapping paths aware of mapping levels (Joerg Roedel, 2009-09-10, 1 file, -3/+3)
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: rename is_largepage_backed to mapping_level (Joerg Roedel, 2009-09-10, 1 file, -2/+2)
  With the new name and the corresponding backend changes this function can now support multiple hugepage sizes.
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Trace guest pagetable walker (Avi Kivity, 2009-09-10, 1 file, -3/+8)
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Prepare memslot data structures for multiple hugepage sizes (Joerg Roedel, 2009-09-10, 1 file, -1/+2)
  [avi: fix build on non-x86]
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: s/shadow_pte/spte/ (Avi Kivity, 2009-09-10, 1 file, -8/+8)
  We use shadow_pte and spte inconsistently; switch to the shorter spelling. Rename set_shadow_pte() to __set_spte() to avoid a conflict with the existing set_spte(), and to indicate its low-level nature.
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte (Avi Kivity, 2009-09-10, 1 file, -11/+11)
  Since the guest and host ptes can have wildly different formats, adjust the pte accessor names to indicate which type of pte they operate on. No functional changes.
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Cache pdptrs (Avi Kivity, 2009-09-10, 1 file, -1/+1)
  Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx, guest memory access on svm), extract them on demand.
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: shut up uninit compiler warning in paging_tmpl.h (Jaswinder Singh Rajput, 2009-06-28, 1 file, -1/+1)
  Fixes this compilation warning:

        CC      arch/x86/kernel/io_delay.o
      arch/x86/kvm/paging_tmpl.h: In function ‘paging64_fetch’:
      arch/x86/kvm/paging_tmpl.h:279: warning: ‘sptep’ may be used uninitialized in this function
      arch/x86/kvm/paging_tmpl.h: In function ‘paging32_fetch’:
      arch/x86/kvm/paging_tmpl.h:279: warning: ‘sptep’ may be used uninitialized in this function

  The warning is bogus (there is always at least one level), but we need to shut the compiler up.
  Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
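A minimal standalone illustration of the usual fix for this class of false positive, assuming (as the message says) the loop always runs at least once:

    #include <stddef.h>
    #include <stdint.h>

    static uint64_t table[4];

    /* An explicit initializer costs nothing and lets the compiler
     * prove the pointer is never used uninitialized, even though the
     * loop below always executes at least once. */
    uint64_t *find_leaf(int levels)
    {
            uint64_t *sptep = NULL;   /* the fix: initialize up front */
            for (int level = levels; level >= 1; --level)
                    sptep = &table[level - 1];
            return sptep;
    }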
* KVM: MMU: remove global page optimization logic (Marcelo Tosatti, 2009-06-10, 1 file, -4/+2)
  The complexity required to fix it is not worth the gains, as discussed in http://article.gmane.org/gmane.comp.emulators.kvm.devel/28649.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Emulate #PF error code of reserved bits violation (Dong, Eddie, 2009-06-10, 1 file, -0/+7)
  Detect, indicate, and propagate page faults where reserved bits are set. Take care to handle the different paging modes, each of which has different sets of reserved bits.
  [avi: fix pte reserved bits for efer.nxe=0]
  Signed-off-by: Eddie Dong <eddie.dong@intel.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
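The #PF error-code bits involved are architectural; a small sketch of how such a fault's error code is assembled (the helper name is illustrative, not the kernel's):

    #include <stdint.h>

    /* Architectural x86 #PF error-code bits. */
    #define PFERR_PRESENT_MASK (1U << 0)
    #define PFERR_WRITE_MASK   (1U << 1)
    #define PFERR_USER_MASK    (1U << 2)
    #define PFERR_RSVD_MASK    (1U << 3)   /* reserved bit set in a paging entry */

    /* A reserved-bit violation is only reported for present entries,
     * so PFERR_RSVD_MASK is always delivered together with P. */
    static uint32_t fault_error_code(int write, int user, int rsvd)
    {
            uint32_t ec = 0;
            if (write) ec |= PFERR_WRITE_MASK;
            if (user)  ec |= PFERR_USER_MASK;
            if (rsvd)  ec |= PFERR_RSVD_MASK | PFERR_PRESENT_MASK;
            return ec;
    }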
* KVM: MMU: Fix comment in page_fault() (Eddie Dong, 2009-06-10, 1 file, -1/+1)
  The original comment described the code as it was before refactoring.
  Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
  Signed-off-by: Sheng Yang <sheng@linux.intel.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: remove call to kvm_mmu_pte_write from walk_addr (Joerg Roedel, 2009-06-10, 1 file, -1/+0)
  There is no reason to update the shadow pte here, because the guest pte is only being marked dirty.
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix missing smp tlb flush in invlpg (Andrea Arcangeli, 2009-03-24, 1 file, -0/+4)
  When kvm emulates an invlpg instruction, it can drop a shadow pte but leave the guest tlbs intact. This can cause memory corruption when swapping out: without the flush, the other cpu can still write to a freed host physical page. An smp tlb flush must happen whenever rmap_remove is called, and always before mmu_lock is released, because the VM takes the mmu_lock before it can finally add the page to the freelist after swapout. The mmu notifier makes it safe to flush the tlb after freeing the page (otherwise it would never be safe), so we can do a single flush for multiple invalidated sptes.
  Cc: stable@kernel.org
  Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
  Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Fix another largepage memory leak (Joerg Roedel, 2009-03-24, 1 file, -1/+1)
  In the paging_fetch function, rmap_remove is called after setting a large pte to non-present. This causes rmap_remove to not drop the reference to the large page. The result is a memory leak of that page.
  Cc: stable@kernel.org
  Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Rename "metaphysical" attribute to "direct"Avi Kivity2009-03-241-6/+6
| | | | | | | This actually describes what is going on, rather than alerting the reader that something strange is going on. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Replace walk_shadow() by for_each_shadow_entry() in invlpg() (Avi Kivity, 2009-03-24, 1 file, -49/+32)
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Replace walk_shadow() by for_each_shadow_entry() in fetch() (Avi Kivity, 2009-03-24, 1 file, -70/+58)
  Effectively reverting to the pre-walk_shadow() version -- but now with the reusable for_each().
  Signed-off-by: Avi Kivity <avi@redhat.com>
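A toy, self-contained model of the iterator shape this pair of commits moves to (two fixed levels and a hardcoded next table, purely for illustration; the real iterator follows the spte it just visited):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdbool.h>

    struct walk_iter {
            uint64_t addr;
            int level;          /* counts down toward the leaf */
            uint64_t *table;    /* current table page */
            uint64_t *sptep;    /* current entry */
    };

    static uint64_t root[512], leaf[512];

    static void walk_init(struct walk_iter *it, uint64_t addr)
    {
            it->addr = addr;
            it->level = 2;
            it->table = root;
    }

    static bool walk_okay(struct walk_iter *it)
    {
            if (it->level < 1)
                    return false;
            unsigned idx = (it->addr >> (12 + 9 * (it->level - 1))) & 511;
            it->sptep = &it->table[idx];
            return true;
    }

    static void walk_next(struct walk_iter *it)
    {
            it->level--;
            it->table = leaf;   /* toy: the real walker follows *sptep */
    }

    /* The callback-based walk_shadow() becomes an open-coded loop the
     * caller controls, which is both simpler and reusable. */
    #define for_each_shadow_entry(it, a) \
            for (walk_init(it, a); walk_okay(it); walk_next(it))

    int main(void)
    {
            struct walk_iter it;
            for_each_shadow_entry(&it, 0x40123000ULL)
                    printf("level %d index %ld\n", it.level,
                           (long)(it.sptep - it.table));
            return 0;
    }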
* KVM: MMU: handle large host sptes on invlpg/resync (Marcelo Tosatti, 2008-12-31, 1 file, -2/+7)
  The invlpg and sync walkers lack knowledge of large host sptes, descending to a non-existent pagetable level. Stop at the directory level in such a case. Fixes SMP Windows XP with hugepages.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: prepopulate the shadow on invlpg (Marcelo Tosatti, 2008-12-31, 1 file, -1/+24)
  If the guest executes invlpg, peek into the pagetable and attempt to prepopulate the shadow entry. Also stop dirty fault updates from interfering with the fork detector. 2% improvement on RHEL3/AIM7.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: skip global pgtables on sync due to cr3 switch (Marcelo Tosatti, 2008-12-31, 1 file, -4/+6)
  Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is important for Linux 32-bit guests with PAE, where the kmap page is marked as global.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: avoid creation of unreachable pages in the shadow (Marcelo Tosatti, 2008-11-26, 1 file, -0/+1)
  It is possible for a shadow page to have a parent link pointing to a freed page. When zapping a high level table, kvm_mmu_page_unlink_children fails to remove the parent_pte link. For that to happen, the child must be unreachable via the shadow tree, which can happen in shadow_walk_entry if the guest pte was modified in between walk() and fetch(). Remove the parent pte reference in such a case. Possible cause for oops in bug #2217430.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: out of sync shadow core (Marcelo Tosatti, 2008-10-15, 1 file, -1/+1)
  Allow guest pagetables to go out of sync. Instead of emulating write accesses to guest pagetables, or unshadowing them, we un-write-protect the page table and allow the guest to modify it at will. We rely on invlpg executions to synchronize individual ptes, and will synchronize the entire pagetable on tlb flushes.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: trap invlpg (Marcelo Tosatti, 2008-10-15, 1 file, -0/+25)
  With pages out of sync, invlpg needs to be trapped. For now simply nuke the entry. Untested on AMD.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: mode specific sync_page (Marcelo Tosatti, 2008-10-15, 1 file, -0/+54)
  Examine the guest pagetable and bring the shadow back in sync. The caller is responsible for a local TLB flush before re-entering guest mode.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: flush remote TLBs on large->normal entry overwrite (Marcelo Tosatti, 2008-10-15, 1 file, -1/+4)
  It is necessary to flush all TLBs when a large spte entry is overwritten with a normal page directory pointer.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: switch to get_user_pages_fast (Marcelo Tosatti, 2008-10-15, 1 file, -7/+1)
  Convert gfn_to_pfn to use get_user_pages_fast, which can do lockless pagetable lookups on x86. Kernel compilation on a 4-way guest is 3.7% faster on VMX.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Modify kvm_shadow_walk.entry to accept u64 addr (Sheng Yang, 2008-10-15, 1 file, -2/+2)
  EPT is 4-level by default in 32-bit PAE mode (48 bits), but the addr parameter of kvm_shadow_walk->entry() only accepts an unsigned long as the virtual address, which is 32 bits in 32-bit PAE. This results in a SHADOW_PT_INDEX() overflow when trying to fetch the level-4 index. Fix it by extending kvm_shadow_walk->entry() to accept a 64-bit addr parameter.
  Signed-off-by: Sheng Yang <sheng.yang@intel.com>
  Signed-off-by: Avi Kivity <avi@qumranet.com>
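A self-contained demonstration of the truncation; the index extraction mirrors what a 9-bits-per-level SHADOW_PT_INDEX() computes, and the address constant is arbitrary:

    #include <stdint.h>
    #include <stdio.h>

    /* Index for a 4-level table: 9 bits per level, 12-bit page offset. */
    #define PT_INDEX(addr, level) (((addr) >> (12 + 9 * ((level) - 1))) & 0x1ff)

    int main(void)
    {
            uint64_t addr = 0x0000804700123000ULL;  /* needs > 32 address bits */
            uint32_t truncated = (uint32_t)addr;    /* 'unsigned long' on i386 */

            /* The level-4 index lives in bits 39..47; a 32-bit addr has
             * already lost them, so the walker indexes entry 0 instead. */
            printf("level 4 index: full=%llu truncated=%u\n",
                   (unsigned long long)PT_INDEX(addr, 4),
                   (unsigned)PT_INDEX((uint64_t)truncated, 4));
            return 0;
    }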
* KVM: MMU: Convert the paging mode shadow walk to use the generic walker (Avi Kivity, 2008-10-15, 1 file, -72/+86)
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Move SHADOW_PT_INDEX to mmu.c (Avi Kivity, 2008-10-15, 1 file, -3/+0)
  It is not specific to the paging mode, so it can be made global (and reusable).
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix torn shadow pte (Avi Kivity, 2008-08-25, 1 file, -1/+1)
  The shadow code assigns a pte directly in one place, which is non-atomic on i386 and can cause random memory references. Fix by using an atomic setter.
  Signed-off-by: Avi Kivity <avi@qumranet.com>
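A sketch of the hazard and the shape of the fix: on i386 a plain 64-bit store compiles to two 32-bit moves, so a concurrent reader can see a half-written pte. The kernel's setter is its own primitive; a C11 atomic store models the same single-store guarantee:

    #include <stdatomic.h>
    #include <stdint.h>

    static _Atomic uint64_t spte;

    /* An atomic 64-bit store: the hardware walker (or another cpu)
     * observes either the old or the new pte, never a torn mix. */
    static void set_spte_atomic(uint64_t new_spte)
    {
            atomic_store_explicit(&spte, new_spte, memory_order_relaxed);
    }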
* KVM: Synchronize guest physical memory map to host virtual memory map (Andrea Arcangeli, 2008-07-29, 1 file, -0/+12)
  Synchronize changes to host virtual addresses which are part of a KVM memory slot to the KVM shadow mmu. This allows pte operations like swapping, page migration, and madvise() to transparently work with KVM.
  Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Optimize prefetch_page() (Avi Kivity, 2008-07-20, 1 file, -13/+15)
  Instead of reading each pte individually, read 256 bytes worth of ptes and batch process them.
  Signed-off-by: Avi Kivity <avi@qumranet.com>
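A model of the batching under stated assumptions: read_guest() is a stub standing in for the real guest-memory read, 256 bytes covers 32 eight-byte ptes, and n is assumed to be a multiple of the batch size:

    #include <stdint.h>
    #include <string.h>

    #define BATCH_BYTES 256
    #define BATCH_PTES  (BATCH_BYTES / sizeof(uint64_t))   /* 32 */

    static int read_guest(uint64_t gpa, void *data, size_t len)
    {
            (void)gpa;
            memset(data, 0, len);   /* stub: pretend all gptes are zero */
            return 0;
    }

    /* One 256-byte read replaces 32 individual 8-byte reads; the ptes
     * are then processed out of the local buffer. */
    static void prefetch_ptes(uint64_t gpte_gpa, uint64_t *out, size_t n)
    {
            uint64_t buf[BATCH_PTES];

            for (size_t i = 0; i < n; i += BATCH_PTES) {
                    read_guest(gpte_gpa + i * sizeof(uint64_t),
                               buf, sizeof(buf));
                    memcpy(&out[i], buf, sizeof(buf));
            }
    }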
* KVM: MMU: Fix printk() format string (Avi Kivity, 2008-06-06, 1 file, -1/+1)
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Don't assume struct page for x86 (Anthony Liguori, 2008-04-27, 1 file, -13/+13)
  This patch introduces a gfn_to_pfn() function and corresponding functions like kvm_release_pfn_dirty(). Using these new functions, we can modify the x86 MMU to no longer assume that it can always get a struct page for any given gfn. We don't want to eliminate gfn_to_page() entirely, because a number of places assume they can do gfn_to_page() and then kmap() the result. When we support IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will succeed. This does not implement support for avoiding reference counting for reserved RAM or for IO memory. However, it should make those things pretty straightforward. Since we're only introducing new common symbols, I don't think it will break the non-x86 architectures, but I haven't tested those. I've tested Intel, AMD, NPT, and hugetlbfs with Windows and Linux guests.
  [avi: fix overflow when shifting left pfns by adding casts]
  Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: unify slots_lock usage (Marcelo Tosatti, 2008-04-27, 1 file, -4/+0)
  Unify slots_lock acquisition around vcpu_run(). This is simpler and less error-prone. Also fix some callsites that were not grabbing the lock properly.
  [avi: drop slots_lock while in guest mode to avoid holding the lock for indefinite periods]
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Set the accessed bit on non-speculative shadow ptes (Avi Kivity, 2008-04-27, 1 file, -2/+2)
  If we populate a shadow pte due to a fault (and not speculatively due to a pte write) then we can set the accessed bit on it, as we know it will be set immediately on the next guest instruction. This saves a read-modify-write operation.
  Signed-off-by: Avi Kivity <avi@qumranet.com>
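A sketch of the distinction (the A bit position is architectural; build_spte() is an illustrative helper, not the kernel's):

    #include <stdint.h>
    #include <stdbool.h>

    #define PT_PRESENT_MASK  (1ULL << 0)
    #define PT_ACCESSED_MASK (1ULL << 5)   /* architectural A bit */

    /* A spte built for a real fault can carry the accessed bit right
     * away; a speculative spte cannot, since the guest might never
     * touch the page and the A bit would then lie. */
    static uint64_t build_spte(uint64_t base, bool speculative)
    {
            uint64_t spte = base | PT_PRESENT_MASK;
            if (!speculative)
                    spte |= PT_ACCESSED_MASK;   /* saves the cpu an RMW */
            return spte;
    }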
* KVM: replace remaining __FUNCTION__ occurrences (Harvey Harrison, 2008-04-27, 1 file, -7/+7)
  __FUNCTION__ is gcc-specific; use __func__.
  Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: large page support (Marcelo Tosatti, 2008-04-27, 1 file, -6/+26)
  Create large page mappings if the guest PTEs are marked as such and the underlying memory is hugetlbfs-backed. If the largepage contains write-protected pages, a large pte is not used. Gives a consistent 2% improvement for data copies on a ram-mounted filesystem, without NPT/EPT. Anthony measures a 4% improvement on 4-way kernbench, with NPT.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Decouple mmio from shadow page tables (Avi Kivity, 2008-04-27, 1 file, -9/+8)
  Currently an mmio guest pte is encoded in the shadow pagetable as a not-present trapping pte, with the SHADOW_IO_MARK bit set. However nothing is ever done with this information, so maintaining it is a useless complication. This patch moves the check for mmio to before shadow ptes are instantiated, so the shadow code is never invoked for ptes that reference mmio. The code is simpler, and with future work, can be made to handle mmio concurrently.
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Update shadow ptes on partial guest pte writes (Dong, Eddie, 2008-04-27, 1 file, -5/+2)
  A partial guest pte write will leave shadow_trap_nonpresent_pte in the spte, which generates a vmexit at the next guest access through that pte. This patch improves on that by reading the full guest pte in advance, and thus being able to update the spte and eliminate the vmexit. This helps pae guests, which use two 32-bit writes to set a single 64-bit pte.
  [truncation fix by Eric]
  Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
  Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
  Signed-off-by: Avi Kivity <avi@qumranet.com>
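A self-contained model of why pae guests trigger this path: the 64-bit pte is assembled from two 32-bit stores, and the first store alone leaves a half-built entry (the constants are arbitrary illustrative values, little-endian layout assumed):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            union { uint64_t pte; uint32_t half[2]; } g = { 0 };

            g.half[0] = 0x00001027;   /* low half: flags + low pfn bits */
            /* The first write traps here; reading the gpte now sees a
             * half-built entry. Reading the full 64 bits once both
             * halves land lets the spte be updated without a second
             * vmexit on the next access. */
            g.half[1] = 0x00000001;   /* high half: upper pfn bits */

            printf("full gpte: %#llx\n", (unsigned long long)g.pte);
            return 0;
    }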
* KVM: MMU: Fix race when instantiating a shadow pte (Avi Kivity, 2008-03-04, 1 file, -3/+2)
  For improved concurrency, the guest walk is performed concurrently with other vcpus. This means that we need to revalidate the guest ptes once we have write-protected the guest page tables, at which point they can no longer be modified. The current code attempts to avoid this check if the shadow page table is not new, on the assumption that if it has existed before, the guest could not have modified the pte without the shadow lock. However the assumption is incorrect, as the racing vcpu could have modified the pte, then instantiated the shadow page, before our vcpu regains control:

      vcpu0                vcpu1

      fault
      walk pte
                           modify pte
                           fault in same pagetable
                           instantiate shadow page

      lookup shadow page
      conclude it is old
      instantiate spte based on stale guest pte

  We could do something clever with generation counters, but a test run by Marcelo suggests this is unnecessary and we can just do the revalidation unconditionally. The pte will be in the processor cache and the check can be quite fast.
  Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: make MMU_DEBUG compile again (Marcelo Tosatti, 2008-03-04, 1 file, -1/+1)
  The cr3 variable is now inside the vcpu->arch structure.
  Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  Signed-off-by: Avi Kivity <avi@qumranet.com>