summaryrefslogtreecommitdiffstats
path: root/drivers/kvm/mmu.c
Commit message (Collapse)AuthorAgeFilesLines
* KVM: MMU: More struct kvm_vcpu -> struct kvm cleanupsAnthony Liguori2008-01-301-13/+13
| | | | | | | | | This time, the biggest change is gpa_to_hpa. The translation of GPA to HPA does not depend on the VCPU state unlike GVA to GPA so there's no need to pass in the kvm_vcpu. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Clean up MMU functions to take struct kvm when appropriateAnthony Liguori2008-01-301-9/+9
| | | | | | | | | | | | Some of the MMU functions take a struct kvm_vcpu even though they affect all VCPUs. This patch cleans up some of them to instead take a struct kvm. This makes things a bit more clear. The main thing that was confusing me was whether certain functions need to be called on all VCPUs. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: CodingStyle cleanupMike Day2008-01-301-4/+6
| | | | | Signed-off-by: Mike D. Day <ncmike@ncultra.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allow dynamic allocation of the mmu shadow cache sizeIzik Eidus2008-01-301-2/+38
| | | | | | | The user is now able to set how many mmu pages will be allocated to the guest. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove the usage of page->private field by rmapIzik Eidus2008-01-301-52/+70
| | | | | | | | | | | | | When kvm uses user-allocated pages in the future for the guest, we won't be able to use page->private for rmap, since page->rmap is reserved for the filesystem. So we move the rmap base pointers to the memory slot. A side effect of this is that we need to store the gfn of each gpte in the shadow pages, since the memory slot is addressed by gfn, instead of hfn like struct page. Signed-off-by: Izik Eidus <izik@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Make flooding detection work when guest page faults are bypassedAvi Kivity2008-01-301-1/+20
| | | | | | | | | When we allow guest page faults to reach the guests directly, we lose the fault tracking which allows us to detect demand paging. So we provide an alternate mechnism by clearing the accessed bit when we set a pte, and checking it later to see if the guest actually used it. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allow not-present guest page faults to bypass kvmAvi Kivity2008-01-301-21/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | There are two classes of page faults trapped by kvm: - host page faults, where the fault is needed to allow kvm to install the shadow pte or update the guest accessed and dirty bits - guest page faults, where the guest has faulted and kvm simply injects the fault back into the guest to handle The second class, guest page faults, is pure overhead. We can eliminate some of it on vmx using the following evil trick: - when we set up a shadow page table entry, if the corresponding guest pte is not present, set up the shadow pte as not present - if the guest pte _is_ present, mark the shadow pte as present but also set one of the reserved bits in the shadow pte - tell the vmx hardware not to trap faults which have the present bit clear With this, normal page-not-present faults go directly to the guest, bypassing kvm entirely. Unfortunately, this trick only works on Intel hardware, as AMD lacks a way to discriminate among page faults based on error code. It is also a little risky since it uses reserved bits which might become unreserved in the future, so a module parameter is provided to disable it. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Reset mmu context when entering real modeEddie Dong2007-10-221-0/+1
| | | | | | | | | | | | | Resetting an SMP guest will force AP enter real mode (RESET) with paging enabled in protected mode. While current enter_rmode() can only handle mode switch from nonpaging mode to real mode which leads to SMP reboot failure. Fix by reloading the mmu context on entering real mode. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Set shadow pte atomically in mmu_pte_write_zap_pte()Izik Eidus2007-10-221-1/+1
| | | | | | | Setting shadow page table entry should be set atomicly using set_shadow_pte(). Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Don't do GFP_NOWAIT allocationsAvi Kivity2007-10-131-24/+10
| | | | | | | | | | | Before preempt notifiers, kvm needed to allocate memory with GFP_NOWAIT so as not to have to enable preemption and take a heavyweight exit. On oom, we'd fall back to a GFP_KERNEL allocation. With preemption notifiers, we can do a GFP_KERNEL allocation, and perform the heavyweight exit only if the kernel decides to put us to sleep. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Rename kvm_arch_ops to kvm_x86_opsChristian Ehrhardt2007-10-131-3/+3
| | | | | | | | This patch just renames the current (misnamed) _arch namings to _x86 to ensure better readability when a real arch layer takes place. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Convert vm lock to a mutexShaohua Li2007-10-131-5/+4
| | | | | | | | This allows the kvm mmu to perform sleepy operations, such as memory allocation. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use the scheduler preemption notifiers to make kvm preemptibleAvi Kivity2007-10-131-2/+0
| | | | | | | | | | | | | | Current kvm disables preemption while the new virtualization registers are in use. This of course is not very good for latency sensitive workloads (one use of virtualization is to offload user interface and other latency insensitive stuff to a container, so that it is easier to analyze the remaining workload). This patch re-enables preemption for kvm; preemption is now only disabled when switching the registers in and out, and during the switch to guest mode and back. Contains fixes from Shaohua Li <shaohua.li@intel.com>. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Move gfn_to_page out of kmap/unmap pairsShaohua Li2007-10-131-1/+1
| | | | | | | gfn_to_page might sleep with swap support. Move it out of the kmap calls. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Trivial: Use standard CR0 flags macros from asm/cpu-features.hRusty Russell2007-10-131-1/+1
| | | | | | | | | | The kernel now has asm/cpu-features.h: use those macros instead of inventing our own. Also spell out definition of CR0_RESEVED_BITS (no code change) and fix typo. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix rare oops on guest context switchAvi Kivity2007-09-141-2/+3
| | | | | | | | | | | A guest context switch to an uncached cr3 can require allocation of shadow pages, but we only recycle shadow pages in kvm_mmu_page_fault(). Move shadow page recycling to mmu_topup_memory_caches(), which is called from both the page fault handler and from guest cr3 reload. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* KVM: MMU: Fix cleaning up the shadow page allocation cacheAvi Kivity2007-07-201-1/+1
| | | | | | | __free_page() wants a struct page, not a virtual address. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* KVM: MMU: Fix oopses with SLUBAvi Kivity2007-07-201-13/+26
| | | | | | | | | The kvm mmu uses page->private on shadow page tables; so does slub, and an oops result. Fix by allocating regular pages for shadows instead of using slub. Tested-by: S.Çağlar Onur <caglar@pardus.org.tr> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix memory slot management functions for guest smpAvi Kivity2007-07-201-62/+41
| | | | | | | | | | | The memory slot management functions were oriented against vcpu 0, where they should be kvm-wide. This causes hangs starting X on guest smp. Fix by making the functions (and resultant tail in the mmu) non-vcpu-specific. Unfortunately this reduces the efficiency of the mmu object cache a bit. We may have to revisit this later. Signed-off-by: Avi Kivity <avi@qumranet.com>
* mm: Remove slab destructors from kmem_cache_create().Paul Mundt2007-07-201-4/+4
| | | | | | | | | | | | | | Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* KVM: Clean up #includesAvi Kivity2007-07-161-4/+6
| | | | | | Remove unnecessary ones, and rearange the remaining in the standard order. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix Wrong tlb flush orderShaohua Li2007-07-161-1/+1
| | | | | | | Need to flush the tlb after updating a pte, not before. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Flush remote tlbs when reducing shadow pte permissionsAvi Kivity2007-07-161-3/+5
| | | | | | | | | | | When a vcpu causes a shadow tlb entry to have reduced permissions, it must also clear the tlb on remote vcpus. We do that by: - setting a bit on the vcpu that requests a tlb flush before the next entry - if the vcpu is currently executing, we send an ipi to make sure it exits before we continue Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix vcpu freeing for guest smpAvi Kivity2007-07-161-2/+2
| | | | | | | | A vcpu can pin up to four mmu shadow pages, which means the freeing loop will never terminate. Fix by first unpinning shadow pages on all vcpus, then freeing shadow pages. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Lazy guest cr3 switchingAvi Kivity2007-07-161-21/+22
| | | | | | | | | Switch guest paging context may require us to allocate memory, which might fail. Instead of wiring up error paths everywhere, make context switching lazy and actually do the switch before the next guest entry, where we can return an error if allocation fails. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Remove unused large page markerAvi Kivity2007-07-161-1/+0
| | | | | | | This has not been used for some time, as the same information is available in the page header. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Don't cache guest access bits in the shadow page tableAvi Kivity2007-07-161-8/+0
| | | | | | | | | This was once used to avoid accessing the guest pte when upgrading the shadow pte from read-only to read-write. But usually we need to set the guest pte dirty or accessed bits anyway, so this wasn't really exploited. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Simpify accessed/dirty/present/nx bit handlingAvi Kivity2007-07-161-5/+0
| | | | | | | | Always set the accessed and dirty bit (since having them cleared causes a read-modify-write cycle), always set the present bit, and copy the nx bit from the guest. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Make setting shadow ptes atomic on i386Avi Kivity2007-07-161-2/+12
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fold fix_write_pf() into set_pte_common()Avi Kivity2007-07-161-0/+11
| | | | | | | This prevents some work from being performed twice, and, more importantly, reduces the number of places where we modify shadow ptes. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fold fix_read_pf() into set_pte_common()Avi Kivity2007-07-161-17/+0
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Move set_pte_common() to pte width dependent codeAvi Kivity2007-07-161-48/+0
| | | | | | In preparation of some modifications. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Use slab caches for shadow pages and their headersAvi Kivity2007-07-161-25/+39
| | | | | | Use slab caches instead of a simple custom list. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Store shadow page tables as kernel virtual addresses, not physicalAvi Kivity2007-07-161-17/+15
| | | | | | Simpifies things a bit. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Simplify kvm_mmu_free_page() a tiny bitAvi Kivity2007-07-161-6/+4
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Update shadow pte on write to guest pteAvi Kivity2007-07-161-0/+15
| | | | | | | | | | | | | | | | | | | | A typical demand page/copy on write pattern is: - page fault on vaddr - kvm propagates fault to guest - guest handles fault, updates pte - kvm traps write, clears shadow pte, resumes guest - guest returns to userspace, re-faults on same vaddr - kvm installs shadow pte, resumes guest - guest continues So, three vmexits for a single guest page fault. But if instead of clearing the page table entry, we update to correspond to the value that the guest has just written, we eliminate the third vmexit. This patch does exactly that, reducing kbuild time by about 10%. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Respect nonpae pagetable quadrant when zapping ptesAvi Kivity2007-07-161-0/+4
| | | | | | | | | | | | | | | | | | | When a guest writes to a page that has an mmu shadow, we have to clear the shadow pte corresponding to the memory location touched by the guest. Now, in nonpae mode, a single guest page may have two or four shadow pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps 2MB or 1GB), so we when we look up the page we find up to three additional aliases for the page. Since we _clear_ the shadow pte, it doesn't matter except for a slight performance penalty, but if we want to _update_ the shadow pte instead of clearing it, it is vital that we don't modify the aliases. Fortunately, exactly which page is needed (the "quadrant") is easily computed, and is accessible in the shadow page header. All we need is to ignore shadow pages from the wrong quadrants. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Unify kvm_mmu_pre_write() and kvm_mmu_post_write()Avi Kivity2007-07-161-7/+4
| | | | | | | Instead of calling two functions and repeating expensive checks, call one function and provide it with before/after information. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Assume that writes smaller than 4 bytes are to non-pagetable pagesAvi Kivity2007-07-161-0/+1
| | | | | | | | This allows us to remove write protection earlier than otherwise. Should some mad OS choose to use byte writes to update pagetables, it will suffer a performance hit, but still work correctly. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: fix an if() conditionAdrian Bunk2007-05-031-1/+1
| | | | | | | | It might have worked in this case since PT_PRESENT_MASK is 1, but let's express this correctly. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Per-vcpu statisticsAvi Kivity2007-05-031-1/+1
| | | | | | | | Make the exit statistics per-vcpu instead of global. This gives a 3.5% boost when running one virtual machine per core on my two socket dual core (4 cores total) machine. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Avoid heavy ASSERT at non debug mode.Yaozu Dong2007-05-031-0/+6
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Retry sleeping allocation if atomic allocation failsAvi Kivity2007-05-031-5/+21
| | | | | | This avoids -ENOMEM under memory pressure. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use slab caches to allocate mmu data structuresAvi Kivity2007-05-031-4/+35
| | | | | | | Better leak detection, statistics, memory use, speed -- goodness all around. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Handle partial pae pdptrAvi Kivity2007-05-031-6/+12
| | | | | | | | | | | Some guests (Solaris) do not set up all four pdptrs, but leave some invalid. kvm incorrectly treated these as valid page directories, pinning the wrong pages and causing general confusion. Fix by checking the valid bit of a pae pdpte. This closes sourceforge bug 1698922. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Simply gfn_to_page()Avi Kivity2007-05-031-8/+4
| | | | | | | | | | | | Mapping a guest page to a host page is a common operation. Currently, one has first to find the memory slot where the page belongs (gfn_to_memslot), then locate the page itself (gfn_to_page()). This is clumsy, and also won't work well with memory aliases. So simplify gfn_to_page() not to require memory slot translation first, and instead do it internally. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add mmu cache clear functionDor Laor2007-05-031-0/+17
| | | | | | | | | | Functions that play around with the physical memory map need a way to clear mappings to possibly nonexistent or invalid memory. Both the mmu cache and the processor tlb are cleared. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use list_move()Avi Kivity2007-05-031-8/+4
| | | | | | Use list_move() where possible. Noticed by Dor Laor. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix hugepage pdes mapping same physical address with different accessAvi Kivity2007-05-031-3/+5
| | | | | | | | | | | | | | | | | | | The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map the same physical address, they share the same shadow page. This is a fairly common case (kernel mappings on i386 nonpae Linux, for example). However, if the two pdes map the same memory but with different permissions, kvm will happily use the cached shadow page. If the access through the more permissive pde will occur after the access to the strict pde, an endless pagefault loop will be generated and the guest will make no progress. Fix by making the access permissions part of the cache lookup key. The fix allows Xen pae to boot on kvm and run guest domains. Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Remove global pte trackingAvi Kivity2007-05-031-9/+0
| | | | | | | | | | | The initial, noncaching, version of the kvm mmu flushed the all nonglobal shadow page table translations (much like a native tlb flush). The new implementation flushes translations only when they change, rendering global pte tracking superfluous. This removes the unused tracking mechanism and storage space. Signed-off-by: Avi Kivity <avi@qumranet.com>
OpenPOWER on IntegriCloud