summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kvm/book3s_64_mmu_hv.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'powerpc-4.1-2' of ↵Linus Torvalds2015-04-261-8/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc fixes from Michael Ellerman: - fix for mm_dec_nr_pmds() from Scott. - fixes for oopses seen with KVM + THP from Aneesh. - build fixes from Aneesh & Shreyas. * tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error powerpc/mm/thp: Return pte address if we find trans_splitting. powerpc/mm/thp: Make page table walk safe against thp split/collapse KVM: PPC: Remove page table walk helpers KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
| * powerpc/mm/thp: Return pte address if we find trans_splitting.Aneesh Kumar K.V2015-04-171-6/+3
| | | | | | | | | | | | | | | | | | | | For THP that is marked trans splitting, we return the pte. This require the callers to handle the pmd_trans_splitting scenario, if they care. All the current callers are either looking at pfn or write_ok, hence we don't need to update them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/mm/thp: Make page table walk safe against thp split/collapseAneesh Kumar K.V2015-04-171-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can disable a THP split or a hugepage collapse by disabling irq. We do send IPI to all the cpus in the early part of split/collapse, and disabling local irq ensure we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use returned pte_t pointer outside the irq disabled region. W.r.t to THP split, the pfn remains the same, but then a hugepage collapse will result in a pfn change. There are few steps we can take to avoid a hugepage collapse.One way is to take page reference inside the irq disable region. Other option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking pmd_lock. Another method used by kvm subsystem is to check whether we had a mmu_notifer update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | KVM: PPC: Book3S HV: Create debugfs file for each guest's HPTPaul Mackerras2015-04-211-0/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This creates a debugfs directory for each HV guest (assuming debugfs is enabled in the kernel config), and within that directory, a file by which the contents of the guest's HPT (hashed page table) can be read. The directory is named vmnnnn, where nnnn is the PID of the process that created the guest. The file is named "htab". This is intended to help in debugging problems in the host's management of guest memory. The contents of the file consist of a series of lines like this: 3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196 The first field is the index of the entry in the HPT, the second and third are the HPT entry, so the third entry contains the real page number that is mapped by the entry if the entry's valid bit is set. The fourth field is the guest's view of the second doubleword of the entry, so it contains the guest physical address. (The format of the second through fourth fields are described in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.) Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* | KVM: PPC: Book3S HV: Add helpers for lock/unlock hpteAneesh Kumar K.V2015-04-211-15/+10
| | | | | | | | | | | | | | | | | | | | This adds helper routines for locking and unlocking HPTEs, and uses them in the rest of the code. We don't change any locking rules in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* | KVM: PPC: Book3S HV: Remove RMA-related variables from codeAneesh Kumar K.V2015-04-211-14/+14
|/ | | | | | | | | | We don't support real-mode areas now that 970 support is removed. Remove the remaining details of rma from the code. Also rename rma_setup_done to hpte_setup_done to better reflect the changes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Remove code for PPC970 processorsPaul Mackerras2014-12-171-186/+14
| | | | | | | | | | | | | | | | | This removes the code that was added to enable HV KVM to work on PPC970 processors. The PPC970 is an old CPU that doesn't support virtualizing guest memory. Removing PPC970 support also lets us remove the code for allocating and managing contiguous real-mode areas, the code for the !kvm->arch.using_mmu_notifiers case, the code for pinning pages of guest memory when first accessed and keeping track of which pages have been pinned, and the code for handling H_ENTER hypercalls in virtual mode. Book3S HV KVM is now supported only on POWER7 and POWER8 processors. The KVM_CAP_PPC_RMA capability now always returns 0. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactionsSuresh E. Warrier2014-12-171-3/+10
| | | | | | | | | | | This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: ptes are big endianCédric Le Goater2014-12-151-1/+7
| | | | | | | | | | | | | When being restored from qemu, the kvm_get_htab_header are in native endian, but the ptes are big endian. This patch fixes restore on a KVM LE host. Qemu also needs a fix for this : http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg00008.html Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Add missing HPTE unlockAneesh Kumar K.V2014-12-151-1/+4
| | | | | | | | | | | | In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the next HPTE without unlocking the invalid one. In fact we should never find an invalid and unlocked HPTE in the rmap chain, but for robustness we should unlock it. This adds the missing unlock. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2014-10-081-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "Fixes and features for 3.18. Apart from the usual cleanups, here is the summary of new features: - s390 moves closer towards host large page support - PowerPC has improved support for debugging (both inside the guest and via gdbstub) and support for e6500 processors - ARM/ARM64 support read-only memory (which is necessary to put firmware in emulated NOR flash) - x86 has the usual emulator fixes and nested virtualization improvements (including improved Windows support on Intel and Jailhouse hypervisor support on AMD), adaptive PLE which helps overcommitting of huge guests. Also included are some patches that make KVM more friendly to memory hot-unplug, and fixes for rare caching bugs. Two patches have trivial mm/ parts that were acked by Rik and Andrew. Note: I will soon switch to a subkey for signing purposes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits) kvm: do not handle APIC access page if in-kernel irqchip is not in use KVM: s390: count vcpu wakeups in stat.halt_wakeup KVM: s390/facilities: allow TOD-CLOCK steering facility bit KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode arm/arm64: KVM: Report correct FSC for unsupported fault types arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc kvm: Fix kvm_get_page_retry_io __gup retval check arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset kvm: x86: Unpin and remove kvm_arch->apic_access_page kvm: vmx: Implement set_apic_access_page_addr kvm: x86: Add request bit to reload APIC access page address kvm: Add arch specific mmu notifier for page invalidation kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static kvm: Fix page ageing bugs kvm/x86/mmu: Pass gfn and level to rmapp callback. x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only kvm: x86: use macros to compute bank MSRs KVM: x86: Remove debug assertion of non-PAE reserved bits kvm: don't take vcpu mutex for obviously invalid vcpu ioctls kvm: Faults which trigger IO release the mmap_sem ...
| * kvm: Fix page ageing bugsAndres Lagar-Cavilla2014-09-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. We were calling clear_flush_young_notify in unmap_one, but we are within an mmu notifier invalidate range scope. The spte exists no more (due to range_start) and the accessed bit info has already been propagated (due to kvm_pfn_set_accessed). Simply call clear_flush_young. 2. We clear_flush_young on a primary MMU PMD, but this may be mapped as a collection of PTEs by the secondary MMU (e.g. during log-dirty). This required expanding the interface of the clear_flush_young mmu notifier, so a lot of code has been trivially touched. 3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate the access bit by blowing the spte. This requires proper synchronizing with MMU notifier consumers, like every other removal of spte's does. Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | powerpc/kvm/cma: Fix panic introduces by signed shift operationLaurent Dufour2014-09-031-2/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | fc95ca7284bc54953165cba76c3228bd2cdb9591 introduces a memset in kvmppc_alloc_hpt since the general CMA doesn't clear the memory it allocates. However, the size argument passed to memset is computed from a signed value and its signed bit is extended by the cast the compiler is doing. This lead to extremely large size value when dealing with order value >= 31, and almost all the memory following the allocated space is cleaned. As a consequence, the system is panicing and may even fail spawning the kdump kernel. This fix makes use of an unsigned value for the memset's size argument to avoid sign extension. Among this fix, another shift operation which may lead to signed extended value too is also fixed. Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Paul Mackerras <paulus@samba.org> Cc: Alexander Graf <agraf@suse.de> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2014-08-071-73/+72
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull second round of KVM changes from Paolo Bonzini: "Here are the PPC and ARM changes for KVM, which I separated because they had small conflicts (respectively within KVM documentation, and with 3.16-rc changes). Since they were all within the subsystem, I took care of them. Stephen Rothwell reported some snags in PPC builds, but they are all fixed now; the latest linux-next report was clean. New features for ARM include: - KVM VGIC v2 emulation on GICv3 hardware - Big-Endian support for arm/arm64 (guest and host) - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list) And for PPC: - Book3S: Good number of LE host fixes, enable HV on LE - Book3S HV: Add in-guest debug support This release drops support for KVM on the PPC440. As a result, the PPC merge removes more lines than it adds. :) I also included an x86 change, since Davidlohr tied it to an independent bug report and the reporter quickly provided a Tested-by; there was no reason to wait for -rc2" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits) KVM: Move more code under CONFIG_HAVE_KVM_IRQFD KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use KVM: nVMX: Fix nested vmexit ack intr before load vmcs01 KVM: PPC: Enable IRQFD support for the XICS interrupt controller KVM: Give IRQFD its own separate enabling Kconfig option KVM: Move irq notifier implementation into eventfd.c KVM: Move all accesses to kvm::irq_routing into irqchip.c KVM: irqchip: Provide and use accessors for irq routing table KVM: Don't keep reference to irq routing table in irqfd struct KVM: PPC: drop duplicate tracepoint arm64: KVM: fix 64bit CP15 VM access for 32bit guests KVM: arm64: GICv3: mandate page-aligned GICV region arm64: KVM: GICv3: move system register access to msr_s/mrs_s KVM: PPC: PR: Handle FSCR feature deselects KVM: PPC: HV: Remove generic instruction emulation KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr KVM: PPC: Remove DCR handling KVM: PPC: Expose helper functions for data/inst faults KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Handle magic page in kvmppc_ld/st ...
| * Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvmPaolo Bonzini2014-08-051-73/+72
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch queue for ppc - 2014-08-01 Highlights in this release include: - BookE: Rework instruction fetch, not racy anymore now - BookE HV: Fix ONE_REG accessors for some in-hardware registers - Book3S: Good number of LE host fixes, enable HV on LE - Book3S: Some misc bug fixes - Book3S HV: Add in-guest debug support - Book3S HV: Preload cache lines on context switch - Remove 440 support Alexander Graf (31): KVM: PPC: Book3s PR: Disable AIL mode with OPAL KVM: PPC: Book3s HV: Fix tlbie compile error KVM: PPC: Book3S PR: Handle hyp doorbell exits KVM: PPC: Book3S PR: Fix ABIv2 on LE KVM: PPC: Book3S PR: Fix sparse endian checks PPC: Add asm helpers for BE 32bit load/store KVM: PPC: Book3S HV: Make HTAB code LE host aware KVM: PPC: Book3S HV: Access guest VPA in BE KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE KVM: PPC: Book3S HV: Access XICS in BE KVM: PPC: Book3S HV: Fix ABIv2 on LE KVM: PPC: Book3S HV: Enable for little endian hosts KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct KVM: PPC: Deflect page write faults properly in kvmppc_st KVM: PPC: Book3S: Stop PTE lookup on write errors KVM: PPC: Book3S: Add hack for split real mode KVM: PPC: Book3S: Make magic page properly 4k mappable KVM: PPC: Remove 440 support KVM: Rename and add argument to check_extension KVM: Allow KVM_CHECK_EXTENSION on the vm fd KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode KVM: PPC: Implement kvmppc_xlate for all targets KVM: PPC: Move kvmppc_ld/st to common code KVM: PPC: Remove kvmppc_bad_hva() KVM: PPC: Use kvm_read_guest in kvmppc_ld KVM: PPC: Handle magic page in kvmppc_ld/st KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Expose helper functions for data/inst faults KVM: PPC: Remove DCR handling KVM: PPC: HV: Remove generic instruction emulation KVM: PPC: PR: Handle FSCR feature deselects Alexey Kardashevskiy (1): KVM: PPC: Book3S: Fix LPCR one_reg interface Aneesh Kumar K.V (4): KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation KVM: PPC: BOOK3S: PR: Emulate virtual timebase register KVM: PPC: BOOK3S: PR: Emulate instruction counter KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page Anton Blanchard (2): KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC() Bharat Bhushan (10): kvm: ppc: bookehv: Added wrapper macros for shadow registers kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1 kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR kvm: ppc: booke: Add shared struct helpers of SPRN_ESR kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7 kvm: ppc: Add SPRN_EPR get helper function kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit KVM: PPC: Booke-hv: Add one reg interface for SPRG9 KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr Michael Neuling (1): KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling Mihai Caraman (8): KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule KVM: PPC: e500: Fix default tlb for victim hint KVM: PPC: e500: Emulate power management control SPR KVM: PPC: e500mc: Revert "add load inst fixup" KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 KVM: PPC: Book3s: Remove kvmppc_read_inst() function KVM: PPC: Allow kvmppc_get_last_inst() to fail KVM: PPC: Bookehv: Get vcpu's last instruction for emulation Paul Mackerras (4): KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication Stewart Smith (2): Split out struct kvmppc_vcore creation to separate function Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 Conflicts: Documentation/virtual/kvm/api.txt
| | * KVM: PPC: Allow kvmppc_get_last_inst() to failMihai Caraman2014-07-281-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On book3e, guest last instruction is read on the exit path using load external pid (lwepx) dedicated instruction. This load operation may fail due to TLB eviction and execute-but-not-read entries. This patch lay down the path for an alternative solution to read the guest last instruction, by allowing kvmppc_get_lat_inst() function to fail. Architecture specific implmentations of kvmppc_load_last_inst() may read last guest instruction and instruct the emulation layer to re-execute the guest in case of failure. Make kvmppc_get_last_inst() definition common between architectures. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| | * KVM: PPC: Book3S HV: Make HTAB code LE host awareAlexander Graf2014-07-281-61/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running on an LE host all data structures are kept in little endian byte order. However, the HTAB still needs to be maintained in big endian. So every time we access any HTAB we need to make sure we do so in the right byte order. Fix up all accesses to manually byte swap. Signed-off-by: Alexander Graf <agraf@suse.de>
* | | PPC, KVM, CMA: use general CMA reserved area management frameworkJoonsoo Kim2014-08-061-3/+1
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now, we have general CMA reserved area management framework, so use it for future maintainabilty. There is no functional change. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Gleb Natapov <gleb@kernel.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb valueAneesh Kumar K.V2014-06-251-1/+1
|/ | | | | | | | | | | With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return base page size and use that to compare against the the page size calculated from SLB. Without this patch a hpte lookup can fail since we are comparing wrong page size in kvmppc_hv_find_lock_hpte. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Make sure we don't miss dirty pagesPaul Mackerras2014-05-301-10/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Current, when testing whether a page is dirty (when constructing the bitmap for the KVM_GET_DIRTY_LOG ioctl), we test the C (changed) bit in the HPT entries mapping the page, and if it is 0, we consider the page to be clean. However, the Power ISA doesn't require processors to set the C bit to 1 immediately when writing to a page, and in fact allows them to delay the writeback of the C bit until they receive a TLB invalidation for the page. Thus it is possible that the page could be dirty and we miss it. Now, if there are vcpus running, this is not serious since the collection of the dirty log is racy already - some vcpu could dirty the page just after we check it. But if there are no vcpus running we should return definitive results, in case we are in the final phase of migrating the guest. Also, if the permission bits in the HPTE don't allow writing, then we know that no CPU can set C. If the HPTE was previously writable and the page was modified, any C bit writeback would have been flushed out by the tlbie that we did when changing the HPTE to read-only. Otherwise we need to do a TLB invalidation even if the C bit is 0, and then check the C bit. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Fix dirty map for hugepagesAlexey Kardashevskiy2014-05-301-9/+24
| | | | | | | | | | | | | | | | | | The dirty map that we construct for the KVM_GET_DIRTY_LOG ioctl has one bit per system page (4K/64K). Currently, we only set one bit in the map for each HPT entry with the Change bit set, even if the HPT is for a large page (e.g., 16MB). Userspace then considers only the first system page dirty, though in fact the guest may have modified anywhere in the large page. To fix this, we make kvm_test_clear_dirty() return the actual number of pages that are dirty (and rename it to kvm_test_clear_dirty_npages() to emphasize that that's what it returns). In kvmppc_hv_get_dirty_log() we then set that many bits in the dirty map. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base addressPaul Mackerras2014-05-301-2/+13
| | | | | | | | | | | | | | | | Currently, when a huge page is faulted in for a guest, we select the rmap chain to insert the HPTE into based on the guest physical address that the guest tried to access. Since there is an rmap chain for each system page, there are many rmap chains for the area covered by a huge page (e.g. 256 for 16MB pages when PAGE_SIZE = 64kB), and the huge-page HPTE could end up in any one of them. For consistency, and to make the huge-page HPTEs easier to find, we now put huge-page HPTEs in the rmap chain corresponding to the base address of the huge page. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: BOOK3S: HV: Prefer CMA region for hash page table allocationAneesh Kumar K.V2014-05-301-17/+6
| | | | | | | | | | | | | | | | Today when KVM tries to reserve memory for the hash page table it allocates from the normal page allocator first. If that fails it falls back to CMA's reserved region. One of the side effects of this is that we could end up exhausting the page allocator and get linux into OOM conditions while we still have plenty of space available in CMA. This patch addresses this issue by first trying hash page table allocation from CMA's reserved region before falling back to the normal page allocator. So if we run out of memory, we really are out of memory. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Add transactional memory supportMichael Neuling2014-03-291-1/+8
| | | | | | | | | | | | | | | This adds saving of the transactional memory (TM) checkpointed state on guest entry and exit. We only do this if we see that the guest has an active transaction. It also adds emulation of the TM state changes when delivering IRQs into the guest. According to the architecture, if we are transactional when an IRQ occurs, the TM state is changed to suspended, otherwise it's left unchanged. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
* Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queuePaolo Bonzini2014-01-291-2/+2
|\ | | | | | | | | | | Conflicts: arch/powerpc/kvm/book3s_hv_rmhandlers.S arch/powerpc/kvm/booke.c
| * KVM: PPC: Book3S HV: Basic little-endian guest supportAnton Blanchard2014-01-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We create a guest MSR from scratch when delivering exceptions in a few places. Instead of extracting LPCR[ILE] and inserting it into MSR_LE each time, we simply create a new variable intr_msr which contains the entire MSR to use. For a little-endian guest, userspace needs to set the ILE (interrupt little-endian) bit in the LPCR for each vcpu (or at least one vcpu in each virtual core). [paulus@samba.org - removed H_SET_MODE implementation from original version of the patch, and made kvmppc_set_lpcr update vcpu->arch.intr_msr.] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: PPC: Book3S: MMIO emulation support for little endian guestsCédric Le Goater2014-01-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MMIO emulation reads the last instruction executed by the guest and then emulates. If the guest is running in Little Endian order, or more generally in a different endian order of the host, the instruction needs to be byte-swapped before being emulated. This patch adds a helper routine which tests the endian order of the host and the guest in order to decide whether a byteswap is needed or not. It is then used to byteswap the last instruction of the guest in the endian order of the host before MMIO emulation is performed. Finally, kvmppc_handle_load() of kvmppc_handle_store() are modified to reverse the endianness of the MMIO if required. Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> [agraf: add booke handling] Signed-off-by: Alexander Graf <agraf@suse.de>
* | powerpc: kvm: fix rare but potential deadlock scenepingfan liu2013-11-181-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since kvmppc_hv_find_lock_hpte() is called from both virtmode and realmode, so it can trigger the deadlock. Suppose the following scene: Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of vcpus. If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched out, and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be caught in realmode for a long time. What makes things even worse if the following happens, On cpuM, bitlockX is hold, on cpuN, Y is hold. vcpu_B_2 try to lock Y on cpuM in realmode vcpu_A_2 try to lock X on cpuN in realmode Oops! deadlock happens Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Reviewed-by: Paul Mackerras <paulus@samba.org> CC: stable@vger.kernel.org Signed-off-by: Alexander Graf <agraf@suse.de>
* | KVM: PPC: Book3S HV: Fix physical address calculationsPaul Mackerras2013-11-181-3/+9
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a bug in kvmppc_do_h_enter() where the physical address for a page can be calculated incorrectly if transparent huge pages (THP) are active. Until THP came along, it was true that if we encountered a large (16M) page in kvmppc_do_h_enter(), then the associated memslot must be 16M aligned for both its guest physical address and the userspace address, and the physical address calculations in kvmppc_do_h_enter() assumed that. With THP, that is no longer true. In the case where we are using MMU notifiers and the page size that we get from the Linux page tables is larger than the page being mapped by the guest, we need to fill in some low-order bits of the physical address. Without THP, these bits would be the same in the guest physical address (gpa) and the host virtual address (hva). With THP, they can be different, and we need to use the bits from hva rather than gpa. In the case where we are not using MMU notifiers, the host physical address we get from the memslot->arch.slot_phys[] array already includes the low-order bits down to the PAGE_SIZE level, even if we are using large pages. Thus we can simplify the calculation in this case to just add in the remaining bits in the case where PAGE_SIZE is 64k and the guest is mapping a 4k page. The same bug exists in kvmppc_book3s_hv_page_fault(). The basic fix is to use psize (the page size from the HPTE) rather than pte_size (the page size from the Linux PTE) when updating the HPTE low word in r. That means that pfn needs to be computed to PAGE_SIZE granularity even if the Linux PTE is a huge page PTE. That can be arranged simply by doing the page_to_pfn() before setting page to the head of the compound page. If psize is less than PAGE_SIZE, then we need to make sure we only update the bits from PAGE_SIZE upwards, in order not to lose any sub-page offset bits in r. On the other hand, if psize is greater than PAGE_SIZE, we need to make sure we don't bring in non-zero low order bits in pfn, hence we mask (pfn << PAGE_SHIFT) with ~(psize - 1). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* kvm: powerpc: Add kvmppc_ops callbackAneesh Kumar K.V2013-10-171-10/+7
| | | | | | | | | | This patch add a new callback kvmppc_ops. This will help us in enabling both HV and PR KVM together in the same kernel. The actual change to enable them together is done in the later patch in the series. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in booke changes] Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S PR: Better handling of host-side read-only pagesPaul Mackerras2013-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we request write access to all pages that get mapped into the guest, even if the guest is only loading from the page. This reduces the effectiveness of KSM because it means that we unshare every page we access. Also, we always set the changed (C) bit in the guest HPTE if it allows writing, even for a guest load. This fixes both these problems. We pass an 'iswrite' flag to the mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether the access is a load or a store. The mmu.xlate() functions now only set C for stores. kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot() instead of gfn_to_pfn() so that it can indicate whether we need write access to the page, and get back a 'writable' flag to indicate whether the page is writable or not. If that 'writable' flag is clear, we then make the host HPTE read-only even if the guest HPTE allowed writing. This means that we can get a protection fault when the guest writes to a page that it has mapped read-write but which is read-only on the host side (perhaps due to KSM having merged the page). Thus we now call kvmppc_handle_pagefault() for protection faults as well as HPTE not found faults. In kvmppc_handle_pagefault(), if the access was allowed by the guest HPTE and we thus need to install a new host HPTE, we then need to remove the old host HPTE if there is one. This is done with a new function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to find and remove the old host HPTE. Since the memslot-related functions require the KVM SRCU read lock to be held, this adds srcu_read_lock/unlock pairs around the calls to kvmppc_handle_pagefault(). Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore guest HPTEs that don't permit access, and to return -EPERM for accesses that are not permitted by the page protections. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Store LPCR value for each virtual corePaul Mackerras2013-10-171-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the ability to have a separate LPCR (Logical Partitioning Control Register) value relating to a guest for each virtual core, rather than only having a single value for the whole VM. This corresponds to what real POWER hardware does, where there is a LPCR per CPU thread but most of the fields are required to have the same value on all active threads in a core. The per-virtual-core LPCR can be read and written using the GET/SET_ONE_REG interface. Userspace can can only modify the following fields of the LPCR value: DPFD Default prefetch depth ILE Interrupt little-endian TC Translation control (secondary HPT hash group search disable) We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which contains bits relating to memory management, i.e. the Virtualized Partition Memory (VPM) bits and the bits relating to guest real mode. When this default value is updated, the update needs to be propagated to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do that. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix whitespace] Signed-off-by: Alexander Graf <agraf@suse.de>
* Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into queueGleb Natapov2013-08-301-20/+20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'kvm-ppc-next' of git://github.com/agraf/linux-2.6: KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX KVM: PPC: Book3S: Fix compile error in XICS emulation KVM: PPC: Book3S PR: return appropriate error when allocation fails arch: powerpc: kvm: add signed type cast for comparation powerpc/kvm: Copy the pvr value after memset KVM: PPC: Book3S PR: Load up SPRG3 register with guest value on guest entry kvm/ppc/booke: Don't call kvm_guest_enter twice kvm/ppc: Call trace_hardirqs_on before entry KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers KVM: PPC: Book3S HV: Correct tlbie usage powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation. powerpc/kvm: Contiguous memory allocator based RMA allocation powerpc/kvm: Contiguous memory allocator based hash page table allocation KVM: PPC: Book3S: Ignore DABR register mm/cma: Move dma contiguous changes into a seperate config
| * Merge remote-tracking branch 'origin/next' into kvm-ppc-nextAlexander Graf2013-08-291-3/+5
| |\ | | | | | | | | | | | | | | | | | | Conflicts: mm/Kconfig CMA DMA split and ZSWAP introduction were conflicting, fix up manually.
| * | powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation.Aneesh Kumar K.V2013-07-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both RMA and hash page table request will be a multiple of 256K. We can use a chunk size of 256K to track the free/used 256K chunk in the bitmap. This should help to reduce the bitmap size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
| * | powerpc/kvm: Contiguous memory allocator based hash page table allocationAneesh Kumar K.V2013-07-081-20/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Powerpc architecture uses a hash based page table mechanism for mapping virtual addresses to physical address. The architecture require this hash page table to be physically contiguous. With KVM on Powerpc currently we use early reservation mechanism for allocating guest hash page table. This implies that we need to reserve a big memory region to ensure we can create large number of guest simultaneously with KVM on Power. Another disadvantage is that the reserved memory is not available to rest of the subsystems and and that implies we limit the total available memory in the host. This patch series switch the guest hash page table allocation to use contiguous memory allocator. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* | | ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flagYann Droneaud2013-08-261-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM uses anon_inode_get() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch set O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec(). Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | powerpc/kvm: Handle transparent hugepage in KVMAneesh Kumar K.V2013-06-211-3/+5
|/ | | | | | | | We can find pte that are splitting while walking page tables. Return None pte in that case. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-05-051-18/+102
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...
| * KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty mapPaul Mackerras2013-04-261-9/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications done by the host to the virtual processor areas (VPAs) and dispatch trace logs (DTLs) registered by the guest. This is because those modifications are done either in real mode or in the host kernel context, and in neither case does the access go through the guest's HPT, and thus no change (C) bit gets set in the guest's HPT. However, the changes done by the host do need to be tracked so that the modified pages get transferred when doing live migration. In order to track these modifications, this adds a dirty flag to the struct representing the VPA/DTL areas, and arranges to set the flag when the VPA/DTL gets modified by the host. Then, when we are collecting the dirty log, we also check the dirty flags for the VPA and DTL for each vcpu and set the relevant bit in the dirty log if necessary. Doing this also means we now need to keep track of the guest physical address of the VPA/DTL areas. So as not to lose track of modifications to a VPA/DTL area when it gets unregistered, or when a new area gets registered in its place, we need to transfer the dirty state to the rmap chain. This adds code to kvmppc_unpin_guest_page() to do that if the area was dirty. To simplify that code, we now require that all VPA, DTL and SLB shadow buffer areas fit within a single host page. Guests already comply with this requirement because pHyp requires that these areas not cross a 4k boundary. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changesPaul Mackerras2013-04-261-9/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At present, the code that determines whether a HPT entry has changed, and thus needs to be sent to userspace when it is copying the HPT, doesn't consider a hardware update to the reference and change bits (R and C) in the HPT entries to constitute a change that needs to be sent to userspace. This adds code to check for changes in R and C when we are scanning the HPT to find changed entries, and adds code to set the changed flag for the HPTE when we update the R and C bits in the guest view of the HPTE. Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification() function into kvm_book3s_64.h. Current Linux guest kernels don't use the hardware updates of R and C in the HPT, so this change won't affect them. Linux (or other) kernels might in future want to use the R and C bits and have them correctly transferred across when a guest is migrated, so it is better to correct this deficiency. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* | constify a bunch of struct file_operations instancesAl Viro2013-04-091-1/+1
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidationsPaul Mackerras2012-12-061-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we change or remove a HPT (hashed page table) entry, we can do either a global TLB invalidation (tlbie) that works across the whole machine, or a local invalidation (tlbiel) that only affects this core. Currently we do local invalidations if the VM has only one vcpu or if the guest requests it with the H_LOCAL flag, though the guest Linux kernel currently doesn't ever use H_LOCAL. Then, to cope with the possibility that vcpus moving around to different physical cores might expose stale TLB entries, there is some code in kvmppc_hv_entry to flush the whole TLB of entries for this VM if either this vcpu is now running on a different physical core from where it last ran, or if this physical core last ran a different vcpu. There are a number of problems on POWER7 with this as it stands: - The TLB invalidation is done per thread, whereas it only needs to be done per core, since the TLB is shared between the threads. - With the possibility of the host paging out guest pages, the use of H_LOCAL by an SMP guest is dangerous since the guest could possibly retain and use a stale TLB entry pointing to a page that had been removed from the guest. - The TLB invalidations that we do when a vcpu moves from one physical core to another are unnecessary in the case of an SMP guest that isn't using H_LOCAL. - The optimization of using local invalidations rather than global should apply to guests with one virtual core, not just one vcpu. (None of this applies on PPC970, since there we always have to invalidate the whole TLB when entering and leaving the guest, and we can't support paging out guest memory.) To fix these problems and simplify the code, we now maintain a simple cpumask of which cpus need to flush the TLB on entry to the guest. (This is indexed by cpu, though we only ever use the bits for thread 0 of each core.) Whenever we do a local TLB invalidation, we set the bits for every cpu except the bit for thread 0 of the core that we're currently running on. Whenever we enter a guest, we test and clear the bit for our core, and flush the TLB if it was set. On initial startup of the VM, and when resetting the HPT, we set all the bits in the need_tlb_flush cpumask, since any core could potentially have stale TLB entries from the previous VM to use the same LPID, or the previous contents of the HPT. Then, we maintain a count of the number of online virtual cores, and use that when deciding whether to use a local invalidation rather than the number of online vcpus. The code to make that decision is extracted out into a new function, global_invalidates(). For multi-core guests on POWER7 (i.e. when we are using mmu notifiers), we now never do local invalidations regardless of the H_LOCAL flag. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Report correct HPT entry index when reading HPTPaul Mackerras2012-12-061-1/+1
| | | | | | | | | | | | This fixes a bug in the code which allows userspace to read out the contents of the guest's hashed page table (HPT). On the second and subsequent passes through the HPT, when we are reporting only those entries that have changed, we were incorrectly initializing the index field of the header with the index of the first entry we skipped rather than the first changed entry. This fixes it. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Reset reverse-map chains when resetting the HPTPaul Mackerras2012-12-061-0/+24
| | | | | | | | | | | | | | | | | | | | | With HV-style KVM, we maintain reverse-mapping lists that enable us to find all the HPT (hashed page table) entries that reference each guest physical page, with the heads of the lists in the memslot->arch.rmap arrays. When we reset the HPT (i.e. when we reboot the VM), we clear out all the HPT entries but we were not clearing out the reverse mapping lists. The result is that as we create new HPT entries, the lists get corrupted, which can easily lead to loops, resulting in the host kernel hanging when it tries to traverse those lists. This fixes the problem by zeroing out all the reverse mapping lists when we zero out the HPT. This incidentally means that we are also zeroing our record of the referenced and changed bits (not the bits in the Linux PTEs, used by the Linux MM subsystem, but the bits used by the KVM_GET_DIRTY_LOG ioctl, and those used by kvm_age_hva() and kvm_test_age_hva()). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPTPaul Mackerras2012-12-061-0/+344
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on this fd return the contents of the HPT (hashed page table), writes create and/or remove entries in the HPT. There is a new capability, KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl takes an argument structure with the index of the first HPT entry to read out and a set of flags. The flags indicate whether the user is intending to read or write the HPT, and whether to return all entries or only the "bolted" entries (those with the bolted bit, 0x10, set in the first doubleword). This is intended for use in implementing qemu's savevm/loadvm and for live migration. Therefore, on reads, the first pass returns information about all HPTEs (or all bolted HPTEs). When the first pass reaches the end of the HPT, it returns from the read. Subsequent reads only return information about HPTEs that have changed since they were last read. A read that finds no changed HPTEs in the HPT following where the last read finished will return 0 bytes. The format of the data provides a simple run-length compression of the invalid entries. Each block of data starts with a header that indicates the index (position in the HPT, which is just an array), the number of valid entries starting at that index (may be zero), and the number of invalid entries following those valid entries. The valid entries, 16 bytes each, follow the header. The invalid entries are not explicitly represented. Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix documentation] Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Restructure HPT entry creation codePaul Mackerras2012-12-061-11/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This restructures the code that creates HPT (hashed page table) entries so that it can be called in situations where we don't have a struct vcpu pointer, only a struct kvm pointer. It also fixes a bug where kvmppc_map_vrma() would corrupt the guest R4 value. Most of the work of kvmppc_virtmode_h_enter is now done by a new function, kvmppc_virtmode_do_h_enter, which itself calls another new function, kvmppc_do_h_enter, which contains most of the old kvmppc_h_enter. The new kvmppc_do_h_enter takes explicit arguments for the place to return the HPTE index, the Linux page tables to use, and whether it is being called in real mode, thus removing the need for it to have the vcpu as an argument. Currently kvmppc_map_vrma creates the VRMA (virtual real mode area) HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily to handle H_ENTER hcalls from the guest that need to pin a page of memory. Since H_ENTER returns the index of the created HPTE in R4, kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4 in the case when it gets called from kvmppc_map_vrma on the first VCPU_RUN ioctl. With this, kvmppc_map_vrma instead calls kvmppc_virtmode_do_h_enter with the address of a dummy word as the place to store the HPTE index, thus avoiding corrupting the guest R4. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: Take kvm instead of vcpu to mmu_notifier_retryChristoffer Dall2012-10-231-1/+1
| | | | | | | | | | | The mmu_notifier_retry is not specific to any vcpu (and never will be) so only take struct kvm as a parameter. The motivation is the ARM mmu code that needs to call this from somewhere where we long let go of the vcpu pointer. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulationPaul Mackerras2012-10-051-5/+4
| | | | | | | | | | | | | | | | | | In the case where the host kernel is using a 64kB base page size and the guest uses a 4k HPTE (hashed page table entry) to map an emulated MMIO device, we were calculating the guest physical address wrongly. We were calculating a gfn as the guest physical address shifted right 16 bits (PAGE_SHIFT) but then only adding back in 12 bits from the effective address, since the HPTE had a 4k page size. Thus the gpa reported to userspace was missing 4 bits. Instead, we now compute the guest physical address from the HPTE without reference to the host page size, and then compute the gfn by shifting the gpa right PAGE_SHIFT bits. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctlyPaul Mackerras2012-10-051-5/+28
| | | | | | | | | | | | | | | | | | | | | | | This adds an implementation of kvm_arch_flush_shadow_memslot for Book3S HV, and arranges for kvmppc_core_commit_memory_region to flush the dirty log when modifying an existing slot. With this, we can handle deletion and modification of memory slots. kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which on Book3S HV now traverses the reverse map chains to remove any HPT (hashed page table) entries referring to pages in the memslot. This gets called by generic code whenever deleting a memslot or changing the guest physical address for a memslot. We flush the dirty log in kvmppc_core_commit_memory_region for consistency with what x86 does. We only need to flush when an existing memslot is being modified, because for a new memslot the rmap array (which stores the dirty bits) is all zero, meaning that every page is considered clean already, and when deleting a memslot we obviously don't care about the dirty bits any more. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
OpenPOWER on IntegriCloud