summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* KVM: PPC: E500: Split host and guest MMU partsAlexander Graf2013-01-244-624/+704
| | | | | | | | | | This patch splits the file e500_tlb.c into e500_mmu.c (guest TLB handling) and e500_mmu_host.c (host TLB handling). The main benefit of this split is readability and maintainability. It's just a lot harder to write dirty code :). Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: e500: Call kvmppc_mmu_map for initial mappingAlexander Graf2013-01-241-31/+7
| | | | | | | | | | | When emulating tlbwe, we want to automatically map the entry that just got written in our shadow TLB map, because chances are quite high that it's going to be used very soon. Today this happens explicitly, duplicating all the logic that is in kvmppc_mmu_map() already. Just call that one instead. Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: E500: Propagate errors when shadow mappingAlexander Graf2013-01-241-28/+41
| | | | | | | | | | | | | When shadow mapping a page, mapping this page can fail. In that case we don't have a shadow map. Take this case into account, otherwise we might end up writing bogus TLB entries into the host TLB. While at it, also move the write_stlbe() calls into the respective TLBn handlers. Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: E500: Explicitly mark shadow maps invalidAlexander Graf2013-01-241-3/+10
| | | | | | | | | | When we invalidate shadow TLB maps on the host, we don't mark them as not valid. But we should. Fix this by removing the E500_TLB_VALID from their flags when invalidating. Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: E500: Move write_stlbe higherAlexander Graf2013-01-241-16/+16
| | | | | | | | | Later patches want to call the function and it doesn't have dependencies on anything below write_host_tlbe. Move it higher up in the file. Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: VMX: set vmx->emulation_required only when needed.Gleb Natapov2013-01-241-7/+12
| | | | | | | | | | | | | | If emulate_invalid_guest_state=false vmx->emulation_required is never actually used, but it ends up to be always set to true since handle_invalid_guest_state(), the only place it is reset back to false, is never called. This, besides been not very clean, makes vmexit and vmentry path to check emulate_invalid_guest_state needlessly. The patch fixes that by keeping emulation_required coherent with emulate_invalid_guest_state setting. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: fix use of uninitialized memory as segment descriptor in emulator.Gleb Natapov2013-01-241-1/+3
| | | | | | | | | If VMX reports segment as unusable, zero descriptor passed by the emulator before returning. Such descriptor will be considered not present by the emulator. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: rename fix_pmode_dataseg to fix_pmode_seg.Gleb Natapov2013-01-241-7/+7
| | | | | | | The function deals with code segment too. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: don't clobber segment AR of unusable segments.Gleb Natapov2013-01-241-2/+0
| | | | | | | | | Usability is returned in unusable field, so not need to clobber entire AR. Callers have to know how to deal with unusable segments already since if emulate_invalid_guest_state=true AR is not zeroed. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: skip vmx->rmode.vm86_active check on cr0 write if unrestricted ↵Gleb Natapov2013-01-241-8/+6
| | | | | | | | | | | guest is enabled vmx->rmode.vm86_active is never true is unrestricted guest is enabled. Make it more explicit that neither enter_pmode() nor enter_rmode() is called in this case. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: remove hack that disables emulation on vcpu reset/initGleb Natapov2013-01-241-3/+0
| | | | | | | | There is no reason for it. If state is suitable for vmentry it will be detected during guest entry and no emulation will happen. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: if unrestricted guest is enabled vcpu state is always valid.Gleb Natapov2013-01-241-0/+3
| | | | | Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: reset CPL only on CS register write.Gleb Natapov2013-01-241-1/+2
| | | | | Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: remove special CPL cache access during transition to real mode.Gleb Natapov2013-01-241-8/+4
| | | | | | | | | | Since vmx_get_cpl() always returns 0 when VCPU is in real mode it is no longer needed. Also reset CPL cache to zero during transaction to protected mode since transaction may happen while CS.selectors & 3 != 0, but in reality CPL is 0. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: convert a few freestanding emulations to fastopAvi Kivity2013-01-231-3/+3
| | | | | | Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: rearrange fastop definitionsAvi Kivity2013-01-231-35/+35
| | | | | | | | Make fastop opcodes usable in other emulations. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: convert 2-operand IMUL to fastopAvi Kivity2013-01-231-8/+6
| | | | | | Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: convert BT/BTS/BTR/BTC/BSF/BSR to fastopAvi Kivity2013-01-231-50/+26
| | | | | | Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: convert INC/DEC to fastopAvi Kivity2013-01-231-17/+7
| | | | | | Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: covert SETCC to fastopAvi Kivity2013-01-231-31/+29
| | | | | | | | | | This is a bit of a special case since we don't have the usual byte/word/long/quad switch; instead we switch on the condition code embedded in the instruction. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: convert shift/rotate instructions to fastopAvi Kivity2013-01-231-41/+31
| | | | | | | | SHL, SHR, ROL, ROR, RCL, RCR, SAR, SAL Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86 emulator: Convert SHLD, SHRD to fastopAvi Kivity2013-01-231-12/+21
| | | | | | Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi.kivity@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: improve reexecute_instructionXiao Guangrong2013-01-213-11/+45
| | | | | | | | | | | | | | The current reexecute_instruction can not well detect the failed instruction emulation. It allows guest to retry all the instructions except it accesses on error pfn For example, some cases are nested-write-protect - if the page we want to write is used as PDE but it chains to itself. Under this case, we should stop the emulation and report the case to userspace Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: let reexecute_instruction work for tdpXiao Guangrong2013-01-211-18/+43
| | | | | | | | | | | | Currently, reexecute_instruction refused to retry all instructions if tdp is enabled. If nested npt is used, the emulation may be caused by shadow page, it can be fixed by dropping the shadow page. And the only condition that tdp can not retry the instruction is the access fault on error pfn Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: clean up reexecute_instructionXiao Guangrong2013-01-211-7/+6
| | | | | | | | | Little cleanup for reexecute_instruction, also use gpa_to_gfn in retry_instruction Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: set_memory_region: Remove unnecessary variable memslotTakuya Yoshikawa2013-01-171-6/+5
| | | | | | | | | | One such variable, slot, is enough for holding a pointer temporarily. We also remove another local variable named slot, which is limited in a block, since it is confusing to have the same name in this function. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: set_memory_region: Don't check for overlaps unless we create or move a slotTakuya Yoshikawa2013-01-171-8/+10
| | | | | | | | | Don't need the check for deleting an existing slot or just modifiying the flags. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: set_memory_region: Don't jump to out_free unnecessarilyTakuya Yoshikawa2013-01-171-4/+3
| | | | | | | | | This makes the separation between the sanity checks and the rest of the code a bit clearer. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: s390: kvm/sigp.c: fix memory leakageCong Ding2013-01-171-1/+3
| | | | | | | | the variable inti should be freed in the branch CPUSTAT_STOPPED. Signed-off-by: Cong Ding <dinggnu@gmail.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: MMU: Conditionally reschedule when kvm_mmu_slot_remove_write_access() ↵Takuya Yoshikawa2013-01-141-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | takes a long time If the userspace starts dirty logging for a large slot, say 64GB of memory, kvm_mmu_slot_remove_write_access() needs to hold mmu_lock for a long time such as tens of milliseconds. This patch controls the lock hold time by asking the scheduler if we need to reschedule for others. One penalty for this is that we need to flush TLBs before releasing mmu_lock. But since holding mmu_lock for a long time does affect not only the guest, vCPU threads in other words, but also the host as a whole, we should pay for that. In practice, the cost will not be so high because we can protect a fair amount of memory before being rescheduled: on my test environment, cond_resched_lock() was called only once for protecting 12GB of memory even without THP. We can also revisit Avi's "unlocked TLB flush" work later for completely suppressing extra TLB flushes if needed. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itselfTakuya Yoshikawa2013-01-142-4/+4
| | | | | | | | | Better to place mmu_lock handling and TLB flushing code together since this is a self-contained function. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itselfTakuya Yoshikawa2013-01-142-5/+8
| | | | | | | | | | | | | | No reason to make callers take mmu_lock since we do not need to protect kvm_mmu_change_mmu_pages() and kvm_mmu_slot_remove_write_access() together by mmu_lock in kvm_arch_commit_memory_region(): the former calls kvm_mmu_commit_zap_page() and flushes TLBs by itself. Note: we do not need to protect kvm->arch.n_requested_mmu_pages by mmu_lock as can be seen from the fact that it is read locklessly. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: Remove unused slot_bitmap from kvm_mmu_pageTakuya Yoshikawa2013-01-143-22/+0
| | | | | | | | Not needed any more. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: MMU: Make kvm_mmu_slot_remove_write_access() rmap basedTakuya Yoshikawa2013-01-141-13/+15
| | | | | | | | | | | | | | | | | This makes it possible to release mmu_lock and reschedule conditionally in a later patch. Although this may increase the time needed to protect the whole slot when we start dirty logging, the kernel should not allow the userspace to trigger something that will hold a spinlock for such a long time as tens of milliseconds: actually there is no limit since it is roughly proportional to the number of guest pages. Another point to note is that this patch removes the only user of slot_bitmap which will cause some problems when we increase the number of slots further. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: MMU: Remove unused parameter level from __rmap_write_protect()Takuya Yoshikawa2013-01-141-3/+3
| | | | | | | | No longer need to care about the mapping level in this function. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: Write protect the updated slot only when dirty logging is enabledTakuya Yoshikawa2013-01-142-2/+7
| | | | | | | | | | | | | | Calling kvm_mmu_slot_remove_write_access() for a deleted slot does nothing but search for non-existent mmu pages which have mappings to that deleted memory; this is safe but a waste of time. Since we want to make the function rmap based in a later patch, in a manner which makes it unsafe to be called for a deleted slot, we makes the caller see if the slot is non-zero and being dirty logged. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* Merge branch 'kvm-ppc-next' of https://github.com/agraf/linux-2.6 into queueGleb Natapov2013-01-1412-11/+153
|\
| * KVM: PPC: BookE: Add EPR ONE_REG syncAlexander Graf2013-01-103-1/+27
| | | | | | | | | | | | | | | | | | | | We need to be able to read and write the contents of the EPR register from user space. This patch implements that logic through the ONE_REG API and declares its (never implemented) SREGS counterpart as deprecated. Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: PPC: BookE: Implement EPR exitAlexander Graf2013-01-107-3/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The External Proxy Facility in FSL BookE chips allows the interrupt controller to automatically acknowledge an interrupt as soon as a core gets its pending external interrupt delivered. Today, user space implements the interrupt controller, so we need to check on it during such a cycle. This patch implements logic for user space to enable EPR exiting, disable EPR exiting and EPR exiting itself, so that user space can acknowledge an interrupt when an external interrupt has successfully been delivered into the guest vcpu. Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: PPC: BookE: Emulate mfspr on EPRAlexander Graf2013-01-101-0/+3
| | | | | | | | | | | | | | | | The EPR register is potentially valid for PR KVM as well, so we need to emulate accesses to it. It's only defined for reading, so only handle the mfspr case. Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: PPC: BookE: Allow irq deliveries to inject requestsAlexander Graf2013-01-101-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | When injecting an interrupt into guest context, we usually don't need to check for requests anymore. At least not until today. With the introduction of EPR, we will have to create a request when the guest has successfully accepted an external interrupt though. So we need to prepare the interrupt delivery to abort guest entry gracefully. Otherwise we'd delay the EPR request. Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: PPC: Fix mfspr/mtspr MMUCFG emulationMihai Caraman2013-01-102-5/+2
| | | | | | | | | | | | | | | | | | | | On mfspr/mtspr emulation path Book3E's MMUCFG SPR with value 1015 clashes with G4's MSSSR0 SPR. Move MSSSR0 emulation from generic part to Books3S. MSSSR0 also clashes with Book3S's DABRX SPR. DABRX was not explicitly handled so Book3S execution flow will behave as before. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: PPC: Book3S: PR: Enable alternative instruction for SC 1Alexander Graf2013-01-103-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When running on top of pHyp, the hypercall instruction "sc 1" goes straight into pHyp without trapping in supervisor mode. So if we want to support PAPR guest in this configuration we need to add a second way of accessing PAPR hypercalls, preferably with the exact same semantics except for the instruction. So let's overlay an officially reserved instruction and emulate PAPR hypercalls whenever we hit that one. Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: PPC: Only WARN on invalid emulationAlexander Graf2013-01-101-1/+2
| | | | | | | | | | | | | | | | | | When we hit an emulation result that we didn't expect, that is an error, but it's nothing that warrants a BUG(), because it can be guest triggered. So instead, let's only WARN() the user that this happened. Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: PPC: Fix SREGS documentation referenceMihai Caraman2013-01-101-1/+1
| | | | | | | | | | | | | | | | Reflect the uapi folder change in SREGS API documentation. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Reviewed-by: Amos Kong <kongjianjun@gmail.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* | KVM: trace: Fix exit decoding.Cornelia Huck2013-01-101-1/+1
| | | | | | | | | | | | | | | | trace_kvm_userspace_exit has been missing the KVM_EXIT_WATCHDOG exit. CC: Bharat Bhushan <r65777@freescale.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: MMU: fix infinite fault access retryXiao Guangrong2013-01-102-10/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have two issues in current code: - if target gfn is used as its page table, guest will refault then kvm will use small page size to map it. We need two #PF to fix its shadow page table - sometimes, say a exception is triggered during vm-exit caused by #PF (see handle_exception() in vmx.c), we remove all the shadow pages shadowed by the target gfn before go into page fault path, it will cause infinite loop: delete shadow pages shadowed by the gfn -> try to use large page size to map the gfn -> retry the access ->... To fix these, we can adjust page size early if the target gfn is used as page table Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: MMU: fix Dirty bit missed if CR0.WP = 0Xiao Guangrong2013-01-102-39/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the write-fault access is from supervisor and CR0.WP is not set on the vcpu, kvm will fix it by adjusting pte access - it sets the W bit on pte and clears U bit. This is the chance that kvm can change pte access from readonly to writable Unfortunately, the pte access is the access of 'direct' shadow page table, means direct sp.role.access = pte_access, then we will create a writable spte entry on the readonly shadow page table. It will cause Dirty bit is not tracked when two guest ptes point to the same large page. Note, it does not have other impact except Dirty bit since cr0.wp is encoded into sp.role It can be fixed by adjusting pte access before establishing shadow page table. Also, after that, no mmu specified code exists in the common function and drop two parameters in set_spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: s390: Gracefully handle busy conditions on ccw_device_startChristian Borntraeger2013-01-091-5/+8
| | | | | | | | | | | | | | | | | | | | In rare cases a virtio command might try to issue a ccw before a former ccw was answered with a tsch. This will cause CC=2 (busy). Lets just retry in that case. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: s390: Dynamic allocation of virtio-ccw I/O data.Cornelia Huck2013-01-091-106/+174
| | | | | | | | | | | | | | | | | | | | Dynamically allocate any data structures like ccw used when doing channel I/O. Otherwise, we'd need to add extra serialization for the different callbacks using the same data structures. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
OpenPOWER on IntegriCloud