path: root/sys/amd64/vmm/intel
Commit message | Author | Age | Files | Lines
...
| * If VMX isn't enabled, it can still be enabled as long as the lock bit in MSR IA32_FEATURE_CONTROL isn't set yet.
| | Approved by: grehan (co-mentor)
| | [tychon, 2014-05-30, 1 file, -1/+10]
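A minimal sketch of the check described in the entry above, not the code from the commit. The function and macro names are hypothetical; the bit positions (lock = bit 0, "VMX outside SMX" enable = bit 2) are taken from the Intel SDM description of IA32_FEATURE_CONTROL.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_FEATURE_CONTROL bits per the Intel SDM; macro names are illustrative. */
#define FEATURE_CONTROL_LOCK	(1ULL << 0)
#define FEATURE_CONTROL_VMX_EN	(1ULL << 2)	/* enable VMX outside SMX */

/*
 * Hypothetical helper: VMX is usable if it is already enabled, or if the
 * MSR is still unlocked so the hypervisor can enable (and lock) it itself.
 */
bool
vmx_usable(uint64_t feature_control)
{
	if ((feature_control & FEATURE_CONTROL_VMX_EN) != 0)
		return (true);
	return ((feature_control & FEATURE_CONTROL_LOCK) == 0);
}
```

Only the combination "locked with VMX disabled" (value 0x1 in the low bits) is rejected under this sketch.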
| * - Rework the XSAVE/XRSTOR emulation to only expose XCR0 features to the guest for which the rules regarding xsetbv emulation are known. In particular future extensions like AVX-512 have interdependencies among feature bits that could allow a guest to trigger a #GP in the host with the current approach of allowing anything the host supports.
| | - Add proper checking of Intel MPX and AVX-512 XSAVE features in the xsetbv emulation and allow these features to be exposed to the guest if they are enabled in the host.
| | - Expose a subset of known-safe features from leaf 0 of the structured extended features to guests if they are supported on the host including RDFSBASE/RDGSBASE, BMI1/2, AVX2, AVX-512, HLE, ERMS, and RTM. Aside from AVX-512, these features are all new instructions available for use in ring 3 with no additional hypervisor changes needed.
| | Reviewed by: neel
| | [jhb, 2014-05-27, 1 file, -2/+24]
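The feature-bit interdependencies mentioned in the entry above can be sketched as a validity check on a proposed XCR0 value. This is an illustration under stated assumptions, not the commit's xsetbv emulation: the function name, macro names, and the `host_allowed` parameter are hypothetical, and the rules encoded are the XCR0 constraints documented in the Intel SDM (x87 always set, AVX requires SSE, the two MPX bits travel together, the three AVX-512 bits travel together and require AVX).

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 feature bits per the Intel SDM; macro names are illustrative. */
#define XFEATURE_X87	(1ULL << 0)
#define XFEATURE_SSE	(1ULL << 1)
#define XFEATURE_AVX	(1ULL << 2)
#define XFEATURE_MPX	(3ULL << 3)	/* BNDREGS and BNDCSR */
#define XFEATURE_AVX512	(7ULL << 5)	/* opmask, ZMM_Hi256, Hi16_ZMM */

/*
 * Hypothetical check: returns true if 'val' is an XCR0 value the guest may
 * load, given the mask of features the host is willing to expose.
 */
bool
xcr0_valid(uint64_t val, uint64_t host_allowed)
{
	uint64_t mpx = val & XFEATURE_MPX;
	uint64_t avx512 = val & XFEATURE_AVX512;

	if ((val & ~host_allowed) != 0)		/* only host-enabled features */
		return (false);
	if ((val & XFEATURE_X87) == 0)		/* x87 state must stay enabled */
		return (false);
	if ((val & XFEATURE_AVX) != 0 && (val & XFEATURE_SSE) == 0)
		return (false);			/* AVX requires SSE */
	if (mpx != 0 && mpx != XFEATURE_MPX)
		return (false);			/* MPX bits are all-or-nothing */
	if (avx512 != 0 &&
	    (avx512 != XFEATURE_AVX512 || (val & XFEATURE_AVX) == 0))
		return (false);			/* AVX-512 needs all 3 bits + AVX */
	return (true);
}
```

In an emulation path, a value that fails such a check would be answered with a general protection fault injected into the guest rather than being loaded on the host.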
* | ins/outs support for SVM. Modelled on the Intel VT-x code.
| | Remove CR2 save/restore - the guest restore/save is done in hardware, and there is no need to save/restore the host version (same as VT-x).
| | Submitted by: neel (SVM segment descriptor 'P' bit code)
| | Reviewed by: neel
| | [grehan, 2014-06-06, 1 file, -2/+0]
|/
* Do the linear address calculation for the ins/outs emulation using a new API function 'vie_calculate_gla()'.
| While the current implementation is simplistic it forms the basis of doing segmentation checks if the guest is in 32-bit protected mode.
| [neel, 2014-05-25, 1 file, -1/+0]
* Consolidate all the information needed by the guest page table walker into 'struct vm_guest_paging'.
| Check for canonical addressing in vmm_gla2gpa() and inject a protection fault into the guest if a violation is detected.
| If the page table walk is restarted in vmm_gla2gpa() then reset 'ptpphys' to point to the root of the page tables.
| [neel, 2014-05-24, 1 file, -10/+14]
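A canonical-address check of the kind mentioned in the entry above can be sketched as follows. This is not the code in vmm_gla2gpa(); the function name is hypothetical, and the sketch assumes 48-bit linear addresses (4-level paging), where bits 63:47 must all be equal.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, assuming 48-bit linear addresses: an address is
 * canonical when bits 63:47 are all zeros or all ones.
 */
bool
gla_is_canonical48(uint64_t gla)
{
	uint64_t upper = gla >> 47;	/* the 17 bits 63:47 */

	return (upper == 0 || upper == 0x1ffff);
}
```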
* When injecting a page fault into the guest also update the guest's %cr2 to indicate the faulting linear address.
| If the guest PML4 entry has the PG_PS bit set then inject a page fault into the guest with the PGEX_RSV bit set in the error_code.
| Get rid of redundant checks for the PG_RW violations when walking the page tables.
| [neel, 2014-05-24, 1 file, -0/+2]
* Add emulation of the "outsb" instruction. NetBSD guests use this to write to the UART FIFO.
| The emulation is constrained in a number of ways: 64-bit only, doesn't check for all exception conditions, limited to i/o ports emulated in userspace.
| Some of these constraints will be relaxed in followup commits.
| Requested by: grehan
| Reviewed by: tychon (partially and a much earlier version)
| [neel, 2014-05-23, 1 file, -8/+99]
* Allow vmx_getdesc() and vmx_setdesc() to be called for a vcpu that is in the VCPU_RUNNING state. This will let the VMX exit handler inspect the vcpu's segment descriptors without having to exit the critical section.
| [neel, 2014-05-22, 3 files, -10/+24]
* Add PG_U (user/supervisor) checks when translating a guest linear address to a guest physical address.
| The PG_PS (page size) field is valid only in a PDE or a PDPTE so it is now checked only in non-terminal paging entries.
| Ignore the upper 32 bits of the CR3 for PAE paging.
| [neel, 2014-05-19, 1 file, -12/+27]
* Make the vmx asm code dtrace-fbt-friendly by
| - inserting frame enter/leave sequences
| - restructuring the vmx_enter_guest routine so that it subsumes the vm_exit_guest block, which was the #vmexit RIP and not a callable routine.
| Reviewed by: neel
| MFC after: 3 weeks
| [grehan, 2014-05-18, 2 files, -7/+19]
* Ignore writes to microcode update MSR. This MSR is accessed by RHEL7 guests.
| Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.
| [neel, 2014-04-30, 1 file, -0/+3]
* Allow a virtual machine to be forcibly reset or powered off. This is done by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF.
| The disposition of VM_SUSPEND is also made available to the exit handler via the 'u.suspended' member of 'struct vm_exit'.
| This capability is exposed via the '--force-reset' and '--force-poweroff' arguments to /usr/sbin/bhyvectl.
| Discussed with: grehan@
| [neel, 2014-04-28, 1 file, -11/+2]
* A VMCS is always inactive when it exits the vmx_run() loop.
| Remove redundant code and the misleading comment that suggests otherwise.
| Reviewed by: grehan@
| [neel, 2014-04-26, 1 file, -8/+1]
* Allow the guest to read the TSC via MSR 0x10.
| NetBSD/amd64 does this, as does Linux on AMD CPUs.
| Reviewed by: neel
| MFC after: 3 weeks
| [grehan, 2014-04-24, 1 file, -1/+7]
* There is no need to save and restore the host's return address in the 'struct vmxctx'. It is preserved on the host stack across a guest entry and exit and just restoring the host's '%rsp' is sufficient.
| Pointed out by: grehan@
| [neel, 2014-04-11, 3 files, -11/+5]
* Rework r264179.
| - remove redundant code
| - remove erroneous setting of the error return in vmmdev_ioctl()
| - use style(9) initialization
| - in vmx_inject_pir(), document the race condition that the final conditional statement was detecting.
| Tested with both gcc and clang builds.
| Reviewed by: neel
| [grehan, 2014-04-10, 1 file, -5/+19]
* Make the vmm code compile with gcc too. Not entirely sure things are correct for the pirbase test (since I'd have thought we'd need to do something even when the offset is 0 and that test looks like a misguided attempt to not use an uninitialized variable), but it is at least the same as today.
| [imp, 2014-04-05, 1 file, -1/+7]
* Re-write bhyve's I/O MMU handling in terms of PCI RID.
| Reviewed by: neel
| MFC after: 2 months
| Sponsored by: Sandvine Inc.
| [rstone, 2014-04-01, 1 file, -16/+12]
* Revert PCI RID changes.
| My PCI RID changes somehow got intermixed with my PCI ARI patch when I committed it. I may have accidentally applied a patch to a non-clean working tree. Revert everything while I figure out what went wrong.
| Pointy hat to: rstone
| [rstone, 2014-04-01, 1 file, -12/+16]
* Re-write bhyve's I/O MMU handling in terms of PCI RIDs
| Reviewed by: neel
| Sponsored by: Sandvine Inc
| [rstone, 2014-04-01, 1 file, -16/+12]
* Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called from any context, i.e., it is not required to be called from a vcpu thread. The ioctl simply sets a state variable 'vm->suspend' to '1' and returns.
| The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The suspend handler waits until all 'vm->active_cpus' have transitioned to 'vm->suspended_cpus' before returning to userspace.
| Discussed with: grehan
| [neel, 2014-03-26, 1 file, -3/+19]
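The flag-and-poll pattern described in the entry above can be sketched in a few lines. This is a simplified single-file model, not the vmm.ko code: the struct, enum, and function names are hypothetical stand-ins, guest execution is replaced by a comment, and the waiting on 'active_cpus' versus 'suspended_cpus' is omitted.

```c
#include <stdatomic.h>

/* Stand-ins for the real exit codes; names are illustrative only. */
enum exitcode_sketch { EXITCODE_NONE, EXITCODE_SUSPENDED };

struct vm_sketch {
	atomic_int suspend;	/* set to 1 by the suspend request */
};

/* May be called from any thread: just record the request and return. */
void
vm_suspend_sketch(struct vm_sketch *vm)
{
	atomic_store(&vm->suspend, 1);
}

/* Run loop: poll the flag before each (elided) guest entry. */
enum exitcode_sketch
vcpu_run_sketch(struct vm_sketch *vm, int max_iterations)
{
	for (int i = 0; i < max_iterations; i++) {
		if (atomic_load(&vm->suspend) != 0)
			return (EXITCODE_SUSPENDED);
		/* A real run loop would enter the guest here. */
	}
	return (EXITCODE_NONE);
}
```

Because the requester only writes a flag, it never has to run on a vcpu thread; each vcpu notices the request the next time it passes through its own loop.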
* Fix a race wherein the source of an interrupt vector is wrongly attributed if an ExtINT arrives during interrupt injection.
| Also, fix a spurious interrupt if the PIC tries to raise an interrupt before the outstanding one is accepted.
| Finally, improve the PIC interrupt latency when another interrupt is raised immediately after the outstanding one is accepted by creating a vmexit rather than waiting for one to occur by happenstance.
| Approved by: neel (co-mentor)
| [tychon, 2014-03-15, 1 file, -7/+29]
* Replace the userspace atpic stub with a more functional vmm.ko model.
| New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ can be used to manipulate the pic, and optionally the ioapic, pin state.
| Reviewed by: jhb, neel
| Approved by: neel (co-mentor)
| [tychon, 2014-03-11, 1 file, -0/+6]
* Correct VMware capitalization.
| Submitted by: joeld
| [jhb, 2014-02-28, 1 file, -1/+1]
* Work around an apparent bug in VMware Fusion's nested VT support where it triggers a VM exit with the exit reason of an external interrupt but without a valid interrupt set in the exit interrupt information.
| Tested by: Michael Dexter
| Reviewed by: neel
| MFC after: 1 week
| [jhb, 2014-02-28, 1 file, -0/+7]
* Queue pending exceptions in the 'struct vcpu' instead of directly updating the processor-specific VMCS or VMCB. The pending exception will be delivered right before entering the guest.
| The order of event injection into the guest is:
| - hardware exception
| - NMI
| - maskable interrupt
| In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt window-exiting and inject it as soon as possible after the hardware exception is injected. Also since interrupts are inherently asynchronous, injecting them after the hardware exception should not affect correctness from the guest perspective.
| Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict it to only deliver x86 hardware exceptions. This new ioctl is now used to inject a protection fault when the guest accesses an unimplemented MSR.
| Discussed with: grehan, jhb
| Reviewed by: jhb
| [neel, 2014-02-26, 2 files, -123/+24]
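The injection order listed in the entry above amounts to a fixed priority among pending events. A minimal sketch, with hypothetical type and function names that do not come from the commit:

```c
#include <stdbool.h>

/* Illustrative event kinds, in the priority order given in the message. */
enum event_kind {
	EVENT_NONE,
	EVENT_EXCEPTION,	/* hardware exception */
	EVENT_NMI,
	EVENT_INTERRUPT		/* maskable interrupt */
};

/* Hypothetical per-vcpu record of what is waiting to be injected. */
struct pending_events {
	bool exception;
	bool nmi;
	bool interrupt;
};

/* Pick the single event to inject on the next guest entry. */
enum event_kind
next_event(const struct pending_events *p)
{
	if (p->exception)
		return (EVENT_EXCEPTION);
	if (p->nmi)
		return (EVENT_NMI);
	if (p->interrupt)
		return (EVENT_INTERRUPT);
	return (EVENT_NONE);
}
```

Events that lose the selection stay pending; per the message, VT-x then relies on window exiting to deliver them as soon as possible afterwards.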
* Add support for x2APIC virtualization assist in Intel VT-x.
| The vlapic.ops handler 'enable_x2apic_mode' is called when the vlapic mode is switched to x2APIC. The VT-x implementation of this handler turns off the APIC-access virtualization and enables the x2APIC virtualization in the VMCS.
| The x2APIC virtualization is done by allowing guest read access to a subset of MSRs in the x2APIC range. In non-root operation the processor will satisfy an 'rdmsr' access to these MSRs by reading from the virtual APIC page instead.
| The guest is also given write access to TPR, EOI and SELF_IPI MSRs which get special treatment in non-root operation. This is documented in the Intel SDM section titled "Virtualizing MSR-Based APIC Accesses".
| Enforce that APIC-write and APIC-access VM-exits are handled only if APIC-access virtualization is enabled. The one exception to this is SELF_IPI virtualization which may result in an APIC-write VM-exit.
| [neel, 2014-02-21, 1 file, -10/+142]
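For orientation, the x2APIC MSR range mentioned in the entry above maps onto the APIC register layout by a simple formula. The sketch below is illustrative only; the function and macro names are hypothetical, and the mapping (MSR index = 0x800 + register offset / 16) is the one documented in the Intel SDM.

```c
#include <stdint.h>

#define X2APIC_MSR_BASE	0x800u	/* first MSR of the x2APIC range */

/*
 * Hypothetical helper: convert an APIC register offset (as used in the
 * virtual APIC page) to the corresponding x2APIC MSR index.
 */
uint32_t
x2apic_msr_for_offset(uint32_t reg_offset)
{
	return (X2APIC_MSR_BASE + (reg_offset >> 4));
}
```

Under this mapping, TPR (offset 0x80) is MSR 0x808, EOI (offset 0xB0) is MSR 0x80B, and SELF_IPI (offset 0x3F0, which exists only in x2APIC mode) is MSR 0x83F.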
* A first pass at adding support for injecting hardware exceptions for emulated instructions.
| - Add helper routines to inject interrupt information for a hardware exception from the VM exit callback routines.
| - Use the new routines to inject GP and UD exceptions for invalid operations when emulating the xsetbv instruction.
| - Don't directly manipulate the entry interrupt info when a user event is injected. Instead, store the event info in the vmx state and only apply it during a VM entry if a hardware exception or NMI is not already pending.
| - While here, use HANDLED/UNHANDLED instead of 1/0 in a couple of routines.
| Reviewed by: neel
| [jhb, 2014-02-18, 3 files, -35/+130]
* Add virtualized XSAVE support to bhyve which permits guests to use XSAVE and XSAVE-enabled features like AVX.
| - Store a per-cpu guest xcr0 register. When switching to the guest FPU state, switch to the guest xcr0 value. Note that the guest FPU state is saved and restored using the host's xcr0 value and xcr0 is saved/restored "inside" of saving/restoring the guest FPU state.
| - Handle VM exits for the xsetbv instruction by updating the guest xcr0.
| - Expose the XSAVE feature to the guest only if the host has enabled XSAVE, and only advertise XSAVE features enabled by the host to the guest. This ensures that the guest will only adjust FPU state that is a subset of the guest FPU state saved and restored by the host.
| Reviewed by: grehan
| [jhb, 2014-02-08, 1 file, -0/+37]
* Add a counter to differentiate between VM-exits due to nested paging faults and instruction emulation faults.
| [neel, 2014-02-08, 1 file, -1/+2]
* Fix a bug in the handling of VM-exits caused by non-maskable interrupts (NMI).
| If a VM-exit is caused by an NMI then "blocking by NMI" is in effect on the CPU when the VM-exit is completed. No more NMIs will be recognized until the execution of an "iret".
| Prior to this change the NMI handler was dispatched via a software interrupt with interrupts enabled. This meant that an interrupt could be recognized by the processor before the NMI handler completed its execution. The "iret" issued by the interrupt handler would then cause the "blocking by NMI" to be cleared prematurely.
| This is now fixed by handling the NMI with interrupts disabled in addition to "blocking by NMI" already established by the VM-exit.
| [neel, 2014-02-08, 1 file, -15/+36]
* Add support for FreeBSD/i386 guests under bhyve.
| - Similar to the hack for bootinfo32.c in userboot, define _MACHINE_ELF_WANT_32BIT in the load_elf32 file handlers in userboot. This allows userboot to load 32-bit kernels and modules.
| - Copy the SMAP generation code out of bootinfo64.c and into its own file so it can be shared with bootinfo32.c to pass an SMAP to the i386 kernel.
| - Use uint32_t instead of u_long when aligning module metadata in bootinfo32.c in userboot, as otherwise the metadata used 64-bit alignment which corrupted the layout.
| - Populate the basemem and extmem members of the bootinfo struct passed to 32-bit kernels.
| - Fix the 32-bit stack in userboot to start at the top of the stack instead of the bottom so that there is room to grow before the kernel switches to its own stack.
| - Push a fake return address onto the 32-bit stack in addition to the arguments normally passed to exec() in the loader. This return address is needed to convince recover_bootinfo() in the 32-bit locore code that it is being invoked from a "new" boot block.
| - Add a routine to libvmmapi to setup a 32-bit flat mode register state including a GDT and TSS that is able to start the i386 kernel and update bhyveload to use it when booting an i386 kernel.
| - Use the guest register state to determine the CPU's current instruction mode (32-bit vs 64-bit) and paging mode (flat, 32-bit, PAE, or long mode) in the instruction emulation code. Update the gla2gpa() routine used when fetching instructions to handle flat mode, 32-bit paging, and PAE paging in addition to long mode paging. Don't look for a REX prefix when the CPU is in 32-bit mode, and use the detected mode to enable the existing 32-bit mode code when decoding the mod r/m byte.
| Reviewed by: grehan, neel
| MFC after: 1 month
| [jhb, 2014-02-05, 1 file, -0/+28]
* Avoid doing unnecessary nested TLB invalidations.
| Prior to this change the cached value of 'pm_eptgen' was tracked per-vcpu and per-hostcpu. In the degenerate case where 'N' vcpus were sharing a single hostcpu this could result in 'N - 1' unnecessary TLB invalidations. Since an 'invept' invalidates mappings for all VPIDs the first 'invept' is sufficient.
| Fix this by moving the 'eptgen[MAXCPU]' array from 'vmxctx' to 'struct vmx'.
| If it is known that an 'invept' is going to be done before entering the guest then it is safe to skip the 'invvpid'. The stat VPU_INVVPID_SAVED counts the number of 'invvpid' invalidations that were avoided because they were subsumed by an 'invept'.
| Discussed with: grehan
| [neel, 2014-02-04, 4 files, -31/+39]
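The effect of tracking the generation per host CPU only (rather than per vcpu and per host CPU) can be sketched as below. This is a simplified model, not the vmx.c code: the struct, the array size, and the function name are hypothetical, and the actual 'invept' instruction is represented only by the boolean result.

```c
#include <stdbool.h>

#define SKETCH_MAXCPU	4	/* illustrative; stands in for MAXCPU */

/* Hypothetical per-VM state: one cached EPT generation per host CPU. */
struct vmx_sketch {
	long eptgen[SKETCH_MAXCPU];
};

/*
 * Returns true if an 'invept' is needed on 'hostcpu' before entering the
 * guest, and records that the host CPU is now at generation 'pm_eptgen'.
 */
bool
need_invept(struct vmx_sketch *vmx, int hostcpu, long pm_eptgen)
{
	if (vmx->eptgen[hostcpu] == pm_eptgen)
		return (false);
	vmx->eptgen[hostcpu] = pm_eptgen;
	return (true);
}
```

With the cache shared by all vcpus of the VM, the first vcpu to run on a host CPU after a generation bump pays for the invalidation, and any further vcpus scheduled on that same host CPU skip it.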
* Support level triggered interrupts with VT-x virtual interrupt delivery.
| The VMCS field EOI_bitmap[] is an array of 256 bits - one for each vector. If a bit is set to '1' in the EOI_bitmap[] then the processor will trigger an EOI-induced VM-exit when it is doing EOI virtualization.
| The EOI-induced VM-exit results in the EOI being forwarded to the vioapic so that level triggered interrupts can be properly handled.
| Tested by: Anish Gupta (akgupt3@gmail.com)
| [neel, 2014-01-25, 2 files, -3/+37]
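A 256-bit per-vector bitmap like the one described in the entry above is naturally held as four 64-bit words. The sketch below only illustrates the indexing; the struct and function names are hypothetical, and writing the words into the VMCS is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* 256 bits, one per interrupt vector, stored as four 64-bit words. */
struct eoi_bitmap_sketch {
	uint64_t word[4];
};

/* Mark 'vector' so that its EOI would cause an EOI-induced VM-exit. */
void
eoi_bitmap_set(struct eoi_bitmap_sketch *b, uint8_t vector)
{
	b->word[vector / 64] |= 1ULL << (vector % 64);
}

/* Clear the bit again, e.g. when the pin becomes edge triggered. */
void
eoi_bitmap_clear(struct eoi_bitmap_sketch *b, uint8_t vector)
{
	b->word[vector / 64] &= ~(1ULL << (vector % 64));
}

bool
eoi_bitmap_test(const struct eoi_bitmap_sketch *b, uint8_t vector)
{
	return ((b->word[vector / 64] & (1ULL << (vector % 64))) != 0);
}
```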
* Set "Interrupt Window Exiting" in the case where there is a vector to be injected into the vcpu but the VM-entry interruption information field already has the valid bit set.
| Pointed out by: David Reed (david.reed@tidalscale.com)
| [neel, 2014-01-23, 1 file, -9/+28]
* Handle a VM-exit due to a NMI properly by vectoring to the host's NMI handler via a software interrupt.
| This is safe to do because the logical processor is already cognizant of the NMI and further NMIs are blocked until the host's NMI handler executes "iret".
| [neel, 2014-01-22, 1 file, -0/+20]
* Some processors don't allow NMI injection if the STI_BLOCKING bit is set in the Guest Interruptibility-state field. However, there isn't any way to figure out which processors have this requirement.
| So, inject a pending NMI only if NMI_BLOCKING, MOVSS_BLOCKING and STI_BLOCKING are all clear. If any of these bits are set then enable "NMI window exiting" and inject the NMI in the VM-exit handler.
| [neel, 2014-01-18, 1 file, -69/+80]
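The injection condition stated in the entry above reduces to a mask test on the guest interruptibility-state field. A minimal sketch follows; the function and macro names are hypothetical, while the bit positions (STI = bit 0, MOV SS = bit 1, NMI = bit 3) are those given in the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Guest interruptibility-state bits per the Intel SDM; names illustrative. */
#define GI_STI_BLOCKING		0x1u
#define GI_MOVSS_BLOCKING	0x2u
#define GI_NMI_BLOCKING		0x8u

/*
 * Hypothetical helper: true if a pending NMI may be injected right away;
 * otherwise the caller would enable "NMI window exiting" and retry later.
 */
bool
can_inject_nmi(uint32_t interruptibility)
{
	const uint32_t blocking =
	    GI_NMI_BLOCKING | GI_MOVSS_BLOCKING | GI_STI_BLOCKING;

	return ((interruptibility & blocking) == 0);
}
```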
* If the guest exits due to a fault while it is executing IRET then restore the state of "Virtual NMI blocking" in the guest's interruptibility-state field before resuming the guest.
| [neel, 2014-01-18, 2 files, -4/+68]
* If a VM-exit happens during an NMI injection then clear the "NMI Blocking" bit in the Guest Interruptibility-state VMCS field.
| If we fail to do this then a subsequent VM-entry will fail because it is an error to inject an NMI into the guest while "NMI Blocking" is turned on. This is described in "Checks on Guest Non-Register State" in the Intel SDM.
| Submitted by: David Reed (david.reed@tidalscale.com)
| [neel, 2014-01-17, 2 files, -12/+26]
* Add an API to rendezvous all active vcpus in a virtual machine. The rendezvous can be initiated in the context of a vcpu thread or from the bhyve(8) control process.
| The first use of this functionality is to update the vlapic trigger-mode register when the IOAPIC pin configuration is changed.
| Prior to this change we would update the TMR in the virtual-APIC page at the time of interrupt delivery. But this doesn't work with Posted Interrupts because there is no way to program the EOI_exit_bitmap[] in the VMCS of the target at the time of interrupt delivery.
| Discussed with: grehan@
| [neel, 2014-01-14, 1 file, -5/+26]
* Enable "Posted Interrupt Processing" if supported by the CPU. This lets us inject interrupts into the guest without causing a VM-exit.
| This feature can be disabled by setting the tunable "hw.vmm.vmx.use_apic_pir" to "0".
| The following sysctls provide information about this feature:
| - hw.vmm.vmx.posted_interrupts (0 if disabled, 1 if enabled)
| - hw.vmm.vmx.posted_interrupt_vector (vector number used for vcpu notification)
| Tested on an Intel Xeon E5-2620v2 courtesy of Allan Jude at ScaleEngine.
| [neel, 2014-01-11, 3 files, -14/+72]
* Enable the "Acknowledge Interrupt on VM exit" VM-exit control.
| This control is needed to enable "Posted Interrupts" and is present in all the Intel VT-x implementations supported by bhyve so enable it as the default.
| With this VM-exit control enabled the processor will acknowledge the APIC and store the vector number in the "VM-Exit Interruption Information" field. We now call the interrupt handler "by hand" through the IDT entry associated with the vector.
| [neel, 2014-01-11, 6 files, -13/+67]
* Don't expose 'vmm_ipinum' as a global.
| [neel, 2014-01-09, 3 files, -5/+5]
* Use the 'Virtual Interrupt Delivery' feature of Intel VT-x if supported by hardware. It is possible to turn this feature off and fall back to software emulation of the APIC by setting the tunable hw.vmm.vmx.use_apic_vid to 0.
| We now start handling two new types of VM-exits:
| APIC-access: This is a fault-like VM-exit and is triggered when the APIC register access is not accelerated (e.g. apic timer CCR). In response to this we emulate the instruction that triggered the APIC-access exit.
| APIC-write: This is a trap-like VM-exit which does not require any instruction emulation but it does require the hypervisor to emulate the access to the specified register (e.g. icrlo register).
| Introduce 'vlapic_ops' which are function pointers to vector the various vlapic operations into processor-dependent code. The 'Virtual Interrupt Delivery' feature installs 'ops' for setting the IRR bits in the virtual APIC page and to return whether any interrupts are pending for this vcpu.
| Tested on an "Intel Xeon E5-2620 v2" courtesy of Allan Jude at ScaleEngine.
| [neel, 2014-01-07, 3 files, -19/+440]
* Fix a bug introduced in r260167 related to VM-exit tracing.
| Keep a copy of the 'rip' and the 'exit_reason' and use that when calling vmx_exit_trace(). This is because both the 'rip' and 'exit_reason' can be changed by 'vmx_exit_process()' and can lead to very misleading traces.
| [neel, 2014-01-07, 1 file, -10/+11]
* Allow vlapic_set_intr_ready() to return a value that indicates whether or not the vcpu should be kicked to process a pending interrupt. This will be useful in the implementation of the Posted Interrupt APICv feature.
| Change the return value of 'vlapic_pending_intr()' to indicate whether or not an interrupt is available to be delivered to the vcpu depending on the value of the PPR.
| Add KTR tracepoints to debug guest IPI delivery.
| [neel, 2014-01-07, 1 file, -2/+1]
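The PPR dependence mentioned in the entry above follows the local APIC priority rule: a pending vector is deliverable only if its priority class (bits 7:4) is higher than the class in the processor priority register. A minimal sketch with a hypothetical function name, not the vlapic code itself:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare the priority class (upper nibble) of the
 * highest pending vector against the priority class of the PPR.
 */
bool
intr_deliverable(uint8_t vector, uint8_t ppr)
{
	return ((vector & 0xf0) > (ppr & 0xf0));
}
```

For example, vector 0x31 is deliverable with a PPR of 0x20 but not with a PPR of 0x30, since both then fall in priority class 3.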
* Split the VMCS setup between 'vmcs_init()' that does initialization and 'vmx_vminit()' that does customization.
| This makes it easier to turn on optional features (e.g. APICv) without having to keep adding new parameters to 'vmcs_set_defaults()'.
| Reviewed by: grehan@
| [neel, 2014-01-06, 3 files, -64/+27]
* Use the same label name for ENTRY() and END() macros for 'vmx_enter_guest'.
| Pointed out by: rmh@
| [neel, 2014-01-03, 1 file, -1/+1]
* Restructure the VMX code to enter and exit the guest. In large part this change hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used to enter guest context and vmx_exit_guest() is used to transition back into host context.
| Fix a longstanding race where a vcpu interrupt notification might be ignored if it happens after vmx_inject_interrupts() but before host interrupts are disabled in vmx_resume/vmx_launch. We now call vmx_inject_interrupts() with host interrupts disabled to prevent this.
| Suggested by: grehan@
| [neel, 2014-01-01, 4 files, -410/+221]
* In sys/amd64/vmm/intel/vmx.c, silence an (incorrect) gcc warning about regval possibly being used uninitialized.
| Reviewed by: neel
| [dim, 2013-12-27, 1 file, -0/+1]