path: root/sys/amd64/vmm
Commit message | Author | Age | Files | Lines
* Move the 'devmem' device nodes from /dev/vmm to /dev/vmm.io | neel | 2015-07-06 | 1 | -1/+1
    Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual machines on the host. These tools break if the devmem device nodes also appear in /dev/vmm.
    Requested by: grehan
* verify_gla() needs to account for non-zero segment base addresses. | tychon | 2015-06-26 | 1 | -7/+44
    Reviewed by: neel
* Restore the host's GS.base before returning from 'svm_launch()'. | neel | 2015-06-23 | 4 | -33/+24
    Previously this was done by the caller of 'svm_launch()' after it returned. This works fine as long as no code is executed in the interim that depends on pcpu data.
    The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption because it calls 'dtrace_probe()' which in turn relies on pcpu data.
    Reported by: avg
    MFC after: 1 week
* Restructure memory allocation in bhyve to support "devmem". | neel | 2015-06-18 | 8 | -286/+649
    devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer where doing a trap-and-emulate for every access is impractical. devmem is a hybrid of system memory (sysmem) and emulated device models.
    devmem is mapped in the guest address space via nested page tables similar to sysmem. However the address range where devmem is mapped may be changed by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is usually mapped RO or RW as compared to RWX mappings for sysmem.
    Each devmem segment is named (e.g. "bootrom") and this name is used to create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom). The device node supports mmap(2) and this decouples the host mapping of devmem from its mapping in the guest address space (which can change).
    Reviewed by: tychon
    Discussed with: grehan
    Differential Revision: https://reviews.freebsd.org/D2762
    MFC after: 4 weeks
* Support guest writes to the TSC by enabling the "use TSC offsetting" | tychon | 2015-06-09 | 3 | -4/+26
    execution control and writing the difference between the host TSC and the guest TSC into the TSC offset in the VMCS upon encountering a write.
    Reviewed by: neel
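The offset arithmetic described in this entry can be sketched as follows. This is a minimal illustration, not code from vmm.ko: the function names are hypothetical, and the sketch assumes the usual VT-x behaviour that a guest TSC read returns the host TSC plus the TSC offset.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from vmm.ko): compute the value to store in
 * the TSC offset field when the guest writes 'guest_tsc' at a moment
 * when the host TSC reads 'host_tsc'. Unsigned arithmetic wraps modulo
 * 2^64, so a guest value below the host value still works.
 */
static uint64_t
tsc_offset_for_write(uint64_t guest_tsc, uint64_t host_tsc)
{
	return (guest_tsc - host_tsc);
}

/*
 * Hypothetical model of what the guest observes on a later TSC read
 * once offsetting is enabled: host TSC plus the stored offset.
 */
static uint64_t
guest_tsc_read(uint64_t host_tsc, uint64_t offset)
{
	return (host_tsc + offset);
}
```

With this model, a guest that writes 100 while the host TSC is 5000 reads 350 once the host TSC has advanced to 5250.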
* The 'verify_gla()' function is used to ensure that the effective address | neel | 2015-06-05 | 1 | -1/+1
    after decoding the instruction matches the one provided by hardware.
    Prior to r283293 'vie->num_valid' used to contain the actual length of the instruction whereas now it contains the maximum instruction length possible.
    This introduced a bug when calculating a RIP-relative base address. Fix this by using 'vie->num_processed' rather than 'vie->num_valid' as the length of the emulated instruction.
    Reported and tested by: tychon
    MFC after: 1 week
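Why the choice of length matters for a RIP-relative operand can be sketched as below. The struct is a hypothetical, trimmed-down stand-in for the decoder state and only borrows the two field names mentioned in the commit message; the rest is an assumption made for illustration. RIP-relative addressing is relative to the address of the next instruction, so the base must be the current %rip plus the number of bytes the instruction actually occupies.

```c
#include <stdint.h>

/* Hypothetical stand-in for the decoder state; not the real 'struct vie'. */
struct vie_sketch {
	uint8_t	num_valid;	/* bytes available to the decoder */
	uint8_t	num_processed;	/* bytes the decoder actually consumed */
	int32_t	disp;		/* displacement from the instruction */
};

/*
 * Effective address of a RIP-relative operand: the address of the next
 * instruction (rip + decoded length) plus the sign-extended displacement.
 * Using 'num_valid' here would be wrong whenever more bytes were fetched
 * than the instruction occupies.
 */
static uint64_t
rip_relative_gla(uint64_t rip, const struct vie_sketch *vie)
{
	uint64_t base = rip + vie->num_processed;

	return (base + (uint64_t)(int64_t)vie->disp);
}
```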
* Use tunable 'hw.vmm.svm.features' to disable specific SVM features even | neel | 2015-06-04 | 1 | -5/+10
    though they might be available in hardware.
    Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by the hypervisor.
    MFC after: 1 week
* Fix non-deterministic delays when accessing a vcpu that was in "running" or | neel | 2015-05-28 | 5 | -28/+112
    "sleeping" state. This is done by forcing the vcpu to transition to "idle" by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.
    MFC after: 2 weeks
* Exceptions don't deliver an error code in real mode. | neel | 2015-05-23 | 1 | -0/+11
    MFC after: 1 week
* Remove the verification of instruction length after instruction decode. The | neel | 2015-05-22 | 1 | -16/+0
    check has been bogus since r273375.
    MFC after: 1 week
* Don't rely on the 'VM-exit instruction length' field in the VMCS to always | neel | 2015-05-22 | 2 | -13/+11
    have an accurate length on an EPT violation. This is not needed by the instruction decoding code because it also has to work with AMD/SVM that does not provide a valid instruction length on a Nested Page Fault.
    In collaboration with: Leon Dang (ldang@nahannisys.com)
    Discussed with: grehan
    MFC after: 1 week
* CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten | jkim | 2015-05-22 | 1 | -1/+1
    years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent.
    Differential Revision: https://reviews.freebsd.org/D2613
    Reviewed by: jhb
    MFC after: 2 weeks
* Emulate the "CMP r/m, reg" instruction (opcode 39H). | neel | 2015-05-21 | 1 | -6/+22
    Reported and tested by: Leon Dang (ldang@nahannisys.com)
    MFC after: 1 week
* Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup(). | neel | 2015-05-06 | 3 | -84/+76
    Prior to this change both functions returned 0 for success, -1 for failure and +1 to indicate that an exception was injected into the guest.
    The numerical value of ERESTART also happens to be -1 so when these functions returned -1 it had to be translated to a positive errno value to prevent the VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce bugs when writing emulation code.
    Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if an exception was delivered to the guest. The return value is 0 or EFAULT so no additional translation is needed.
    Reviewed by: tychon
    MFC after: 2 weeks
    Differential Revision: https://reviews.freebsd.org/D2428
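The calling convention described in this entry might look like the sketch below. The function is hypothetical and much simpler than vm_gla2gpa(): it pretends that every address below 4GB is identity-mapped and that anything above it faults in the guest. Only the shape of the interface (return 0 or EFAULT, report an injected exception through 'guest_fault') follows the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical translation routine, not the real vm_gla2gpa().
 * Returns 0 on success or EFAULT on a host-side failure. When the
 * translation fails because of the guest's own state, '*guest_fault'
 * is set to 1 (an exception would have been injected) and the return
 * value is still 0, so callers never see a negative or 3-way result.
 */
static int
gla2gpa_sketch(uint64_t gla, uint64_t *gpa, int *guest_fault)
{
	*guest_fault = 0;

	if (gpa == NULL)
		return (EFAULT);

	/* Assumption for illustration: nothing is mapped at or above 4GB. */
	if (gla >= 0x100000000ULL) {
		*guest_fault = 1;
		return (0);
	}

	*gpa = gla;	/* toy identity mapping */
	return (0);
}
```

A caller checks the return value first and 'guest_fault' second; if either is non-zero it stops emulating, and only a non-zero return value is propagated as an error.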
* Do a proper emulation of guest writes to MSR_EFER. | neel | 2015-05-06 | 3 | -14/+128
    - Must-Be-Zero bits cannot be set.
    - EFER_LME and EFER_LMA should respect the long mode consistency checks.
    - EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities.
    - Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce segment limits in 64-bit mode.
    MFC after: 2 weeks
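The four rules listed above can be sketched as a single validation function. This is not the vmm.ko implementation: the capability struct and function name are hypothetical, and the reading of the "long mode consistency checks" used here (EFER_LME may not change while paging is enabled, and EFER_LMA must equal EFER_LME && paging) is an assumption. The bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFER bit positions. */
#define	EFER_SCE	(1ULL << 0)
#define	EFER_LME	(1ULL << 8)
#define	EFER_LMA	(1ULL << 10)
#define	EFER_NXE	(1ULL << 11)
#define	EFER_SVM	(1ULL << 12)
#define	EFER_LMSLE	(1ULL << 13)
#define	EFER_FFXSR	(1ULL << 14)
#define	EFER_TCE	(1ULL << 15)
#define	EFER_KNOWN	(EFER_SCE | EFER_LME | EFER_LMA | EFER_NXE | \
			 EFER_SVM | EFER_LMSLE | EFER_FFXSR | EFER_TCE)

/* Hypothetical summary of what CPUID allows the guest to enable. */
struct efer_caps_sketch {
	bool	nx;
	bool	ffxsr;
	bool	tce;
};

/* Returns true if the write is acceptable, false if it should fault. */
static bool
efer_write_allowed(uint64_t oldval, uint64_t newval, bool paging_enabled,
    const struct efer_caps_sketch *caps)
{
	bool lma;

	if (newval & ~EFER_KNOWN)		/* Must-Be-Zero bits */
		return (false);
	if (newval & EFER_LMSLE)		/* segment limits not enforced */
		return (false);
	if ((newval & EFER_NXE) && !caps->nx)
		return (false);
	if ((newval & EFER_FFXSR) && !caps->ffxsr)
		return (false);
	if ((newval & EFER_TCE) && !caps->tce)
		return (false);

	/* Assumed consistency check: LME is frozen while paging is on. */
	if (((oldval ^ newval) & EFER_LME) && paging_enabled)
		return (false);

	/* Assumed consistency check: LMA must equal LME && paging. */
	lma = (newval & EFER_LME) != 0 && paging_enabled;
	if (lma != ((newval & EFER_LMA) != 0))
		return (false);

	return (true);
}
```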
* Emulate the 'CMP r/m8, imm8' instruction encountered when booting a Windows | neel | 2015-05-04 | 1 | -2/+14
    Vista guest.
    Reported by: Leon Dang (ldang@nahannisys.com)
    MFC after: 1 week
* Don't advertise the Intel SMX capability to the guest. | neel | 2015-05-02 | 1 | -1/+2
    Reported by: Leon Dang (ldang@nahannisys.com)
    MFC after: 1 week
* Emulate machine check related MSRs to allow guest OSes like Windows to boot. | neel | 2015-05-02 | 3 | -7/+24
    Reported by: Leon Dang (ldang@nahannisys.com)
    MFC after: 2 weeks
* r281630 relaxed the limits on the vectors that can be asserted in the IRRs. | neel | 2015-05-01 | 1 | -11/+9
    Do the same when transitioning a vector from the IRR to the ISR and also when extinguishing it from the ISR in response to an EOI.
    Reported by: Leon Dang (ldang@nahannisys.com)
    MFC after: 2 weeks
* Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are | neel | 2015-05-01 | 1 | -0/+2
    enabled.
    MFC after: 2 weeks
* Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>. | neel | 2015-04-30 | 13 | -18/+0
    Only a subset of source files that include <machine/vmm.h> need to use the APIs that require the inclusion of <sys/cpuset.h>.
    MFC after: 1 week
* When an instruction cannot be decoded just return to userspace so bhyve(8) | neel | 2015-04-30 | 1 | -2/+6
    can dump the instruction bytes.
    Requested by: grehan
    MFC after: 1 week
* Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs. | neel | 2015-04-30 | 3 | -3/+38
    This is required for booting Windows guests.
    Reported by: Leon Dang (ldang@nahannisys.com)
    MFC after: 2 weeks
* Re-implement RTC current time calculation to eliminate the possibility of | neel | 2015-04-29 | 1 | -21/+32
    losing time.
    The problem with the earlier implementation was that the uptime value used by 'vrtc_curtime()' could be different than the uptime value when 'vrtc_time_update()' actually updated 'base_uptime'.
    Fix this by calculating and updating the (rtctime, uptime) tuple together.
    MFC after: 2 weeks
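The idea of keeping the (rtctime, uptime) pair consistent can be sketched as below. The struct and function names are hypothetical and time is counted in whole seconds for simplicity; the point is only that the uptime sample used to compute the current RTC time is the same sample stored as the new base, so no interval falls between the two and gets lost.

```c
#include <stdint.h>

/* Hypothetical state: the RTC time that was current at 'base_uptime'. */
struct vrtc_sketch {
	int64_t	base_rtctime;	/* seconds */
	int64_t	base_uptime;	/* host uptime in seconds */
};

/* Current RTC time as seen at host uptime 'now'. */
static int64_t
vrtc_curtime_sketch(const struct vrtc_sketch *v, int64_t now)
{
	return (v->base_rtctime + (now - v->base_uptime));
}

/* Store a new (rtctime, uptime) pair; both halves are updated together. */
static void
vrtc_time_update_sketch(struct vrtc_sketch *v, int64_t newtime,
    int64_t newbase)
{
	v->base_rtctime = newtime;
	v->base_uptime = newbase;
}
```

A caller samples the uptime once, passes it to vrtc_curtime_sketch(), and then hands the result and the same sample to vrtc_time_update_sketch().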
* Emulate the 'bit test' instruction. Windows 7 uses 'bit test' to check the | neel | 2015-04-29 | 1 | -0/+52
    'Delivery Status' bit in APIC ICR register.
    Reported by: Leon Dang (ldang@nahannisys.com)
    MFC after: 2 weeks
* Implement the century byte in the RTC. Some guests require this field to be | neel | 2015-04-28 | 1 | -22/+44
    properly set.
    Reported by: Leon Dang (ldang@nahannisys.com)
    MFC after: 2 weeks
* STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation. | tychon | 2015-04-25 | 1 | -0/+77
    Reviewed by: neel
* Missing break in switch case. | araujo | 2015-04-23 | 1 | -0/+1
    Differential Revision: D2342
    Reviewed by: neel
* Relax the check on which vectors can be delivered through the APIC. According | neel | 2015-04-16 | 1 | -1/+5
    to the Intel SDM vectors 16 through 255 are allowed to be delivered via the local APIC.
    Reported by: Leon Dang (ldang@nahannisys.com)
    MFC after: 2 weeks
* Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly. | neel | 2015-04-16 | 1 | -1/+1
    MFC after: 1 week
* Enhance the support for Group 1 Extended opcodes: | tychon | 2015-04-06 | 1 | -38/+84
    * Implement the 0x81 and 0x83 CMP instructions.
    * Implement the 0x83 AND instruction.
    * Implement the 0x81 OR instruction.
    Reviewed by: neel
* Fix "MOVS" instruction memory to MMIO emulation. Currently updates to | tychon | 2015-04-01 | 3 | -34/+53
    %rdi, %rsi, etc are inadvertently bypassed along with the check to see if the instruction needs to be repeated per the 'rep' prefix.
    Add "MOVS" instruction support for the 'MMIO to MMIO' case.
    Reviewed by: neel
* Fix the RTC device model to operate correctly in 12-hour mode. The following | neel | 2015-03-28 | 1 | -6/+41
    table documents the values in the RTC 'hour' field in the two modes:

    Hour-of-the-day    12-hour mode      24-hour mode
    12 AM              12                0
    [1-11] AM          [1-11]            [1-11]
    12 PM              0x80 | 12         12
    [1-11] PM          0x80 | [1-11]     [13-23]

    Reported by: Julian Hsiao (madoka@nyanisore.net)
    MFC after: 1 week
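The table above can be turned into a pair of conversion helpers. This is a sketch with hypothetical function names, not code from the RTC device model, and it assumes the hour field is held in binary rather than BCD.

```c
#include <stdint.h>

#define	RTC_HOUR_PM	0x80	/* PM flag in the 12-hour encoding */

/* Encode an hour of the day (0..23) into the RTC 'hour' field. */
static uint8_t
rtc_hour_encode(int hour, int mode_24h)
{
	int h12;

	if (mode_24h)
		return ((uint8_t)hour);

	h12 = hour % 12;
	if (h12 == 0)
		h12 = 12;	/* 12 AM and 12 PM are both stored as 12 */
	return ((uint8_t)((hour >= 12 ? RTC_HOUR_PM : 0) | h12));
}

/* Decode the RTC 'hour' field back into an hour of the day (0..23). */
static int
rtc_hour_decode(uint8_t field, int mode_24h)
{
	int h;

	if (mode_24h)
		return (field);

	h = field & 0x7f;
	if (h == 12)
		h = 0;		/* 12 AM -> 0, 12 PM -> 0 + 12 */
	return ((field & RTC_HOUR_PM) ? h + 12 : h);
}
```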
* When fetching an instruction in non-64bit mode, consider the value of the | tychon | 2015-03-24 | 4 | -6/+19
    code segment base address.
    Also if an instruction doesn't support a mod R/M (modRM) byte, don't be concerned if the CPU is in real mode.
    Reviewed by: neel
* Report ARAT (APIC-Timer-always-running) feature for virtual CPU. | mav | 2015-03-16 | 1 | -0/+6
    This stops a FreeBSD guest from avoiding the LAPIC timer in favor of the HPET out of concern about deep sleep states, which do not exist for virtual CPUs.
    Benchmarks of usleep(1) on guest and host show the following extra latencies:
    - 51us for virtual HPET,
    - 22us for virtual LAPIC timer,
    - 22us for host HPET and
    - 3us for host LAPIC timer.
    MFC after: 2 weeks
* Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve when | neel | 2015-03-14 | 8 | -184/+8
    vmm.ko is loaded.
    Also relocate the 'justreturn' IPI handler to be alongside all other handlers.
    Requested by: kib
* When ICW1 is issued the edge sense circuit is reset which means that | tychon | 2015-03-06 | 1 | -0/+1
    following an initialization a low-to-high transition is necessary to generate an interrupt.
    Reviewed by: neel
* Fix warnings/errors when building vmm.ko with gcc: | neel | 2015-03-02 | 2 | -6/+12
    - fix warning about comparison of 'uint8_t v_tpr >= 0' always being true.
    - fix error triggered by an empty clobber list in the inline assembly for "clgi" and "stgi"
    - fix error when compiling "vmload %rax", "vmrun %rax" and "vmsave %rax". The gcc assembler does not like the explicit operand "%rax" while the clang assembler requires specifying the operand "%rax". Fix this by encoding the instructions using the ".byte" directive.
    Reported by: julian
    MFC after: 1 week
* Allow passthrough devices to be hinted. | rstone | 2015-03-01 | 2 | -34/+51
    Allow the ppt driver to attach to devices that were hinted to be passthrough devices by the PCI code creating them with a driver name of "ppt".
    Add a tunable that allows the IOMMU to be forced to be used. With SR-IOV passthrough devices the VFs may be created after vmm.ko is loaded. The current code will not initialize the IOMMU in that case, meaning that the passthrough devices can't actually be used.
    Differential Revision: https://reviews.freebsd.org/D73
    Reviewed by: neel
    MFC after: 1 month
    Sponsored by: Sandvine Inc.
* Always emulate MSR_PAT on Intel processors and don't rely on PAT save/restore | neel | 2015-02-24 | 4 | -22/+56
    capability of VT-x. This lets bhyve run nested in older VMware versions that don't support the PAT save/restore capability.
    Note that the actual value programmed by the guest in MSR_PAT is irrelevant because bhyve sets the 'Ignore PAT' bit in the nested PTE.
    Reported by: marcel
    Tested by: Leon Dang (ldang@nahannisys.com)
    Sponsored by: Nahanni Systems
    MFC after: 2 weeks
* Add x2APIC support. Enable it by default if CPU is capable. The | kib | 2015-02-09 | 1 | -5/+6
    hw.x2apic_enable tunable allows disabling it from the loader prompt.
    To closely repeat effects of the uncached memory ops when accessing registers in the xAPIC mode, the x2APIC writes to MSRs are preceded by mfence, except for the EOI notifications. This is probably too strict; only ICR writes to send IPI require serialization to ensure that other CPUs see the previous actions when IPI is delivered. This may be changed later.
    In vmm justreturn IPI handler, call doreti_iret instead of doing iretd inline, to handle corner conditions.
    Note that the patch only switches LAPICs into x2APIC mode. It does not enable FreeBSD to support > 255 CPUs, which requires parsing x2APIC MADT entries and doing interrupt remapping, but it is a required step on the way.
    Reviewed by: neel
    Tested by: pho (real hardware), neel (on bhyve)
    Discussed with: jhb, grehan
    Sponsored by: The FreeBSD Foundation
    MFC after: 2 months
* Add macro to identify AVIC capability (advanced virtual interrupt controller) | neel | 2015-01-24 | 1 | -0/+1
    in AMD processors.
    Submitted by: Dmitry Luhtionov (dmitryluhtionov@gmail.com)
* MOVS instruction emulation. | neel | 2015-01-19 | 1 | -4/+267
    These instructions are emitted by 'bus_space_read_region()' when accessing MMIO regions.
    Since MOVS can be used with a repeat prefix start decoding the REPZ and REPNZ prefixes. Also start decoding the segment override prefix since MOVS allows overriding the source operand segment register.
    Tested by: tychon
    MFC after: 1 week
* Simplify instruction restart logic in bhyve. | neel | 2015-01-18 | 3 | -16/+62
    Keep track of the next instruction to be executed by the vcpu as 'nextrip'. As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should start execution.
    Also, instruction restart happens implicitly via 'vm_inject_exception()' or explicitly via 'vm_restart_instruction()'. The APIs behave identically in both kernel and userspace contexts. The main beneficiary is the instruction emulation code that executes in both contexts.
    bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length' as readonly:
    - Restarting an instruction is now done by calling 'vm_restart_instruction()' as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout())
    - Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch())
    Differential Revision: https://reviews.freebsd.org/D1526
    Reviewed by: grehan
    MFC after: 2 weeks
* Fix typo (missing comma). | neel | 2015-01-14 | 1 | -1/+1
    MFC after: 3 days
* 'struct vm_exception' was intended to be used only as the collateral for the | neel | 2015-01-13 | 4 | -51/+54
    VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping track of pending exceptions for a vcpu. This in turn causes confusion because some fields in 'struct vm_exception' like 'vcpuid' make sense only in the ioctl context. It also makes it harder to add or remove structure fields.
    Fix this by using 'struct vm_exception' only to communicate information from userspace to vmm.ko when injecting an exception.
    Also, add a field 'restart_instruction' to 'struct vm_exception'. This field is set to '1' for exceptions where the faulting instruction is restarted after the exception is handled.
    MFC after: 1 week
* Clear blocking due to STI or MOV SS in the hypervisor when an instruction is | neel | 2015-01-06 | 5 | -27/+55
    emulated or when the vcpu incurs an exception. This matches the CPU behavior.
    Remove special case code in HLT processing that was clearing the interrupt shadow. This is now redundant because the interrupt shadow is always cleared when the vcpu is resumed after an instruction is emulated.
    Reported by: David Reed (david.reed@tidalscale.com)
    MFC after: 2 weeks
* Initialize all fields of 'struct vm_exception exception' before passing it to | neel | 2014-12-30 | 1 | -2/+5
    vm_inject_exception(). This fixes the issue that 'exception.cpuid' is uninitialized when calling 'vm_inject_exception()'.
    However, in practice this change is a no-op because vm_inject_exception() does not use 'exception.cpuid' for anything.
    Reported by: Coverity Scan
    CID: 1261297
    MFC after: 3 days
* Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko. | neel | 2014-12-30 | 6 | -61/+1051
    The new RTC emulation supports all interrupt modes: periodic, update ended and alarm. It is also capable of maintaining the date/time and NVRAM contents across virtual machine reset. Also, the date/time fields can now be modified by the guest.
    Since bhyve now emulates both the PIT and the RTC there is no need for "Legacy Replacement Routing" in the HPET so get rid of it.
    The RTC device state can be inspected via bhyvectl as follows:
        bhyvectl --vm=vm --get-rtc-time
        bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
        bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
        bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>
    Reviewed by: tychon
    Discussed with: grehan
    Differential Revision: https://reviews.freebsd.org/D1385
    MFC after: 2 weeks
* Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT' on | neel | 2014-12-30 | 2 | -0/+15
    an AMD/SVM host.
    MFC after: 1 week