path: root/sys/amd64/vmm
Commit message [author, age, files changed, lines removed/added]
...
* Fix warnings/errors when building vmm.ko with gcc: [neel, 2015-03-02, 2 files, -6/+12]
| - fix warning about comparison of 'uint8_t v_tpr >= 0' always being true.
| - fix error triggered by an empty clobber list in the inline assembly for
|   "clgi" and "stgi"
| - fix error when compiling "vmload %rax", "vmrun %rax" and "vmsave %rax".
|
| The gcc assembler does not like the explicit operand "%rax" while the clang
| assembler requires specifying the operand "%rax". Fix this by encoding the
| instructions using the ".byte" directive.
|
| Reported by: julian
| MFC after: 1 week
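The ".byte" approach described in this commit can be sketched as follows. This is a minimal illustration, not the code from vmm.ko: the macro and function names are hypothetical, and the opcode bytes are the encodings given in the AMD64 Architecture Programmer's Manual, where each of these instructions takes its operand implicitly in %rax. Because the bytes are emitted directly, neither assembler has to parse an operand.

```c
#include <stdint.h>

/* Raw encodings (AMD64 manual); the %rax operand is implicit in each. */
#define	SVM_VMLOAD_SKETCH	".byte 0x0f, 0x01, 0xda"
#define	SVM_VMRUN_SKETCH	".byte 0x0f, 0x01, 0xd8"
#define	SVM_VMSAVE_SKETCH	".byte 0x0f, 0x01, 0xdb"

#if defined(__x86_64__) && defined(__GNUC__)
/*
 * Hypothetical wrapper. The instruction is privileged, so this can only
 * execute in ring 0 with SVM enabled; 'vmcb_pa' is assumed to be the
 * physical address of a VMCB and is passed in %rax via the "a" constraint.
 */
static inline void
svm_vmrun_sketch(uint64_t vmcb_pa)
{
	__asm__ __volatile__(SVM_VMRUN_SKETCH : : "a" (vmcb_pa) : "memory");
}
#endif
```

The non-empty "memory" clobber also sidesteps the empty-clobber-list error mentioned in the same commit.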
* Allow passthrough devices to be hinted. [rstone, 2015-03-01, 2 files, -34/+51]
| Allow the ppt driver to attach to devices that were hinted to be
| passthrough devices by the PCI code creating them with a driver name of
| "ppt".
|
| Add a tunable that allows the IOMMU to be forced to be used. With SR-IOV
| passthrough devices the VFs may be created after vmm.ko is loaded. The
| current code will not initialize the IOMMU in that case, meaning that the
| passthrough devices can't actually be used.
|
| Differential Revision: https://reviews.freebsd.org/D73
| Reviewed by: neel
| MFC after: 1 month
| Sponsored by: Sandvine Inc.
* Always emulate MSR_PAT on Intel processors and don't rely on PAT save/restore [neel, 2015-02-24, 4 files, -22/+56]
| capability of VT-x. This lets bhyve run nested in older VMware versions
| that don't support the PAT save/restore capability.
|
| Note that the actual value programmed by the guest in MSR_PAT is irrelevant
| because bhyve sets the 'Ignore PAT' bit in the nested PTE.
|
| Reported by: marcel
| Tested by: Leon Dang (ldang@nahannisys.com)
| Sponsored by: Nahanni Systems
| MFC after: 2 weeks
* Add x2APIC support. Enable it by default if CPU is capable. The [kib, 2015-02-09, 1 file, -5/+6]
| hw.x2apic_enable tunable allows disabling it from the loader prompt.
|
| To closely repeat effects of the uncached memory ops when accessing
| registers in the xAPIC mode, the x2APIC writes to MSRs are preceded by
| mfence, except for the EOI notifications. This is probably too strict, only
| ICR writes to send IPI require serialization to ensure that other CPUs see
| the previous actions when IPI is delivered. This may be changed later.
|
| In vmm justreturn IPI handler, call doreti_iret instead of doing iretd
| inline, to handle corner conditions.
|
| Note that the patch only switches LAPICs into x2APIC mode. It does not
| enable FreeBSD to support > 255 CPUs, which requires parsing x2APIC MADT
| entries and doing interrupts remapping, but is the required step on the way.
|
| Reviewed by: neel
| Tested by: pho (real hardware), neel (on bhyve)
| Discussed with: jhb, grehan
| Sponsored by: The FreeBSD Foundation
| MFC after: 2 months
* Add macro to identify AVIC capability (advanced virtual interrupt controller) [neel, 2015-01-24, 1 file, -0/+1]
| in AMD processors.
|
| Submitted by: Dmitry Luhtionov (dmitryluhtionov@gmail.com)
* MOVS instruction emulation. [neel, 2015-01-19, 1 file, -4/+267]
| These instructions are emitted by 'bus_space_read_region()' when accessing
| MMIO regions.
|
| Since MOVS can be used with a repeat prefix start decoding the REPZ and
| REPNZ prefixes. Also start decoding the segment override prefix since MOVS
| allows overriding the source operand segment register.
|
| Tested by: tychon
| MFC after: 1 week
* Simplify instruction restart logic in bhyve. [neel, 2015-01-18, 3 files, -16/+62]
| Keep track of the next instruction to be executed by the vcpu as 'nextrip'.
| As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should
| start execution.
|
| Also, instruction restart happens implicitly via 'vm_inject_exception()' or
| explicitly via 'vm_restart_instruction()'. The APIs behave identically in
| both kernel and userspace contexts. The main beneficiary is the instruction
| emulation code that executes in both contexts.
|
| bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length'
| as readonly:
| - Restarting an instruction is now done by calling
|   'vm_restart_instruction()' as opposed to setting 'vmexit->inst_length'
|   to 0 (e.g. emulate_inout())
| - Resuming vcpu at an arbitrary %rip is now done by setting
|   VM_REG_GUEST_RIP as opposed to changing 'vmexit->rip'
|   (e.g. vmexit_task_switch())
|
| Differential Revision: https://reviews.freebsd.org/D1526
| Reviewed by: grehan
| MFC after: 2 weeks
* Fix typo (missing comma). [neel, 2015-01-14, 1 file, -1/+1]
| MFC after: 3 days
* 'struct vm_exception' was intended to be used only as the collateral for the [neel, 2015-01-13, 4 files, -51/+54]
| VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping
| track of pending exceptions for a vcpu. This in turn causes confusion
| because some fields in 'struct vm_exception' like 'vcpuid' make sense only
| in the ioctl context. It also makes it harder to add or remove structure
| fields.
|
| Fix this by using 'struct vm_exception' only to communicate information
| from userspace to vmm.ko when injecting an exception.
|
| Also, add a field 'restart_instruction' to 'struct vm_exception'. This
| field is set to '1' for exceptions where the faulting instruction is
| restarted after the exception is handled.
|
| MFC after: 1 week
* Clear blocking due to STI or MOV SS in the hypervisor when an instruction is [neel, 2015-01-06, 5 files, -27/+55]
| emulated or when the vcpu incurs an exception. This matches the CPU
| behavior.
|
| Remove special case code in HLT processing that was clearing the interrupt
| shadow. This is now redundant because the interrupt shadow is always
| cleared when the vcpu is resumed after an instruction is emulated.
|
| Reported by: David Reed (david.reed@tidalscale.com)
| MFC after: 2 weeks
* Initialize all fields of 'struct vm_exception exception' before passing it to [neel, 2014-12-30, 1 file, -2/+5]
| vm_inject_exception(). This fixes the issue that 'exception.cpuid' is
| uninitialized when calling 'vm_inject_exception()'.
|
| However, in practice this change is a no-op because vm_inject_exception()
| does not use 'exception.cpuid' for anything.
|
| Reported by: Coverity Scan
| CID: 1261297
| MFC after: 3 days
* Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko. [neel, 2014-12-30, 6 files, -61/+1051]
| The new RTC emulation supports all interrupt modes: periodic, update ended
| and alarm. It is also capable of maintaining the date/time and NVRAM
| contents across virtual machine reset. Also, the date/time fields can now
| be modified by the guest.
|
| Since bhyve now emulates both the PIT and the RTC there is no need for
| "Legacy Replacement Routing" in the HPET so get rid of it.
|
| The RTC device state can be inspected via bhyvectl as follows:
| bhyvectl --vm=vm --get-rtc-time
| bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
| bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
| bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>
|
| Reviewed by: tychon
| Discussed with: grehan
| Differential Revision: https://reviews.freebsd.org/D1385
| MFC after: 2 weeks
* Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT' on [neel, 2014-12-30, 2 files, -0/+15]
| an AMD/SVM host.
|
| MFC after: 1 week
* Implement "special mask mode" in vatpic. [neel, 2014-12-28, 1 file, -4/+25]
| OpenBSD guests always enable "special mask mode" during boot. As a result
| of r275952 this is flagged as an error and the guest cannot boot.
|
| Reviewed by: grehan
| Differential Revision: https://reviews.freebsd.org/D1384
| MFC after: 1 week
* Allow ktr(4) tracing of all guest exceptions via the tunable [neel, 2014-12-23, 5 files, -17/+175]
| "hw.vmm.trace_guest_exceptions". To enable this feature set the tunable to
| "1" before loading vmm.ko.
|
| Tracing the guest exceptions can be useful when debugging guest triple
| faults.
|
| Note that there is a performance impact when exception tracing is enabled
| since every exception will now trigger a VM-exit.
|
| Also, handle machine check exceptions that happen during guest execution by
| vectoring to the host's machine check handler via "int $18".
|
| Discussed with: grehan
| MFC after: 2 weeks
* Emulate writes to the IA32_MISC_ENABLE MSR. [neel, 2014-12-20, 1 file, -2/+24]
| PR: 196093
| Reported by: db
| Tested by: db
| Discussed with: grehan
| MFC after: 1 week
* Various 8259 device model improvements: [neel, 2014-12-20, 1 file, -4/+37]
| - implement 8259 "polled" mode.
| - set 'atpic->sfn' if bit 4 in ICW4 is set during master initialization.
| - report error if guest tries to enable the "special mask" mode.
|
| Differential Revision: https://reviews.freebsd.org/D1328
| Reviewed by: tychon
| Reported by: grehan
| Tested by: grehan
| MFC after: 1 week
* Fix 8259 IRQ priority resolver. [neel, 2014-12-17, 1 file, -18/+28]
| Initialize the 8259 such that IRQ7 is the lowest priority.
|
| Reviewed by: tychon
| Differential Revision: https://reviews.freebsd.org/D1322
| MFC after: 1 week
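As a rough illustration of what a priority resolver does when IRQ7 is the lowest priority, the sketch below scans the pins starting from the one just above the lowest-priority pin and wrapping around. It is an assumption-laden sketch rather than the vatpic code: the function name is hypothetical, and 'pending' is assumed to already be the set of requested, unmasked interrupts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the highest-priority pending IRQ (0-7), or -1 if none is pending.
 * 'lowprio' is the pin with the lowest priority; priority descends from
 * (lowprio + 1) mod 8, the highest, around to 'lowprio' itself.
 */
static int
pic_highest_pending_sketch(uint8_t pending, int lowprio)
{
	int i, irq;

	assert(lowprio >= 0 && lowprio <= 7);
	for (i = 1; i <= 8; i++) {
		irq = (lowprio + i) & 0x7;
		if (pending & (1 << irq))
			return (irq);
	}
	return (-1);
}
```

With 'lowprio' set to 7 the scan order is IRQ0 through IRQ7, which is the fixed-priority ordering a freshly initialized 8259 uses.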
* For level triggered interrupts clear the PIC IRR bit when the interrupt pin [neel, 2014-12-16, 1 file, -0/+2]
| is deasserted. Prior to this change each assertion on a level triggered irq
| pin resulted in two interrupts being delivered to the CPU.
|
| Differential Revision: https://reviews.freebsd.org/D1310
| Reviewed by: tychon
| MFC after: 1 week
* Change the lower bound for guest vmspace allocation to 0 instead of [grehan, 2014-11-23, 1 file, -1/+1]
| using the VM_MIN_ADDRESS constant.
|
| HardenedBSD redefines VM_MIN_ADDRESS to be 64K, which results in bhyve VM
| startup failing.
|
| Guest memory is always assumed to start at 0 so use the absolute value
| instead.
|
| Reported by: Shawn Webb, lattera at gmail com
| Reviewed by: neel, grehan
| Obtained from: Oliver Pinter via HardenedBSD
| https://github.com/HardenedBSD/hardenedBSD/commit/23bd719ce1e3a8cc42fc8317b1c7c6d9e74dcba0
| MFC after: 1 week
* Reported by: Coverity [araujo, 2014-10-28, 1 file, -0/+1]
| CID: 1249760
| Reviewed by: neel
| Approved by: neel
| Sponsored by: QNAP Systems Inc.
* Remove bhyve SVM feature printf's now that they are available in the [grehan, 2014-10-27, 1 file, -21/+0]
| general CPU feature detection code.
|
| Reviewed by: neel
* Change the type of the first argument to the I/O emulation handlers to [neel, 2014-10-26, 7 files, -16/+16]
| 'struct vm *'. Previously it used to be a 'void *' but there is no reason
| to hide the actual type from the handler.
|
| Discussed with: tychon
| MFC after: 1 week
* Move the ACPI PM timer emulation into vmm.ko. [neel, 2014-10-26, 4 files, -0/+159]
| This reduces variability during timer calibration by keeping the emulation
| "close" to the guest. Additionally having all timer emulations in the
| kernel will ease the transition to a per-VM clock source (as opposed to
| using the host's uptime to keep track of time).
|
| Discussed with: grehan
* Don't pass the 'error' return from an I/O port handler directly to vm_run(). [neel, 2014-10-26, 1 file, -21/+27]
| Most I/O port handlers return -1 to signal an error. If this value is
| returned without modification to vm_run() then it leads to incorrect
| behavior because '-1' is interpreted as ERESTART at the system call level.
|
| Fix this by always returning EIO to signal an error from an I/O port
| handler.
|
| MFC after: 1 week
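The error translation described in this commit can be sketched as a small helper. This is a hedged illustration and not the vmm.ko change itself: the function name is hypothetical, and SKETCH_ERESTART stands in for the FreeBSD kernel's ERESTART value of -1, which is not visible to userland programs.

```c
#include <errno.h>

/* Stand-in for the kernel-only ERESTART (-1); defined for illustration. */
#define	SKETCH_ERESTART	(-1)

/*
 * Collapse any non-zero handler result into EIO so that a handler's -1
 * can never be mistaken for ERESTART further up the call chain.
 */
static int
ioport_error_to_errno_sketch(int handler_error)
{
	return (handler_error != 0 ? EIO : 0);
}
```

A caller would pass the handler's raw return value through this helper before returning it toward the system call layer; success (0) is left untouched.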
* IFC @r273214 [neel, 2014-10-20, 2 files, -2/+2]
|
* IFC @r273206 [neel, 2014-10-19, 2 files, -21/+78]
|\
| * Follow up to r225617. In order to maximize the re-usability of kernel code [davide, 2014-10-16, 1 file, -1/+1]
| | in userland rename in-kernel getenv()/setenv() to
| | kern_setenv()/kern_getenv(). This fixes a namespace collision with libc
| | symbols.
| |
| | Submitted by: kmacy
| | Tested by: make universe
| * Emulate "POP r/m". [neel, 2014-10-14, 1 file, -20/+77]
| | This is needed to boot OpenBSD/i386 MP kernel in bhyve.
| |
| | Reported by: grehan
| | MFC after: 1 week
* | Don't advertise the "OS visible workarounds" feature in cpuid.80000001H:ECX. [neel, 2014-10-19, 1 file, -6/+23]
| | bhyve doesn't emulate the MSRs needed to support this feature at this
| | time.
| |
| | Don't expose any model-specific RAS and performance monitoring features in
| | cpuid leaf 80000007H.
| |
| | Emulate a few more MSRs for AMD: TSEG base address, TSEG address mask and
| | BIOS signature and P-state related MSRs.
| |
| | This eliminates all the unimplemented MSRs accessed by Linux/x86_64
| | kernels 2.6.32, 3.10.0 and 3.17.0.
* | Don't advertise support for the NodeID MSR since bhyve doesn't emulate it. [neel, 2014-10-18, 1 file, -0/+3]
| |
* | Don't advertise the Instruction Based Sampling feature because it requires [neel, 2014-10-17, 1 file, -0/+5]
| | emulating a large number of MSRs.
| |
| | Ignore writes to a couple more AMD-specific MSRs and return 0 on read.
| |
| | This further reduces the unimplemented MSRs accessed by a Linux guest on
| | boot.
* | Hide extended PerfCtr MSRs on AMD processors by clearing bits 23, 24 and 28 in [neel, 2014-10-17, 1 file, -0/+8]
| | CPUID.80000001H:ECX.
| |
| | Handle accesses to PerfCtrX and PerfEvtSelX MSRs by ignoring writes and
| | returning 0 on reads.
| |
| | This further reduces the number of unimplemented MSRs hit by a Linux guest
| | during boot.
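The bit clearing described in this commit amounts to masking the ECX value returned for CPUID leaf 80000001H before it is shown to the guest. A minimal sketch follows, assuming the raw ECX value has already been read; the macro and function names are hypothetical and are not taken from vmm.ko.

```c
#include <stdint.h>

#define	SKETCH_BIT(n)	(UINT32_C(1) << (n))

/*
 * Clear bits 23, 24 and 28 of CPUID.80000001H:ECX, the bits the commit
 * message names for the extended performance counter features.
 */
static uint32_t
hide_ext_perfctr_sketch(uint32_t ecx)
{
	return (ecx & ~(SKETCH_BIT(23) | SKETCH_BIT(24) | SKETCH_BIT(28)));
}
```

All other feature bits pass through unchanged, so the guest sees the host's remaining capabilities as before.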
* | Use the correct fault type (VM_PROT_EXECUTE) for an instruction fetch. [neel, 2014-10-16, 1 file, -0/+2]
| |
* | Fix topology enumeration issues exposed by AMD Bulldozer Family 15h processor. [neel, 2014-10-16, 1 file, -2/+24]
| | Initialize CPUID.80000008H:ECX[7:0] with the number of logical processors
| | in the package. This fixes a panic during early boot in NetBSD 7.0 BETA.
| |
| | Clear the Topology Extension feature bit from CPUID.80000001H:ECX since we
| | don't emulate leaves 0x8000001D and 0x8000001E. This fixes a divide by
| | zero panic in early boot in CentOS 6.4.
| |
| | Tested on an "AMD Opteron 6320" courtesy of Ben Perrault.
| |
| | Reviewed by: grehan
* | Actually hide the SVM capability by clearing CPUID.80000001H:ECX[bit 3] [neel, 2014-10-15, 1 file, -2/+6]
| | after it has been initialized by cpuid_count().
| |
| | Submitted by: Anish Gupta (akgupt3@gmail.com)
* | Remove extraneous comments. [neel, 2014-10-11, 1 file, -22/+6]
| |
* | Get rid of unused headers. [neel, 2014-10-11, 3 files, -193/+126]
| | Restrict scope of malloc types M_SVM and M_SVM_VLAPIC by making them
| | static.
| | Replace ERR() with KASSERT().
| | style(9) cleanup.
* | Get rid of unused forward declaration of 'struct svm_softc'. [neel, 2014-10-11, 1 file, -2/+1]
| |
* | style(9) fixes. [neel, 2014-10-11, 1 file, -8/+1]
| | Get rid of unused headers.
* | Use a consistent style for messages emitted when the module is loaded. [neel, 2014-10-11, 1 file, -28/+24]
| |
* | IFC @r272887 [neel, 2014-10-10, 3 files, -0/+112]
|\ \
| |/
| * Support Intel-specific MSRs that are accessed when booting Linux in bhyve: [neel, 2014-10-09, 1 file, -0/+100]
| | - MSR_PLATFORM_INFO
| | - MSR_TURBO_RATIO_LIMITx
| | - MSR_RAPL_POWER_UNIT
| |
| | Reviewed by: grehan
| | MFC after: 1 week
| * Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'. [neel, 2014-10-06, 2 files, -0/+12]
| | The hypervisor hides the MONITOR/MWAIT capability by unconditionally
| | setting CPUID.01H:ECX[3] to 0 so the guest should not expect these
| | instructions to be present anyways.
| |
| | Discussed with: grehan
* | Fix bhyvectl so it works correctly on AMD/SVM hosts. Also, add command line [neel, 2014-10-10, 2 files, -0/+88]
| | options to display some key VMCB fields.
| |
| | The set of valid options that can be passed to bhyvectl now depends on the
| | processor type. AMD-specific options are identified by a "--vmcb" or
| | "--avic" in the option name. Intel-specific options are identified by a
| | "--vmcs" in the option name.
| |
| | Submitted by: Anish Gupta (akgupt3@gmail.com)
* | IFC @r272481 [neel, 2014-10-05, 2 files, -68/+29]
|\ \
| |/
| * Get rid of code that dealt with the hardware not being able to save/restore [neel, 2014-10-02, 1 file, -55/+17]
| | the PAT MSR on guest exit/entry. This workaround was done for a beta
| | release of VMware Fusion 5 but is no longer needed in later versions.
| |
| | All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT
| | in the VM exit and entry controls.
| |
| | Discussed with: grehan
| * Allow the PIC's IMR register to be read before ICW initialisation. [grehan, 2014-09-27, 1 file, -13/+12]
| | As of git submit e179f6914152eca9, the Linux kernel does a simple probe of
| | the PIC by writing a pattern to the IMR and then reading it back, prior to
| | the init sequence of ICW words.
| |
| | The bhyve PIC emulation wasn't allowing the IMR to be read until the ICW
| | sequence was complete. This limitation isn't required so relax the test.
| |
| | With this change, Linux kernels 3.15-rc2 and later won't hang on boot when
| | calibrating the local APIC.
| |
| | Reviewed by: tychon
| | MFC after: 3 days
* | IFC @r272185 [neel, 2014-09-27, 3 files, -8/+8]
|\ \
| |/
| * Add some more KTR events to help debugging. [neel, 2014-09-20, 2 files, -1/+8]
| |