path: root/sys/amd64/vmm
Commit message | Author | Age | Files | Lines
* Revert "Revert "MFC r328083,328096,328116,328119,328120,328128,328135,328153,328157,"" | Luiz Souza | 2018-02-23 | 2 | -2/+5
  This reverts commit d3d59b01294138e59995b31d2bcbbbdf45e26a3c.
* Revert "Revert "MFC r322762, r322799, r322832, r322833:"" | Luiz Souza | 2018-02-23 | 1 | -1/+4
  This reverts commit 5919c0a9658dde48bd090704915aa3a85a6c0d26.
* Revert "MFC r322762, r322799, r322832, r322833:" | Luiz Souza | 2018-02-21 | 1 | -4/+1
  This reverts commit 2589da26b930eaf9441b6bf27c0f410062adf507.
* Revert "MFC r328083,328096,328116,328119,328120,328128,328135,328153,328157," | Luiz Souza | 2018-02-21 | 2 | -5/+2
  This reverts commit 430a2bea3907149b30cc75fc722b6cf1f81da82a.
* MFC r328083,328096,328116,328119,328120,328128,328135,328153,328157, | kib | 2018-02-19 | 2 | -2/+5
  328166,328177,328199,328202,328205,328468,328470,328624,328625,328627,
  328628,329214,329297,329365:
  Meltdown mitigation by PTI, PCID optimization of PTI, and kernel use of
  IBRS for some mitigations of Spectre.
  Tested by: emaste, Arshan Khanifar <arshankhanifar@gmail.com>
  Discussed with: jkim
  Sponsored by: The FreeBSD Foundation
  (cherry picked from commit 6dd025b40ee6870bea6ba670f30dcf684edc3f6c)
* MFC r322762, r322799, r322832, r322833: | kib | 2018-02-19 | 1 | -1/+4
  Make WRFSBASE and WRGSBASE instructions functional.
  (cherry picked from commit b1a7a7418e73251aad628dc4f9418e550a9fd3d7)
* MFC r315361 and r315364: Hide MONITORX/MWAITX from guests. | grehan | 2017-03-25 | 1 | -0/+3
  r315361
  Add the AMD MONITORX/MWAITX feature definition introduced in
  Bulldozer/Ryzen CPUs.
  r315364
  Hide the AMD MONITORX/MWAITX capability. Otherwise, recent Linux guests
  will use these instructions, resulting in #UD exceptions since bhyve
  doesn't implement MONITOR/MWAIT exits.
  This fixes boot-time hangs in recent Linux guests on Ryzen CPUs (and
  probably Bulldozer aka AMD FX as well).
* MFC r312531: vmm_dev: work around a bogus error with gcc 6.3.0 | avg | 2017-01-30 | 1 | -1/+1
* MFC r307903,307904,308039,308050: vmm/svm: iopm_bitmap and msr_bitmap | avg | 2016-11-08 | 1 | -4/+5
  must be contiguous in physical memory
* MFC 305502: Reset PCI pass through devices via PCI-e FLR during VM start/end. | jhb | 2016-09-30 | 1 | -0/+11
  Add routines to trigger a function level reset (FLR) of a PCI-express
  device via the PCI-express device control register. This also includes
  support routines to wait for pending transactions to complete as well as
  calculating the maximum completion timeout permitted by a device.
  Change the ppt(4) driver to reset pass through devices before attaching
  to a VM during startup and before detaching from a VM during shutdown.
  Sponsored by: Chelsio Communications
* MFC 304858,305485,305497: Fix various issues with PCI pass through and VT-d. | jhb | 2016-09-30 | 4 | -22/+59
  304858: Enable I/O MMU when PCI pass through is first used.
  Rather than enabling the I/O MMU when the vmm module is loaded, defer
  initialization until the first attempt to pass a PCI device through to a
  guest. If the I/O MMU fails to initialize or is not present, then fail
  the attempt to pass a PCI device through to a guest.
  The hw.vmm.force_iommu tunable has been removed since the I/O MMU is no
  longer enabled during boot. However, the I/O MMU support can be disabled
  by setting the hw.vmm.iommu.enable tunable to 0 to prevent use of the
  I/O MMU on any systems where it is buggy.
  305485: Leave ppt devices in the host domain when they are not attached
  to a VM.
  This allows a pass through device to be reset to a normal device driver
  on the host and reused on the host. ppt devices are now always active in
  some I/O MMU domain when the I/O MMU is active, either the host domain
  or the domain of a VM they are attached to.
  305497: Update the I/O MMU in bhyve when PCI devices are added and
  removed.
  When the I/O MMU is active in bhyve, all PCI devices need valid entries
  in the DMAR context tables. The I/O MMU code does a single enumeration
  of the available PCI devices during initialization to add all existing
  devices to a domain representing the host. The ppt(4) driver then moves
  pass through devices in and out of domains for virtual machines as
  needed. However, when new PCI devices were added at runtime either via
  SR-IOV or HotPlug, the I/O MMU tables were not updated.
  This change adds a new set of EVENTHANDLERS that are invoked when PCI
  devices are added and deleted. The I/O MMU driver in bhyve installs
  handlers for these events which it uses to add and remove devices to the
  "host" domain.
  Sponsored by: Chelsio Communications
* MFC 303713: Correct assertion on vcpuid argument to vm_gpa_hold(). | jhb | 2016-09-09 | 1 | -1/+1
  PR: 208168
* Don't repeat the the word 'the' | eadler | 2016-05-17 | 1 | -1/+1
  (one manual change to fix grammar)
  Confirmed With: db
  Approved by: secteam (not really, but this is a comment typo fix)
* vmm(4): Small spelling fixes. | pfg | 2016-05-03 | 5 | -5/+5
  Reviewed by: grehan
* Allow guest writes to AMD microcode update[0xc0010020] MSR without updating actual hardware MSR. | anish | 2016-04-11 | 1 | -0/+5
  This allows a guest microcode update to go through, which otherwise
  failed because wrmsr() was returning EINVAL.
  Submitted by: Yamagi Burmeister
  Approved by: grehan
  MFC after: 2 weeks
* Bump VM_MAX_MEMSEGS from 2 to 3 to match the number of VM segment | marcel | 2016-02-26 | 1 | -1/+1
  identifiers present in vmmapi.h. In particular, it's now possible to
  create a VM_FRAMEBUFFER segment.
* As <machine/vm.h> is included from <vm/vm.h>, there is no need to | skra | 2016-02-22 | 1 | -1/+0
  include it explicitly when <vm/vm.h> is already included.
  Reviewed by: alc, kib
  Differential Revision: https://reviews.freebsd.org/D5380
* As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no | skra | 2016-02-22 | 1 | -1/+0
  need to include it explicitly when <vm/vm_param.h> is already included.
  Suggested by: alc
  Reviewed by: alc
  Differential Revision: https://reviews.freebsd.org/D5379
* As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to | skra | 2016-02-22 | 2 | -3/+0
  include it explicitly when <vm/pmap.h> is already included.
  Reviewed by: alc, kib
  Differential Revision: https://reviews.freebsd.org/D5373
* Move the 'devmem' device nodes from /dev/vmm to /dev/vmm.io | neel | 2015-07-06 | 1 | -1/+1
  Some external tools just do a 'ls /dev/vmm' to figure out the bhyve
  virtual machines on the host. These tools break if the devmem device
  nodes also appear in /dev/vmm.
  Requested by: grehan
* verify_gla() needs to account for non-zero segment base addresses. | tychon | 2015-06-26 | 1 | -7/+44
  Reviewed by: neel
* Restore the host's GS.base before returning from 'svm_launch()'. | neel | 2015-06-23 | 4 | -33/+24
  Previously this was done by the caller of 'svm_launch()' after it
  returned. This works fine as long as no code is executed in the interim
  that depends on pcpu data.
  The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption
  because it calls 'dtrace_probe()' which in turn relies on pcpu data.
  Reported by: avg
  MFC after: 1 week
* Restructure memory allocation in bhyve to support "devmem". | neel | 2015-06-18 | 8 | -286/+649
  devmem is used to represent MMIO devices like the boot ROM or a VESA
  framebuffer where doing a trap-and-emulate for every access is
  impractical. devmem is a hybrid of system memory (sysmem) and emulated
  device models.
  devmem is mapped in the guest address space via nested page tables
  similar to sysmem. However the address range where devmem is mapped may
  be changed by the guest at runtime (e.g. by reprogramming a PCI BAR).
  Also devmem is usually mapped RO or RW as compared to RWX mappings for
  sysmem.
  Each devmem segment is named (e.g. "bootrom") and this name is used to
  create a device node for the devmem segment (e.g.
  /dev/vmm/testvm.bootrom). The device node supports mmap(2) and this
  decouples the host mapping of devmem from its mapping in the guest
  address space (which can change).
  Reviewed by: tychon
  Discussed with: grehan
  Differential Revision: https://reviews.freebsd.org/D2762
  MFC after: 4 weeks
* Support guest writes to the TSC by enabling the "use TSC offsetting" | tychon | 2015-06-09 | 3 | -4/+26
  execution control and writing the difference between the host TSC and
  the guest TSC into the TSC offset in the VMCS upon encountering a write.
  Reviewed by: neel
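The offset arithmetic described in this commit can be sketched as follows. This is a minimal illustration, not the bhyve code: the function names are hypothetical, and the sign convention (guest value minus host value) is an assumption based on the architectural rule that, with TSC offsetting enabled, a guest read returns the host TSC plus the offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the offset to store when the guest writes
 * 'guest_val' to its TSC while the host TSC reads 'host_tsc'.
 * Unsigned subtraction wraps modulo 2^64, so a guest value smaller than
 * the host value still yields a usable offset.
 */
static uint64_t
tsc_offset_for_write(uint64_t host_tsc, uint64_t guest_val)
{
	return (guest_val - host_tsc);
}

/* Hypothetical helper: the value a later guest TSC read would observe. */
static uint64_t
guest_tsc_read(uint64_t host_tsc, uint64_t offset)
{
	return (host_tsc + offset);
}
```

With this convention, a guest that writes 0 while the host TSC is 1000 observes 500 once the host TSC has advanced to 1500.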
* The 'verify_gla()' function is used to ensure that the effective address | neel | 2015-06-05 | 1 | -1/+1
  after decoding the instruction matches the one provided by hardware.
  Prior to r283293 'vie->num_valid' used to contain the actual length of
  the instruction whereas now it contains the maximum instruction length
  possible.
  This introduced a bug when calculating a RIP-relative base address. Fix
  this by using 'vie->num_processed' rather than 'vie->num_valid' as the
  length of the emulated instruction.
  Reported and tested by: tychon
  MFC after: 1 week
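To illustrate why the choice of length field matters: a RIP-relative operand is addressed relative to the end of the instruction, so the base is the instruction's RIP plus its actual length. The sketch below uses a hypothetical 'struct vie_sketch' that mirrors only the two fields named in the commit message; it is not the real 'struct vie'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the two fields discussed above. */
struct vie_sketch {
	uint8_t num_valid;	/* bytes fetched (maximum possible length) */
	uint8_t num_processed;	/* bytes actually consumed by the decoder */
};

/*
 * RIP-relative base: address of the next instruction, i.e. the current
 * RIP plus the decoded instruction length.
 */
static uint64_t
rip_relative_base(uint64_t rip, const struct vie_sketch *vie)
{
	return (rip + vie->num_processed);
}
```

If 15 bytes were fetched but the decoder consumed 7, the base for an instruction at 0x1000 is 0x1007; using 'num_valid' would give 0x100f instead.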
* Use tunable 'hw.vmm.svm.features' to disable specific SVM features even | neel | 2015-06-04 | 1 | -5/+10
  though they might be available in hardware.
  Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by
  the hypervisor.
  MFC after: 1 week
* Fix non-deterministic delays when accessing a vcpu that was in "running" or | neel | 2015-05-28 | 5 | -28/+112
  "sleeping" state. This is done by forcing the vcpu to transition to
  "idle" by returning to userspace with an exit code of
  VM_EXITCODE_REQIDLE.
  MFC after: 2 weeks
* Exceptions don't deliver an error code in real mode. | neel | 2015-05-23 | 1 | -0/+11
  MFC after: 1 week
* Remove the verification of instruction length after instruction decode. The | neel | 2015-05-22 | 1 | -16/+0
  check has been bogus since r273375.
  MFC after: 1 week
* Don't rely on the 'VM-exit instruction length' field in the VMCS to always | neel | 2015-05-22 | 2 | -13/+11
  have an accurate length on an EPT violation. This is not needed by the
  instruction decoding code because it also has to work with AMD/SVM that
  does not provide a valid instruction length on a Nested Page Fault.
  In collaboration with: Leon Dang (ldang@nahannisys.com)
  Discussed with: grehan
  MFC after: 1 week
* CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten | jkim | 2015-05-22 | 1 | -1/+1
  years for head. However, it is continuously misused as the mpsafe
  argument for callout_init(9). Deprecate the flag and clean up
  callout_init() calls to make them more consistent.
  Differential Revision: https://reviews.freebsd.org/D2613
  Reviewed by: jhb
  MFC after: 2 weeks
* Emulate the "CMP r/m, reg" instruction (opcode 39H). | neel | 2015-05-21 | 1 | -6/+22
  Reported and tested by: Leon Dang (ldang@nahannisys.com)
  MFC after: 1 week
* Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup(). | neel | 2015-05-06 | 3 | -84/+76
  Prior to this change both functions returned 0 for success, -1 for
  failure and +1 to indicate that an exception was injected into the
  guest.
  The numerical value of ERESTART also happens to be -1 so when these
  functions returned -1 it had to be translated to a positive errno value
  to prevent the VM_RUN ioctl from being inadvertently restarted. This
  made it easy to introduce bugs when writing emulation code.
  Fix this by adding an 'int *guest_fault' parameter and setting it to '1'
  if an exception was delivered to the guest. The return value is 0 or
  EFAULT so no additional translation is needed.
  Reviewed by: tychon
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D2428
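The calling convention described in this commit can be sketched as below. The function, the enum, and its values are hypothetical and exist only to show the shape of the interface; in particular, returning 0 together with '*guest_fault = 1' when an exception was injected is an assumption, since the message only states that the return value is 0 or EFAULT.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical translation outcomes, for illustration only. */
enum xlate_outcome {
	XLATE_OK,		/* translation succeeded */
	XLATE_GUEST_FAULT,	/* an exception was injected into the guest */
	XLATE_HOST_ERROR	/* the hypervisor could not complete the walk */
};

/*
 * Two-channel result: the return value carries only 0 or EFAULT, and
 * '*guest_fault' separately reports whether the guest received an
 * exception. No negative value that could be mistaken for ERESTART is
 * ever returned.
 */
static int
gla2gpa_sketch(enum xlate_outcome outcome, int *guest_fault)
{
	*guest_fault = 0;
	switch (outcome) {
	case XLATE_OK:
		return (0);
	case XLATE_GUEST_FAULT:
		*guest_fault = 1;
		return (0);
	default:
		return (EFAULT);
	}
}
```

A caller checks the return value first and then, on 0, inspects 'guest_fault' to decide whether to resume the guest so it can handle the exception.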
* Do a proper emulation of guest writes to MSR_EFER. | neel | 2015-05-06 | 3 | -14/+128
  - Must-Be-Zero bits cannot be set.
  - EFER_LME and EFER_LMA should respect the long mode consistency checks.
  - EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID
    capabilities.
  - Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't
    enforce segment limits in 64-bit mode.
  MFC after: 2 weeks
* Emulate the 'CMP r/m8, imm8' instruction encountered when booting a Windows | neel | 2015-05-04 | 1 | -2/+14
  Vista guest.
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 1 week
* Don't advertise the Intel SMX capability to the guest. | neel | 2015-05-02 | 1 | -1/+2
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 1 week
* Emulate machine check related MSRs to allow guest OSes like Windows to boot. | neel | 2015-05-02 | 3 | -7/+24
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 2 weeks
* r281630 relaxed the limits on the vectors that can be asserted in the IRRs. | neel | 2015-05-01 | 1 | -11/+9
  Do the same when transitioning a vector from the IRR to the ISR and also
  when extinguishing it from the ISR in response to an EOI.
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 2 weeks
* Emulate MSR_SYSCFG which is accessed by Linux on AMD cpus when MTRRs are | neel | 2015-05-01 | 1 | -0/+2
  enabled.
  MFC after: 2 weeks
* Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>. | neel | 2015-04-30 | 13 | -18/+0
  Only a subset of source files that include <machine/vmm.h> need to use
  the APIs that require the inclusion of <sys/cpuset.h>.
  MFC after: 1 week
* When an instruction cannot be decoded just return to userspace so bhyve(8) | neel | 2015-04-30 | 1 | -2/+6
  can dump the instruction bytes.
  Requested by: grehan
  MFC after: 1 week
* Advertise the MTRR feature via CPUID and emulate the minimal set of MTRR MSRs. | neel | 2015-04-30 | 3 | -3/+38
  This is required for booting Windows guests.
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 2 weeks
* Re-implement RTC current time calculation to eliminate the possibility of | neel | 2015-04-29 | 1 | -21/+32
  losing time.
  The problem with the earlier implementation was that the uptime value
  used by 'vrtc_curtime()' could be different than the uptime value when
  'vrtc_time_update()' actually updated 'base_uptime'.
  Fix this by calculating and updating the (rtctime, uptime) tuple
  together.
  MFC after: 2 weeks
* Emulate the 'bit test' instruction. Windows 7 uses 'bit test' to check the | neel | 2015-04-29 | 1 | -0/+52
  'Delivery Status' bit in APIC ICR register.
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 2 weeks
* Implement the century byte in the RTC. Some guests require this field to be | neel | 2015-04-28 | 1 | -22/+44
  properly set.
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 2 weeks
* STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation. | tychon | 2015-04-25 | 1 | -0/+77
  Reviewed by: neel
* Missing break in switch case. | araujo | 2015-04-23 | 1 | -0/+1
  Differential Revision: D2342
  Reviewed by: neel
* Relax the check on which vectors can be delivered through the APIC. According | neel | 2015-04-16 | 1 | -1/+5
  to the Intel SDM vectors 16 through 255 are allowed to be delivered via
  the local APIC.
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 2 weeks
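The relaxed range check can be sketched as a simple predicate. The function name is hypothetical; only the 16 through 255 range comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: true if 'vector' falls in the range the commit
 * message cites from the Intel SDM as deliverable via the local APIC.
 */
static bool
lapic_vector_deliverable(int vector)
{
	return (vector >= 16 && vector <= 255);
}
```

Vectors 0 through 15 and anything above 255 are rejected; every other vector is accepted.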
* Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly. | neel | 2015-04-16 | 1 | -1/+1
  MFC after: 1 week
* Enhance the support for Group 1 Extended opcodes: | tychon | 2015-04-06 | 1 | -38/+84
  * Implement the 0x81 and 0x83 CMP instructions.
  * Implement the 0x83 AND instruction.
  * Implement the 0x81 OR instruction.
  Reviewed by: neel