| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
r328083,328096,328116,328119,328120,328128,328135,328153,328157,""
This reverts commit d3d59b01294138e59995b31d2bcbbbdf45e26a3c.
|
|
|
|
| |
This reverts commit 5919c0a9658dde48bd090704915aa3a85a6c0d26.
|
|
|
|
| |
This reverts commit 2589da26b930eaf9441b6bf27c0f410062adf507.
|
|
|
|
| |
This reverts commit 430a2bea3907149b30cc75fc722b6cf1f81da82a.
|
| |
328166,328177,328199,328202,328205,328468,328470,328624,328625,328627,
328628,329214,329297,329365:
Meltdown mitigation by PTI, PCID optimization of PTI, and kernel use of IBRS
for some mitigations of Spectre.
Tested by: emaste, Arshan Khanifar <arshankhanifar@gmail.com>
Discussed with: jkim
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 6dd025b40ee6870bea6ba670f30dcf684edc3f6c)
|
|
|
|
|
|
| |
Make WRFSBASE and WRGSBASE instructions functional.
(cherry picked from commit b1a7a7418e73251aad628dc4f9418e550a9fd3d7)
|
| |
r315361
Add the AMD MONITORX/MWAITX feature definition introduced in
Bulldozer/Ryzen CPUs.
r315364
Hide the AMD MONITORX/MWAITX capability.
Otherwise, recent Linux guests will use these instructions, resulting
in #UD exceptions since bhyve doesn't implement MONITOR/MWAIT exits.
This fixes boot-time hangs in recent Linux guests on Ryzen CPUs
(and probably Bulldozer aka AMD FX as well).
|
| |
|
|
|
|
| |
must be contiguous in physical memory
|
| |
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register. This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.
Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.
Sponsored by: Chelsio Communications
|
| |
304858:
Enable I/O MMU when PCI pass through is first used.
Rather than enabling the I/O MMU when the vmm module is loaded,
defer initialization until the first attempt to pass a PCI device
through to a guest. If the I/O MMU fails to initialize or is not
present, then fail the attempt to pass a PCI device through to a
guest.
The hw.vmm.force_iommu tunable has been removed since the I/O MMU is
no longer enabled during boot. However, the I/O MMU support can be
disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent
use of the I/O MMU on any systems where it is buggy.
305485:
Leave ppt devices in the host domain when they are not attached to a VM.
This allows a pass through device to be reset to a normal device driver
on the host and reused on the host. ppt devices are now always active in
some I/O MMU domain when the I/O MMU is active, either the host domain
or the domain of a VM they are attached to.
305497:
Update the I/O MMU in bhyve when PCI devices are added and removed.
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.
This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.
Sponsored by: Chelsio Communications
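Note: the deferred initialization described under 304858 can be sketched roughly as below. This is a standalone illustration, not the vmm code: `struct iommu_ctx`, `passthru_assign_sketch()` and the `hw_present` flag are hypothetical names, and returning ENXIO on failure is an assumption. Only the idea comes from the message: initialize on first pass-through use, and fail the attempt if the I/O MMU is disabled, absent, or fails to initialize.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state, standing in for whatever the driver really tracks. */
struct iommu_ctx {
	bool enable_tunable;	/* models hw.vmm.iommu.enable (nonzero = allowed) */
	bool hw_present;	/* stand-in for "hardware found and init succeeded" */
	bool tried;		/* has initialization been attempted yet? */
	bool ok;		/* result of the one-time initialization */
	int  init_calls;	/* counts attempts, for illustration only */
};

/* Called on each attempt to pass a PCI device through to a guest. */
static int
passthru_assign_sketch(struct iommu_ctx *c)
{
	if (!c->enable_tunable)
		return (ENXIO);		/* administratively disabled */
	if (!c->tried) {
		/* First use: initialize now instead of at module load. */
		c->tried = true;
		c->init_calls++;
		c->ok = c->hw_present;
	}
	return (c->ok ? 0 : ENXIO);
}
```

A second call on the same context reuses the cached result, so the initialization runs at most once.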
|
|
|
|
| |
PR: 208168
|
|
|
|
|
|
|
| |
(one manual change to fix grammar)
Confirmed With: db
Approved by: secteam (not really, but this is a comment typo fix)
|
|
|
|
| |
Reviewed by: grehan
|
|
|
|
|
|
|
|
| |
actual hardware MSR. This allows guest microcode updates to go through; they were otherwise failing because wrmsr() was returning EINVAL.
Submitted by: Yamagi Burmeister
Approved by: grehan
MFC after: 2 weeks
|
|
|
|
|
| |
identifiers present in vmmapi.h. In particular, it's now possible
to create a VM_FRAMEBUFFER segment.
|
|
|
|
|
|
|
| |
include it explicitly when <vm/vm.h> is already included.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D5380
|
|
|
|
|
|
|
|
| |
need to include it explicitly when <vm/vm_param.h> is already included.
Suggested by: alc
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D5379
|
|
|
|
|
|
|
| |
include it explicitly when <vm/pmap.h> is already included.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D5373
|
|
|
|
|
|
|
|
| |
Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual
machines on the host. These tools break if the devmem device nodes also
appear in /dev/vmm.
Requested by: grehan
|
|
|
|
| |
Reviewed by: neel
|
| |
Previously this was done by the caller of 'svm_launch()' after it returned.
This works fine as long as no code is executed in the interim that depends
on pcpu data.
The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption because
it calls 'dtrace_probe()' which in turn relies on pcpu data.
Reported by: avg
MFC after: 1 week
|
| |
devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.
devmem is mapped in the guest address space via nested page tables similar
to sysmem. However, the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also, devmem is
usually mapped RO or RW, as compared to RWX mappings for sysmem.
Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks
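Note: since the device node supports mmap(2), a host process could map a devmem segment along these lines. This is a minimal sketch under assumptions: the helper name `map_devmem_readonly()` is hypothetical, and the read-only shared mapping at offset 0 is a guess at typical usage rather than something stated in the message.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map 'len' bytes of a devmem device node (for example a path shaped
 * like /dev/vmm/testvm.bootrom) read-only into the calling process.
 * Returns NULL on any failure; the caller munmap()s the result.
 */
static void *
map_devmem_readonly(const char *path, size_t len)
{
	void *p;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (NULL);
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after the descriptor is closed */
	return (p == MAP_FAILED ? NULL : p);
}
```

The host mapping obtained this way is independent of where the guest currently has the segment mapped.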
|
|
|
|
|
|
|
|
| |
execution control and writing the difference between the host TSC and
the guest TSC into the TSC offset in the VMCS upon encountering a
write.
Reviewed by: neel
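Note: the offset arithmetic described above can be illustrated as follows. The function names are hypothetical and nothing here touches a VMCS; the sketch only assumes that the guest observes the host TSC plus the stored offset, with unsigned wraparound.

```c
#include <stdint.h>

/* Offset to store when the guest writes 'guest_val' to its TSC. */
static uint64_t
tsc_offset_for_write(uint64_t guest_val, uint64_t host_tsc)
{
	/* Modulo-2^64 subtraction, so a "negative" offset works too. */
	return (guest_val - host_tsc);
}

/* Value the guest would then read at a later host TSC value. */
static uint64_t
guest_tsc_read(uint64_t host_tsc, uint64_t offset)
{
	return (host_tsc + offset);
}
```

For example, if the guest writes 100 while the host TSC is 5000, a guest read taken 250 host ticks later yields 350.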
|
| |
after decoding the instruction matches the one provided by hardware.
Prior to r283293, 'vie->num_valid' contained the actual length of
the instruction, whereas now it contains the maximum instruction length
possible. This introduced a bug when calculating a RIP-relative base address.
Fix this by using 'vie->num_processed' rather than 'vie->num_valid' as the
length of the emulated instruction.
Reported and tested by: tychon
MFC after: 1 week
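Note: a RIP-relative operand is addressed relative to the end of the instruction, which is why the decoded length matters. The sketch below uses a hypothetical `struct vie_sketch` carrying only the fields needed here; it is not the real 'struct vie' layout.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the decoder state. */
struct vie_sketch {
	uint8_t num_valid;	/* bytes fetched (may exceed the real length) */
	uint8_t num_processed;	/* bytes actually consumed by the decoder */
	int32_t displacement;	/* signed disp32 from the instruction */
};

/* Effective address of a RIP-relative operand. */
static uint64_t
rip_relative_addr(uint64_t rip, const struct vie_sketch *vie)
{
	/* Base is the address of the next instruction: rip + length. */
	return (rip + vie->num_processed + (int64_t)vie->displacement);
}
```

Using 'num_valid' in place of 'num_processed' here would shift the result by the number of fetched but unused bytes.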
|
|
|
|
|
|
|
|
|
| |
though they might be available in hardware.
Use tunable 'hw.vmm.svm.num_asids' to limit the number of ASIDs used by
the hypervisor.
MFC after: 1 week
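Note: one plausible way such a limit could be applied is sketched below. The clamping policy (0 or an oversized value falls back to the hardware count) is an assumption, and `effective_asids()` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Pick the number of ASIDs to use, given the count reported by the
 * hardware and the value of the tunable (0 meaning "not set").
 */
static uint32_t
effective_asids(uint32_t hw_asids, uint32_t tunable)
{
	if (tunable == 0 || tunable > hw_asids)
		return (hw_asids);
	return (tunable);
}
```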
|
|
|
|
|
|
|
| |
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.
MFC after: 2 weeks
|
|
|
|
| |
MFC after: 1 week
|
|
|
|
|
|
| |
check has been bogus since r273375.
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
| |
have an accurate length on an EPT violation. This is not needed by the
instruction decoding code because it also has to work with AMD/SVM that
does not provide a valid instruction length on a Nested Page Fault.
In collaboration with: Leon Dang (ldang@nahannisys.com)
Discussed with: grehan
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
| |
years for head. However, it is continuously misused as the mpsafe argument
for callout_init(9). Deprecate the flag and clean up callout_init() calls
to make them more consistent.
Differential Revision: https://reviews.freebsd.org/D2613
Reviewed by: jhb
MFC after: 2 weeks
|
|
|
|
|
| |
Reported and tested by: Leon Dang (ldang@nahannisys.com)
MFC after: 1 week
|
| |
Prior to this change both functions returned 0 for success, -1 for failure
and +1 to indicate that an exception was injected into the guest.
The numerical value of ERESTART also happens to be -1 so when these functions
returned -1 it had to be translated to a positive errno value to prevent the
VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce
bugs when writing emulation code.
Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if
an exception was delivered to the guest. The return value is 0 or EFAULT so
no additional translation is needed.
Reviewed by: tychon
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D2428
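Note: the calling convention described above can be sketched as follows. The function names and the address check are hypothetical; only the convention (return 0 or EFAULT, report an injected exception through 'guest_fault') comes from the message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical translation helper. Returns 0 or EFAULT; '*guest_fault'
 * is set to 1 when an exception was (notionally) delivered to the guest.
 */
static int
translate_sketch(uint64_t gla, uint64_t limit, uint64_t *gpa, int *guest_fault)
{
	*guest_fault = 0;
	if (gpa == NULL)
		return (EFAULT);	/* host-side failure, plain errno */
	if (gla >= limit) {
		*guest_fault = 1;	/* stand-in for injecting a fault */
		return (0);
	}
	*gpa = gla;			/* identity mapping, for illustration */
	return (0);
}

/* Caller pattern: no translation of -1 or +1 return values is needed. */
static int
emulate_access_sketch(uint64_t gla, uint64_t limit, uint64_t *gpa)
{
	int error, fault;

	error = translate_sketch(gla, limit, gpa, &fault);
	if (error != 0 || fault != 0)
		return (error);	/* errno, or 0 so the guest handles its fault */
	/* ... continue the emulation using *gpa ... */
	return (0);
}
```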
|
|
|
|
|
|
|
|
|
|
| |
- Must-Be-Zero bits cannot be set.
- EFER_LME and EFER_LMA should respect the long mode consistency checks.
- EFER_NXE, EFER_FFXSR, EFER_TCE can be set if allowed by CPUID capabilities.
- Flag an error if guest tries to set EFER_LMSLE since bhyve doesn't enforce
segment limits in 64-bit mode.
MFC after: 2 weeks
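Note: the checks listed above might look roughly like this. The bit definitions follow the architectural EFER layout but are defined locally; `struct guest_caps`, `efer_write_ok()` and the exact long mode rules (LME may not change while paging is on, LMA must equal LME && paging) are assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_SCE	(1ULL << 0)
#define EFER_LME	(1ULL << 8)
#define EFER_LMA	(1ULL << 10)
#define EFER_NXE	(1ULL << 11)
#define EFER_SVME	(1ULL << 12)
#define EFER_LMSLE	(1ULL << 13)
#define EFER_FFXSR	(1ULL << 14)
#define EFER_TCE	(1ULL << 15)
#define EFER_KNOWN	(EFER_SCE | EFER_LME | EFER_LMA | EFER_NXE | \
			 EFER_SVME | EFER_LMSLE | EFER_FFXSR | EFER_TCE)
#define EFER_MBZ_BITS	(~EFER_KNOWN)

/* Hypothetical summary of the CPUID capabilities exposed to the guest. */
struct guest_caps {
	bool nx, ffxsr, tce;
};

/* Returns true if the guest's EFER write should be accepted. */
static bool
efer_write_ok(uint64_t oldval, uint64_t newval, bool paging,
    const struct guest_caps *caps)
{
	bool lma_expected;

	if (newval & EFER_MBZ_BITS)
		return (false);		/* Must-Be-Zero bits set */
	if (((oldval ^ newval) & EFER_LME) != 0 && paging)
		return (false);		/* LME changed with paging enabled */
	lma_expected = (newval & EFER_LME) != 0 && paging;
	if (((newval & EFER_LMA) != 0) != lma_expected)
		return (false);		/* LMA inconsistent with LME/paging */
	if ((newval & EFER_NXE) != 0 && !caps->nx)
		return (false);
	if ((newval & EFER_FFXSR) != 0 && !caps->ffxsr)
		return (false);
	if ((newval & EFER_TCE) != 0 && !caps->tce)
		return (false);
	if (newval & EFER_LMSLE)
		return (false);		/* segment limits are not enforced */
	return (true);
}
```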
|
|
|
|
|
|
|
| |
Vista guest.
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 1 week
|
|
|
|
|
| |
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 1 week
|
|
|
|
|
| |
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
|
|
|
|
|
|
|
|
| |
Do the same when transitioning a vector from the IRR to the ISR and also
when extinguishing it from the ISR in response to an EOI.
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
|
|
|
|
|
|
| |
enabled.
MFC after: 2 weeks
|
|
|
|
|
|
|
| |
Only a subset of source files that include <machine/vmm.h> need to use the
APIs that require the inclusion of <sys/cpuset.h>.
MFC after: 1 week
|
|
|
|
|
|
|
| |
can dump the instruction bytes.
Requested by: grehan
MFC after: 1 week
|
|
|
|
|
|
|
| |
This is required for booting Windows guests.
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
|
| |
losing time.
The problem with the earlier implementation was that the uptime value
used by 'vrtc_curtime()' could be different than the uptime value when
'vrtc_time_update()' actually updated 'base_uptime'.
Fix this by calculating and updating the (rtctime, uptime) tuple together.
MFC after: 2 weeks
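Note: the fix described above amounts to sampling the uptime once and deriving both halves of the tuple from that single sample. A reduced sketch, with hypothetical names and plain integer seconds instead of the real time types:

```c
#include <stdint.h>

/* Hypothetical, reduced RTC state: the (rtctime, uptime) tuple. */
struct rtc_sketch {
	int64_t base_rtctime;	/* RTC time at the moment of base_uptime */
	int64_t base_uptime;	/* host uptime when base_rtctime was set */
};

/* Current RTC time as seen at host uptime 'now'. */
static int64_t
rtc_curtime_at(const struct rtc_sketch *rtc, int64_t now)
{
	return (rtc->base_rtctime + (now - rtc->base_uptime));
}

/* Rebase the tuple using one uptime sample for both fields. */
static void
rtc_rebase(struct rtc_sketch *rtc, int64_t now)
{
	int64_t curtime = rtc_curtime_at(rtc, now);

	rtc->base_rtctime = curtime;
	rtc->base_uptime = now;	/* same 'now' that produced curtime */
}
```

If the two fields were computed from different uptime samples, the interval between the samples would be dropped on every update, which is how time could be lost.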
|
|
|
|
|
|
|
| |
'Delivery Status' bit in APIC ICR register.
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
|
|
|
|
|
|
|
| |
properly set.
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
|
|
|
|
| |
Reviewed by: neel
|
|
|
|
|
| |
Differential Revision: D2342
Reviewed by: neel
|
|
|
|
|
|
|
|
| |
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.
Reported by: Leon Dang (ldang@nahannisys.com)
MFC after: 2 weeks
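Note: the vector range stated above translates into a simple bounds check. The function name is hypothetical.

```c
#include <stdbool.h>

/* True if 'vector' may be delivered via the local APIC (16..255). */
static bool
lapic_vector_deliverable(int vector)
{
	return (vector >= 16 && vector <= 255);
}
```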
|
|
|
|
| |
MFC after: 1 week
|
|
|
|
|
|
|
|
| |
* Implement the 0x81 and 0x83 CMP instructions.
* Implement the 0x83 AND instruction.
* Implement the 0x81 OR instruction.
Reviewed by: neel
|