path: root/sys/amd64/vmm/io
Commit message | Author | Age | Files | Lines
* MFC 305502: Reset PCI pass through devices via PCI-e FLR during VM start/end. | jhb | 2016-09-30 | 1 | -0/+11
  Add routines to trigger a function level reset (FLR) of a PCI-express device via the PCI-express device control register. This also includes support routines to wait for pending transactions to complete as well as calculating the maximum completion timeout permitted by a device.
  Change the ppt(4) driver to reset pass through devices before attaching to a VM during startup and before detaching from a VM during shutdown.
  Sponsored by: Chelsio Communications
* MFC 304858,305485,305497: Fix various issues with PCI pass through and VT-d. | jhb | 2016-09-30 | 3 | -12/+57
  304858: Enable I/O MMU when PCI pass through is first used.
  Rather than enabling the I/O MMU when the vmm module is loaded, defer initialization until the first attempt to pass a PCI device through to a guest. If the I/O MMU fails to initialize or is not present, then fail the attempt to pass a PCI device through to a guest.
  The hw.vmm.force_iommu tunable has been removed since the I/O MMU is no longer enabled during boot. However, the I/O MMU support can be disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent use of the I/O MMU on any systems where it is buggy.
  305485: Leave ppt devices in the host domain when they are not attached to a VM.
  This allows a pass through device to be reset to a normal device driver on the host and reused on the host. ppt devices are now always active in some I/O MMU domain when the I/O MMU is active, either the host domain or the domain of a VM they are attached to.
  305497: Update the I/O MMU in bhyve when PCI devices are added and removed.
  When the I/O MMU is active in bhyve, all PCI devices need valid entries in the DMAR context tables. The I/O MMU code does a single enumeration of the available PCI devices during initialization to add all existing devices to a domain representing the host. The ppt(4) driver then moves pass through devices in and out of domains for virtual machines as needed. However, when new PCI devices were added at runtime either via SR-IOV or HotPlug, the I/O MMU tables were not updated.
  This change adds a new set of EVENTHANDLERS that are invoked when PCI devices are added and deleted. The I/O MMU driver in bhyve installs handlers for these events which it uses to add and remove devices to the "host" domain.
  Sponsored by: Chelsio Communications
* Don't repeat the the word 'the' | eadler | 2016-05-17 | 1 | -1/+1
  (one manual change to fix grammar)
  Confirmed With: db
  Approved by: secteam (not really, but this is a comment typo fix)
* vmm(4): Small spelling fixes. | pfg | 2016-05-03 | 1 | -1/+1
  Reviewed by: grehan
* Restructure memory allocation in bhyve to support "devmem". | neel | 2015-06-18 | 1 | -5/+11
  devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer where doing a trap-and-emulate for every access is impractical. devmem is a hybrid of system memory (sysmem) and emulated device models.
  devmem is mapped in the guest address space via nested page tables similar to sysmem. However the address range where devmem is mapped may be changed by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is usually mapped RO or RW as compared to RWX mappings for sysmem.
  Each devmem segment is named (e.g. "bootrom") and this name is used to create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom). The device node supports mmap(2) and this decouples the host mapping of devmem from its mapping in the guest address space (which can change).
  Reviewed by: tychon
  Discussed with: grehan
  Differential Revision: https://reviews.freebsd.org/D2762
  MFC after: 4 weeks
* CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten | jkim | 2015-05-22 | 1 | -1/+1
  years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent.
  Differential Revision: https://reviews.freebsd.org/D2613
  Reviewed by: jhb
  MFC after: 2 weeks
* r281630 relaxed the limits on the vectors that can be asserted in the IRRs. | neel | 2015-05-01 | 1 | -11/+9
  Do the same when transitioning a vector from the IRR to the ISR and also when extinguishing it from the ISR in response to an EOI.
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 2 weeks
* Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>. | neel | 2015-04-30 | 6 | -6/+0
  Only a subset of source files that include <machine/vmm.h> need to use the APIs that require the inclusion of <sys/cpuset.h>.
  MFC after: 1 week
* Re-implement RTC current time calculation to eliminate the possibility of | neel | 2015-04-29 | 1 | -21/+32
  losing time.
  The problem with the earlier implementation was that the uptime value used by 'vrtc_curtime()' could be different than the uptime value when 'vrtc_time_update()' actually updated 'base_uptime'.
  Fix this by calculating and updating the (rtctime, uptime) tuple together.
  MFC after: 2 weeks
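The idea of updating the (rtctime, uptime) tuple together can be sketched as follows. This is only an illustration under stated assumptions, not the code from vrtc.c: the struct, the function names `rtc_curtime` and `rtc_time_update`, and the use of whole seconds are all hypothetical simplifications.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified clock state (not the real vrtc softc).
 * Times are whole seconds here purely to keep the sketch short.
 */
struct rtc_clock {
	int64_t base_rtctime;	/* RTC time that was valid at base_uptime */
	int64_t base_uptime;	/* host uptime when base_rtctime was recorded */
};

/* Derive the current RTC time from a single uptime sample. */
int64_t
rtc_curtime(const struct rtc_clock *c, int64_t now_uptime)
{
	return (c->base_rtctime + (now_uptime - c->base_uptime));
}

/*
 * Advance both halves of the tuple from the SAME uptime sample, so no
 * interval between "compute" and "update" can be lost.
 */
void
rtc_time_update(struct rtc_clock *c, int64_t now_uptime)
{
	c->base_rtctime = rtc_curtime(c, now_uptime);
	c->base_uptime = now_uptime;
}
```

Because the caller passes one uptime sample to both steps, the computed time and the stored base cannot drift apart, which is the property the commit message describes.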
* Implement the century byte in the RTC. Some guests require this field to be | neel | 2015-04-28 | 1 | -22/+44
  properly set.
  Reported by: Leon Dang (ldang@nahannisys.com)
  MFC after: 2 weeks
* Fix the RTC device model to operate correctly in 12-hour mode. The following | neel | 2015-03-28 | 1 | -6/+41
  table documents the values in the RTC 'hour' field in the two modes:
  Hour-of-the-day    12-hour mode      24-hour mode
  12 AM              12                0
  [1-11] AM          [1-11]            [1-11]
  12 PM              0x80 | 12         12
  [1-11] PM          0x80 | [1-11]     [13-23]
  Reported by: Julian Hsiao (madoka@nyanisore.net)
  MFC after: 1 week
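The table above can be expressed as a small encoding helper. This is a sketch under assumptions: the name `rtc_hour_field` and the `RTC_HOUR_PM` macro are hypothetical (not taken from vrtc.c), and the RTC is assumed to be in binary rather than BCD data mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTC_HOUR_PM	0x80	/* bit 7 marks PM in 12-hour mode */

/*
 * Hypothetical helper: encode an hour of the day (0-23) into the RTC
 * 'hour' field, following the table in the commit message.
 * Assumes binary (non-BCD) data mode.
 */
uint8_t
rtc_hour_field(int hour24, bool mode_24h)
{
	uint8_t hour12;

	assert(hour24 >= 0 && hour24 <= 23);

	if (mode_24h)
		return ((uint8_t)hour24);

	/* 12-hour mode: 0 -> 12 AM, 1-11 -> AM, 12 -> 12 PM, 13-23 -> PM. */
	hour12 = (uint8_t)(hour24 % 12);
	if (hour12 == 0)
		hour12 = 12;
	if (hour24 >= 12)
		hour12 |= RTC_HOUR_PM;
	return (hour12);
}
```

For example, 1 PM (hour 13) encodes as 0x80 | 1 in 12-hour mode and as 13 in 24-hour mode, matching the last row of the table.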
* Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve when | neel | 2015-03-14 | 1 | -1/+0
  vmm.ko is loaded.
  Also relocate the 'justreturn' IPI handler to be alongside all other handlers.
  Requested by: kib
* When ICW1 is issued the edge sense circuit is reset which means that | tychon | 2015-03-06 | 1 | -0/+1
  following an initialization a low-to-high transition is necessary to generate an interrupt.
  Reviewed by: neel
* Allow passthrough devices to be hinted. | rstone | 2015-03-01 | 1 | -33/+45
  Allow the ppt driver to attach to devices that were hinted to be passthrough devices by the PCI code creating them with a driver name of "ppt".
  Add a tunable that allows the IOMMU to be forced to be used. With SR-IOV passthrough devices the VFs may be created after vmm.ko is loaded. The current code will not initialize the IOMMU in that case, meaning that the passthrough devices can't actually be used.
  Differential Revision: https://reviews.freebsd.org/D73
  Reviewed by: neel
  MFC after: 1 month
  Sponsored by: Sandvine Inc.
* Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko. | neel | 2014-12-30 | 3 | -61/+1011
  The new RTC emulation supports all interrupt modes: periodic, update ended and alarm. It is also capable of maintaining the date/time and NVRAM contents across virtual machine reset. Also, the date/time fields can now be modified by the guest.
  Since bhyve now emulates both the PIT and the RTC there is no need for "Legacy Replacement Routing" in the HPET so get rid of it.
  The RTC device state can be inspected via bhyvectl as follows:
    bhyvectl --vm=vm --get-rtc-time
    bhyvectl --vm=vm --set-rtc-time=<unix_time_secs>
    bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram
    bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value>
  Reviewed by: tychon
  Discussed with: grehan
  Differential Revision: https://reviews.freebsd.org/D1385
  MFC after: 2 weeks
* Implement "special mask mode" in vatpic. | neel | 2014-12-28 | 1 | -4/+25
  OpenBSD guests always enable "special mask mode" during boot. As a result of r275952 this is flagged as an error and the guest cannot boot.
  Reviewed by: grehan
  Differential Revision: https://reviews.freebsd.org/D1384
  MFC after: 1 week
* Various 8259 device model improvements: | neel | 2014-12-20 | 1 | -4/+37
  - implement 8259 "polled" mode.
  - set 'atpic->sfn' if bit 4 in ICW4 is set during master initialization.
  - report error if guest tries to enable the "special mask" mode.
  Differential Revision: https://reviews.freebsd.org/D1328
  Reviewed by: tychon
  Reported by: grehan
  Tested by: grehan
  MFC after: 1 week
* Fix 8259 IRQ priority resolver. | neel | 2014-12-17 | 1 | -18/+28
  Initialize the 8259 such that IRQ7 is the lowest priority.
  Reviewed by: tychon
  Differential Revision: https://reviews.freebsd.org/D1322
  MFC after: 1 week
* For level triggered interrupts clear the PIC IRR bit when the interrupt pin | neel | 2014-12-16 | 1 | -0/+2
  is deasserted. Prior to this change each assertion on a level triggered irq pin resulted in two interrupts being delivered to the CPU.
  Differential Revision: https://reviews.freebsd.org/D1310
  Reviewed by: tychon
  MFC after: 1 week
* Change the type of the first argument to the I/O emulation handlers to | neel | 2014-10-26 | 6 | -15/+15
  'struct vm *'. Previously it used to be a 'void *' but there is no reason to hide the actual type from the handler.
  Discussed with: tychon
  MFC after: 1 week
* Move the ACPI PM timer emulation into vmm.ko. | neel | 2014-10-26 | 2 | -0/+146
  This reduces variability during timer calibration by keeping the emulation "close" to the guest. Additionally having all timer emulations in the kernel will ease the transition to a per-VM clock source (as opposed to using the host's uptime to keep track of time).
  Discussed with: grehan
* IFC @r272481 | neel | 2014-10-05 | 1 | -13/+12
|\
| * Allow the PIC's IMR register to be read before ICW initialisation. | grehan | 2014-09-27 | 1 | -13/+12
|   As of git submit e179f6914152eca9, the Linux kernel does a simple probe of the PIC by writing a pattern to the IMR and then reading it back, prior to the init sequence of ICW words.
|   The bhyve PIC emulation wasn't allowing the IMR to be read until the ICW sequence was complete. This limitation isn't required so relax the test.
|   With this change, Linux kernels 3.15-rc2 and later won't hang on boot when calibrating the local APIC.
|   Reviewed by: tychon
|   MFC after: 3 days
* | IFC @r272185 | neel | 2014-09-27 | 1 | -0/+1
|\ \
| |/
| * Add some more KTR events to help debugging. | neel | 2014-09-20 | 1 | -0/+1
| |
* | IFC @r271694 | neel | 2014-09-17 | 1 | -9/+18
|\ \
| |/
| * Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot | neel | 2014-08-23 | 1 | -2/+8
|   find any unmasked pin with an interrupt asserted.
|   Reviewed by: tychon
|   CR: https://reviews.freebsd.org/D669
|   MFC after: 1 week
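The fallback described in this commit can be illustrated with a minimal sketch for a single 8259. The function name `pic_pin_to_deliver` is hypothetical, and the sketch assumes fixed priority with pin 0 highest and pin 7 lowest, ignoring in-service masking and priority rotation, which the real vatpic code has to handle.

```c
#include <stdint.h>

#define PIC_SPURIOUS_PIN	7	/* IRQ7 on the master, IRQ15 on the slave */

/*
 * Hypothetical sketch: pick the pin whose vector should be delivered.
 * 'irr' holds requested pins and 'imr' holds masked pins, one bit each.
 * If no unmasked pin is asserted, fall back to the spurious pin.
 */
int
pic_pin_to_deliver(uint8_t irr, uint8_t imr)
{
	uint8_t pending = irr & (uint8_t)~imr;
	int pin;

	/* Scan from highest priority (pin 0) to lowest (pin 7). */
	for (pin = 0; pin < 8; pin++) {
		if (pending & (1u << pin))
			return (pin);
	}
	return (PIC_SPURIOUS_PIN);
}
```

Note that a genuine request on pin 7 and the spurious case return the same pin number; on real hardware a guest tells them apart by checking whether the ISR bit for pin 7 is set.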
| * Reword comment to match the interrupt mode names from the MPtable spec. | neel | 2014-08-14 | 1 | -7/+10
|   Reviewed by: tychon
* | Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the | neel | 2014-09-16 | 1 | -2/+6
|/
  processor. Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes care of injecting this vector when the guest is able to receive it.
  Legacy PIC interrupts are still delivered via the event injection mechanism. This is because the vector injected by the PIC must reflect the state of its pins at the time the CPU is ready to accept the interrupt.
  Accesses to the TPR via %CR8 are handled entirely in hardware. This requires that the emulated TPR must be synced to V_TPR after a #VMEXIT.
  The guest can also modify the TPR via the memory mapped APIC. This requires that the V_TPR must be synced with the emulated TPR before a VMRUN.
  Reviewed by: Anish Gupta (akgupt3@gmail.com)
* Add reserved bit checking when doing %CR8 emulation and inject #GP if required. | neel | 2014-06-09 | 2 | -19/+42
  Pointed out by: grehan
  Reviewed by: tychon
* Support guest accesses to %cr8. | tychon | 2014-06-06 | 2 | -3/+22
  Reviewed by: neel
* Activate vcpus from bhyve(8) using the ioctl VM_ACTIVATE_CPU instead of doing | neel | 2014-05-31 | 1 | -4/+0
  it implicitly in vmm.ko.
  Add ioctl VM_GET_CPUS to get the current set of 'active' and 'suspended' cpus and display them via /usr/sbin/bhyvectl using the "--get-active-cpus" and "--get-suspended-cpus" options.
  This is in preparation for being able to reset virtual machine state without having to destroy and recreate it.
* A CentOS 6.4 guest will write 0xff to the 8259 mask register before beginning | neel | 2014-05-23 | 1 | -0/+1
  the proper ICWx initialization sequence. It assumes, probably correctly, that the boot firmware has done the 8259 initialization.
  Since grub-bhyve does not initialize the 8259 this write to the mask register takes a code path in which 'error' remains uninitialized (ready=0,icw_num=0).
  Fix this by initializing 'error' at the start of the function.
* Implement a PCI interrupt router to route PCI legacy INTx interrupts to | jhb | 2014-05-15 | 2 | -0/+38
  the legacy 8259A PICs.
  - Implement an ICH-compatible PCI interrupt router on the lpc device with 8 steerable pins configured via config space access to byte-wide registers at 0x60-63 and 0x68-6b.
  - For each configured PCI INTx interrupt, route it to both an I/O APIC pin and a PCI interrupt router pin. When a PCI INTx interrupt is asserted, ensure that both pins are asserted.
  - Provide an initial routing of PCI interrupt router (PIRQ) pins to 8259A pins (ISA IRQs) and initialize the interrupt line config register for the corresponding PCI function with the ISA IRQ as this matches existing hardware.
  - Add a global _PIC method for OSPM to select the desired interrupt routing configuration.
  - Update the _PRT methods for PCI bridges to provide both APIC and legacy PRT tables and return the appropriate table based on the configured routing configuration. Note that if the lpc device is not configured, no routing information is provided.
  - When the lpc device is enabled, provide ACPI PCI link devices corresponding to each PIRQ pin.
  - Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A pins via the ELCR.
  - Mark the power management SCI as level triggered.
  - Don't hardcode the number of elements in Packages in the source for the DSDT. iasl(8) will fill in the actual number of elements, and this makes it simpler to generate a Package with a variable number of elements.
  Reviewed by: tycho
* Change the vlapic timer frequency to be in the ballpark of contemporary | neel | 2014-04-23 | 1 | -1/+6
  hardware. This also decouples the vlapic emulation from the host's TSC frequency.
  Requested by: grehan@
* Add support for the PIT 'readback' command -- based on a patch by grehan@. | tychon | 2014-04-18 | 1 | -2/+74
  Approved by: grehan (co-mentor)
* Respect the destination operand size of the 'Input from Port' instruction. | tychon | 2014-04-18 | 4 | -47/+56
  Approved by: grehan (co-mentor)
* Add support for reading the PIT Counter 2 output signal via the NMI | tychon | 2014-04-18 | 2 | -18/+59
  Status and Control register at port 0x61.
  Be more conservative about "catching up" callouts that were supposed to fire in the past by skipping an interrupt if it was scheduled too far in the past.
  Restore the PIT ACPI DSDT entries and add an entry for NMISC too.
  Approved by: neel (co-mentor)
* Add support for emulating the slave PIC. | tychon | 2014-04-14 | 1 | -65/+132
  Reviewed by: grehan, jhb
  Approved by: grehan (co-mentor)
* Rework r264179. | grehan | 2014-04-10 | 1 | -1/+2
  - remove redundant code
  - remove erroneous setting of the error return in vmmdev_ioctl()
  - use style(9) initialization
  - in vmx_inject_pir(), document the race condition that the final conditional statement was detecting,
  Tested with both gcc and clang builds.
  Reviewed by: neel
* Make the vmm code compile with gcc too. Not entirely sure things are | imp | 2014-04-05 | 2 | -23/+1
  correct for the pirbase test (since I'd have thought we'd need to do something even when the offset is 0 and that test looks like a misguided attempt to not use an uninitialized variable), but it is at least the same as today.
* Re-write bhyve's I/O MMU handling in terms of PCI RID. | rstone | 2014-04-01 | 3 | -15/+16
  Reviewed by: neel
  MFC after: 2 months
  Sponsored by: Sandvine Inc.
* Revert PCI RID changes. | rstone | 2014-04-01 | 3 | -16/+15
  My PCI RID changes somehow got intermixed with my PCI ARI patch when I committed it. I may have accidentally applied a patch to a non-clean working tree. Revert everything while I figure out what went wrong.
  Pointy hat to: rstone
* Re-write bhyve's I/O MMU handling in terms of PCI RIDs | rstone | 2014-04-01 | 3 | -15/+16
  Reviewed by: neel
  Sponsored by: Sandvine Inc
* Move the atpit device model from userspace into vmm.ko for better | tychon | 2014-03-25 | 2 | -0/+410
  precision and lower latency.
  Approved by: grehan (co-mentor)
* Fix a race wherein the source of an interrupt vector is wrongly | tychon | 2014-03-15 | 4 | -33/+36
  attributed if an ExtINT arrives during interrupt injection.
  Also, fix a spurious interrupt if the PIC tries to raise an interrupt before the outstanding one is accepted.
  Finally, improve the PIC interrupt latency when another interrupt is raised immediately after the outstanding one is accepted by creating a vmexit rather than waiting for one to occur by happenstance.
  Approved by: neel (co-mentor)
* Don't try to return a vector to a caller that only cares if a vector | tychon | 2014-03-11 | 1 | -2/+6
  is pending or not.
  Approved by: neel (co-mentor)
* Replace the userspace atpic stub with a more functional vmm.ko model. | tychon | 2014-03-11 | 5 | -5/+734
  New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ can be used to manipulate the pic, and optionally the ioapic, pin state.
  Reviewed by: jhb, neel
  Approved by: neel (co-mentor)
* Add support for x2APIC virtualization assist in Intel VT-x. | neel | 2014-02-21 | 3 | -1/+10
  The vlapic.ops handler 'enable_x2apic_mode' is called when the vlapic mode is switched to x2APIC. The VT-x implementation of this handler turns off the APIC-access virtualization and enables the x2APIC virtualization in the VMCS.
  The x2APIC virtualization is done by allowing guest read access to a subset of MSRs in the x2APIC range. In non-root operation the processor will satisfy an 'rdmsr' access to these MSRs by reading from the virtual APIC page instead.
  The guest is also given write access to TPR, EOI and SELF_IPI MSRs which get special treatment in non-root operation. This is documented in the Intel SDM section titled "Virtualizing MSR-Based APIC Accesses".
  Enforce that APIC-write and APIC-access VM-exits are handled only if APIC-access virtualization is enabled. The one exception to this is SELF_IPI virtualization which may result in an APIC-write VM-exit.
* Simplify APIC mode switching from MMIO to x2APIC. In part this is done to | neel | 2014-02-20 | 2 | -39/+72
  simplify the implementation of the x2APIC virtualization assist in VT-x.
  Prior to this change the vlapic allowed the guest to change its mode from xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked when the virtual machine is created. This is not very constraining because operating systems already have to deal with BIOS setting up the APIC in x2APIC mode at boot.
  Fix a bug in the CPUID emulation where the x2APIC capability was leaking from the host to the guest.
  Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore MSR accesses to the vlapic when it is in xAPIC mode.
  The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8) can be used to change the mode to x2APIC instead.
  Discussed with: grehan@