path: root/sys/amd64/vmm
Commit message | Author | Age | Files | Lines
...
* | Fix bhyvectl so it works correctly on AMD/SVM hosts. Also, add command line options to display some key VMCB fields.
| | Author: neel; Age: 2014-10-10; Files: 2; Lines: -0/+88
| | The set of valid options that can be passed to bhyvectl now depends on the processor type. AMD-specific options are identified by a "--vmcb" or "--avic" in the option name. Intel-specific options are identified by a "--vmcs" in the option name.
| | Submitted by: Anish Gupta (akgupt3@gmail.com)
* | IFC @r272481
| | Author: neel; Age: 2014-10-05; Files: 2; Lines: -68/+29
|\ \
| |/
| * Get rid of code that dealt with the hardware not being able to save/restore the PAT MSR on guest exit/entry. This workaround was done for a beta release of VMware Fusion 5 but is no longer needed in later versions.
| | Author: neel; Age: 2014-10-02; Files: 1; Lines: -55/+17
| | All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT in the VM exit and entry controls.
| | Discussed with: grehan
| * Allow the PIC's IMR register to be read before ICW initialisation.
| | Author: grehan; Age: 2014-09-27; Files: 1; Lines: -13/+12
| | As of git commit e179f6914152eca9, the Linux kernel does a simple probe of the PIC by writing a pattern to the IMR and then reading it back, prior to the init sequence of ICW words.
| | The bhyve PIC emulation wasn't allowing the IMR to be read until the ICW sequence was complete. This limitation isn't required so relax the test.
| | With this change, Linux kernels 3.15-rc2 and later won't hang on boot when calibrating the local APIC.
| | Reviewed by: tychon
| | MFC after: 3 days
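The relaxed IMR read described in this entry can be illustrated with a small model. This is a minimal sketch, not the bhyve atpic code: `struct pic_model`, `pic_data_write()` and `pic_data_read()` are hypothetical names, and the sketch assumes that a data-port write made before the ICW sequence starts is simply latched as the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified 8259 model; not the bhyve atpic structure. */
struct pic_model {
	bool	ready;	/* true once the ICW sequence has completed */
	uint8_t	imr;	/* interrupt mask register */
};

/* Data-port write outside the ICW sequence: latch the value as the mask. */
static inline void
pic_data_write(struct pic_model *pic, uint8_t val)
{
	assert(pic != NULL);
	pic->imr = val;
}

/*
 * Data-port read: return the mask without checking 'ready', so a guest
 * can probe the PIC by writing a pattern and reading it back before it
 * has programmed the ICW words.
 */
static inline uint8_t
pic_data_read(const struct pic_model *pic)
{
	assert(pic != NULL);
	return (pic->imr);
}
```

With a read path that insisted on `ready` being set, the read-back in such a probe would not return the pattern that was written, and the guest would conclude that no PIC is present.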
* | IFC @r272185
| | Author: neel; Age: 2014-09-27; Files: 3; Lines: -8/+8
|\ \
| |/
| * Add some more KTR events to help debugging.
| | Author: neel; Age: 2014-09-20; Files: 2; Lines: -1/+8
| |
| * MSR_KGSBASE is no longer saved and restored from the guest MSR save area. This behavior was changed in r271888 so update the comment block to reflect this.
| | Author: neel; Age: 2014-09-20; Files: 1; Lines: -7/+0
| | MSR_KGSBASE is accessible from the guest without triggering a VM-exit. The permission bitmap for MSR_KGSBASE is modified by vmx_msr_guest_init() so get rid of redundant code in vmx_vminit().
| * Restructure the MSR handling so it is entirely handled by processor-specific code. There are only a handful of MSRs common between the two so there isn't too much duplicate functionality.
| | Author: neel; Age: 2014-09-20; Files: 9; Lines: -371/+201
| | The VT-x code has the following types of MSRs:
| | - MSRs that are unconditionally saved/restored on every guest/host context switch (e.g., MSR_GSBASE).
| | - MSRs that are restored to guest values on entry to vmx_run() and saved before returning. This is an optimization for MSRs that are not used in host kernel context (e.g., MSR_KGSBASE).
| | - MSRs that are emulated and every access by the guest causes a trap into the hypervisor (e.g., MSR_IA32_MISC_ENABLE).
| | Reviewed by: grehan
* | Simplify register state save and restore across a VMRUN:
| | Author: neel; Age: 2014-09-27; Files: 4; Lines: -145/+85
| | - Host registers are now stored on the stack instead of a per-cpu host context.
| | - Host %FS and %GS selectors are not saved and restored across VMRUN.
| |   - Restoring the %FS/%GS selectors was futile anyways since that only updates the low 32 bits of base address in the hidden descriptor state.
| |   - GS.base is properly updated via the MSR_GSBASE on return from svm_launch().
| |   - FS.base is not used while inside the kernel so it can be safely ignored.
| | - Add function prologue/epilogue so svm_launch() can be traced with DTrace's FBT entry/exit probes. They also serve to save/restore the host %rbp across VMRUN.
| | Reviewed by: grehan
| | Discussed with: Anish Gupta (akgupt3@gmail.com)
* | Allow more VMCB fields to be cached:
| | Author: neel; Age: 2014-09-21; Files: 5; Lines: -223/+245
| | - CR2
| | - CR0, CR3, CR4 and EFER
| | - GDT/IDT base/limit fields
| | - CS/DS/ES/SS selector/base/limit/attrib fields
| | The caching can be further restricted via the tunable 'hw.vmm.svm.vmcb_clean'.
| | Restructure the code such that the fields above are only modified in a single place. This makes it easy to invalidate the VMCB cache when any of these fields is modified.
* | Get rid of unused stat VMM_HLT_IGNORED.
| | Author: neel; Age: 2014-09-21; Files: 2; Lines: -2/+0
| |
* | IFC r271888.
| | Author: neel; Age: 2014-09-20; Files: 12; Lines: -404/+433
| | Restructure MSR emulation so it is all done in processor-specific code.
* | IFC @r271694
| | Author: neel; Age: 2014-09-17; Files: 6; Lines: -93/+337
|\ \
| |/
| * Optimize the common case of injecting an interrupt into a vcpu after a HLT by explicitly moving it out of the interrupt shadow.
| | Author: neel; Age: 2014-09-12; Files: 2; Lines: -1/+63
| | The hypervisor is done "executing" the HLT and by definition this moves the vcpu out of the 1-instruction interrupt shadow.
| | Prior to this change the interrupt would be held pending because the VMCS guest-interruptibility-state would indicate that "blocking by STI" was in effect. This resulted in an unnecessary round trip into the guest before the pending interrupt could be injected.
| | Reviewed by: grehan
| * The "SUB" instruction used in getcc() actually does 'x -= y' so use the proper constraint for 'x'. The "+r" constraint indicates that 'x' is an input and output register operand.
| | Author: neel; Age: 2014-08-30; Files: 1; Lines: -42/+66
| | While here generate code for different variants of getcc() using a macro GETCC(sz) where 'sz' indicates the operand size.
| | Update the status bits in %rflags when emulating AND and OR opcodes.
| | Reviewed by: grehan
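The idea of generating one getcc() variant per operand size from a GETCC(sz) macro can be sketched in portable C. This is an illustration under assumptions, not the committed code: the commit uses inline assembly around a real SUB instruction, whereas this sketch computes a subset of the flags (CF, ZF, SF, OF) arithmetically and leaves out PF and AF. The `FLAG_*` names are made up for the example; the bit positions are the architectural %rflags positions.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural %rflags bit positions; the macro names are hypothetical. */
#define	FLAG_CF	0x001u
#define	FLAG_ZF	0x040u
#define	FLAG_SF	0x080u
#define	FLAG_OF	0x800u

/*
 * GETCC(sz) expands to getcc<sz>(x, y), which returns the flags that
 * "SUB y, x" (that is, x -= y) would produce for sz-bit operands.
 */
#define	GETCC(sz)							\
static inline uint32_t							\
getcc##sz(uint##sz##_t x, uint##sz##_t y)				\
{									\
	uint##sz##_t res = (uint##sz##_t)(x - y);			\
	uint##sz##_t sign = (uint##sz##_t)((uint64_t)1 << ((sz) - 1));	\
	uint32_t flags = 0;						\
									\
	if (x < y)							\
		flags |= FLAG_CF;	/* unsigned borrow */		\
	if (res == 0)							\
		flags |= FLAG_ZF;					\
	if (res & sign)							\
		flags |= FLAG_SF;					\
	if (((x ^ y) & (x ^ res)) & sign)				\
		flags |= FLAG_OF;	/* signed overflow */		\
	return (flags);							\
}

GETCC(8)
GETCC(16)
GETCC(32)
GETCC(64)
```

Because SUB overwrites its destination, an inline-assembly version has to declare 'x' as both read and written, which is what the "+r" constraint mentioned in the entry expresses.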
| * Implement the 0x2B SUB instruction, and the OR variant of 0x81.
| | Author: grehan; Age: 2014-08-27; Files: 1; Lines: -13/+91
| | Found with local APIC accesses from bitrig/amd64 bsd.rd, 07/15-snap.
| | Reviewed by: neel
| | MFC after: 3 days
| * Add "hw.vmm.topology.threads_per_core" and "hw.vmm.topology.cores_per_package" tunables to modify the default cpu topology advertised by bhyve.
| | Author: neel; Age: 2014-08-24; Files: 1; Lines: -24/+77
| | Also add a tunable "hw.vmm.topology.cpuid_leaf_b" to disable the CPUID leaf 0xb. This is intended for testing guest behavior when it falls back on using CPUID leaf 0x4 to deduce CPU topology.
| | The default behavior is to advertise each vcpu as a core in a separate socket.
| * Fix a bug in the emulation of CPUID leaf 0x4 where bhyve was claiming that the vcpu had no caches at all. This causes problems when executing applications in the guest compiled with the Intel compiler.
| | Author: neel; Age: 2014-08-23; Files: 1; Lines: -2/+2
| | Submitted by: Mark Hill (mark.hill@tidalscale.com)
| * Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot find any unmasked pin with an interrupt asserted.
| | Author: neel; Age: 2014-08-23; Files: 1; Lines: -2/+8
| | Reviewed by: tychon
| | CR: https://reviews.freebsd.org/D669
| | MFC after: 1 week
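The fallback to the spurious vector can be sketched as a pin-selection routine. This is a simplified illustration, not the bhyve implementation: `struct pic_pins` and `pic_pin_to_deliver()` are hypothetical, and the sketch assumes fixed priority with pin 0 highest (no priority rotation, no in-service tracking).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one 8259; not the bhyve structure. */
struct pic_pins {
	uint8_t	request;	/* bit n set when pin n is asserted */
	uint8_t	mask;		/* bit n set when pin n is masked */
};

/*
 * Return the lowest-numbered pin that is asserted and unmasked. If no
 * such pin exists, return pin 7, which the guest sees as the spurious
 * interrupt (IRQ7 on the master 8259, IRQ15 on the slave).
 */
static inline int
pic_pin_to_deliver(const struct pic_pins *pic)
{
	uint8_t pending;
	int pin;

	assert(pic != NULL);
	pending = pic->request & (uint8_t)~pic->mask;
	for (pin = 0; pin < 8; pin++) {
		if (pending & (1u << pin))
			return (pin);
	}
	return (7);
}
```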
| * Reword comment to match the interrupt mode names from the MPtable spec.
| | Author: neel; Age: 2014-08-14; Files: 1; Lines: -7/+10
| | Reviewed by: tychon
* | Rework vNMI injection.
| | Author: neel; Age: 2014-09-17; Files: 1; Lines: -7/+73
| | Keep track of NMI blocking by enabling the IRET intercept on a successful vNMI injection. The NMI blocking condition is cleared when the handler executes an IRET and traps back into the hypervisor.
| | Don't inject NMI if the processor is in an interrupt shadow to preserve the atomic nature of "STI;HLT". Take advantage of this and artificially set the interrupt shadow to prevent NMI injection when restarting the "iret".
| | Reviewed by: Anish Gupta (akgupt3@gmail.com), grehan
* | Minor cleanup.
| | Author: neel; Age: 2014-09-16; Files: 2; Lines: -15/+1
| | Get rid of unused 'svm_feature' from the softc.
| | Get rid of the redundant 'vcpu_cnt' checks in svm.c. There is a similar check in vmm.c against 'vm->active_cpus' before the AMD-specific code is called.
| | Submitted by: Anish Gupta (akgupt3@gmail.com)
* | Use V_IRQ, V_INTR_VECTOR and V_TPR to offload APIC interrupt delivery to the processor.
| | Author: neel; Age: 2014-09-16; Files: 2; Lines: -46/+155
| | Briefly, the hypervisor sets V_INTR_VECTOR to the APIC vector and sets V_IRQ to 1 to indicate a pending interrupt. The hardware then takes care of injecting this vector when the guest is able to receive it.
| | Legacy PIC interrupts are still delivered via the event injection mechanism. This is because the vector injected by the PIC must reflect the state of its pins at the time the CPU is ready to accept the interrupt.
| | Accesses to the TPR via %CR8 are handled entirely in hardware. This requires that the emulated TPR must be synced to V_TPR after a #VMEXIT.
| | The guest can also modify the TPR via the memory mapped APIC. This requires that the V_TPR must be synced with the emulated TPR before a VMRUN.
| | Reviewed by: Anish Gupta (akgupt3@gmail.com)
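The two TPR synchronization directions mentioned in this entry come down to a 4-bit shift, because %CR8 holds only the priority class, which is bits 7:4 of the memory-mapped APIC TPR. The helpers below are a minimal sketch with hypothetical names, not functions from svm.c.

```c
#include <assert.h>
#include <stdint.h>

/* Before VMRUN: derive the V_TPR (CR8) value from the emulated APIC TPR. */
static inline uint8_t
apic_tpr_to_vtpr(uint8_t apic_tpr)
{
	return ((uint8_t)(apic_tpr >> 4));
}

/*
 * After #VMEXIT: rebuild the emulated APIC TPR from V_TPR. The priority
 * sub-class (bits 3:0 of the APIC TPR) cannot be expressed in CR8, so it
 * comes back as zero in this sketch.
 */
static inline uint8_t
vtpr_to_apic_tpr(uint8_t vtpr)
{
	return ((uint8_t)((vtpr & 0xf) << 4));
}
```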
* | Set the 'vmexit->inst_length' field properly depending on the type of the VM-exit and ultimately on whether nRIP is valid. This allows us to update the %rip after the emulation is finished so any exceptions triggered during the emulation will point to the right instruction.
| | Author: neel; Age: 2014-09-14; Files: 1; Lines: -127/+158
| | Don't attempt to handle INS/OUTS VM-exits unless the DecodeAssist capability is available. The effective segment field in EXITINFO1 is not valid without this capability.
| | Add VM_EXITCODE_SVM to flag SVM VM-exits that cannot be handled. Provide the VMCB fields exitinfo1 and exitinfo2 as collateral to help with debugging.
| | Provide a SVM VM-exit handler to dump the exitcode, exitinfo1 and exitinfo2 fields in bhyve(8).
| | Reviewed by: Anish Gupta (akgupt3@gmail.com)
| | Reviewed by: grehan
* | Bug fixes.
| | Author: neel; Age: 2014-09-13; Files: 2; Lines: -1/+6
| | - Don't enable the HLT intercept by default. It will be enabled by bhyve(8) if required. Prior to this change HLT exiting was always enabled making the "-H" option to bhyve(8) meaningless.
| | - Recognize a VM exit triggered by a non-maskable interrupt. Prior to this change the exit would be punted to userspace and the virtual machine would terminate.
* | style(9): insert an empty line if the function has no local variables
| | Author: neel; Age: 2014-09-13; Files: 1; Lines: -0/+2
| | Pointed out by: grehan
* | AMD processors that have the SVM decode assist capability will store the instruction bytes in the VMCB on a nested page fault. This is useful because it saves having to walk the guest page tables to fetch the instruction.
| | Author: neel; Age: 2014-09-13; Files: 5; Lines: -20/+68
| | vie_init() now takes two additional parameters 'inst_bytes' and 'inst_len' that map directly to 'vie->inst[]' and 'vie->num_valid'.
| | The instruction emulation handler skips calling 'vmm_fetch_instruction()' if 'vie->num_valid' is non-zero.
| | The use of this capability can be turned off by setting the sysctl/tunable 'hw.vmm.svm.disable_npf_assist' to '1'.
| | Reviewed by: Anish Gupta (akgupt3@gmail.com)
| | Discussed with: grehan
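The interaction between the new vie_init() parameters and the skipped fetch can be sketched as follows. This is a reduced stand-in written for illustration: `struct vie_sketch`, `vie_init_sketch()` and `vie_needs_fetch()` are hypothetical, and only the two fields named in the commit message ('inst[]' and 'num_valid') are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	VIE_INST_SIZE	15	/* maximum length of an x86 instruction */

/* Reduced stand-in for 'struct vie'; the real structure has more fields. */
struct vie_sketch {
	uint8_t	inst[VIE_INST_SIZE];
	uint8_t	num_valid;	/* number of valid bytes in inst[] */
};

/*
 * Reset the decoder state and, when the hardware already supplied the
 * instruction bytes (inst_len > 0), preload them.
 */
static inline void
vie_init_sketch(struct vie_sketch *vie, const uint8_t *inst_bytes,
    int inst_len)
{
	assert(vie != NULL);
	assert(inst_len >= 0 && inst_len <= VIE_INST_SIZE);

	memset(vie, 0, sizeof(*vie));
	if (inst_len > 0) {
		assert(inst_bytes != NULL);
		memcpy(vie->inst, inst_bytes, (size_t)inst_len);
		vie->num_valid = (uint8_t)inst_len;
	}
}

/* Nonzero when the bytes still have to be fetched from guest memory. */
static inline int
vie_needs_fetch(const struct vie_sketch *vie)
{
	assert(vie != NULL);
	return (vie->num_valid == 0);
}
```

A caller without decode assist would pass a NULL buffer and a length of 0, which leaves 'num_valid' at zero and keeps the page-table walk on the fetch path.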
* | style(9): indent the switch, don't indent the case, indent case body one tab.
| | Author: neel; Age: 2014-09-11; Files: 1; Lines: -152/+132
| |
* | Repurpose the V_IRQ interrupt injection to implement VMX-style interrupt window exiting. This simply involves setting V_IRQ and enabling the VINTR intercept. This instructs the CPU to trap back into the hypervisor as soon as an interrupt can be injected into the guest. The pending interrupt is then injected via the traditional event injection mechanism.
| | Author: neel; Age: 2014-09-11; Files: 1; Lines: -71/+177
| | Rework vcpu interrupt injection so that Linux guests now idle with host cpu utilization close to 0%.
| | Reviewed by: Anish Gupta (earlier version)
| | Discussed with: grehan
* | Allow intercepts and irq fields to be cached by the VMCB.
| | Author: neel; Age: 2014-09-10; Files: 2; Lines: -117/+133
| | Provide APIs svm_enable_intercept()/svm_disable_intercept() to add/delete VMCB intercepts. These APIs ensure that the VMCB state cache is invalidated when intercepts are modified.
| | Each intercept is identified as a (index,bitmask) tuple. For example, the VINTR intercept is identified as (VMCB_CTRL1_INTCPT,VMCB_INTCPT_VINTR). The first 20 bytes in control area that are used to enable intercepts are represented as 'uint32_t intercept[5]' in 'struct vmcb_ctrl'.
| | Modify svm_setcap() and svm_getcap() to use the new APIs.
| | Discussed with: Anish Gupta (akgupt3@gmail.com)
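The (index,bitmask) addressing of intercepts and the cache invalidation it triggers can be sketched like this. It is an illustration under assumptions rather than the code from svm.c: the struct, the function and the `SKETCH_*` constants are hypothetical, and a set bit in `vmcb_clean` is taken to mean that the hardware may reuse its cached copy of that state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants; the real definitions live in the vmm sources. */
#define	SKETCH_INTCPT_WORDS	5
#define	SKETCH_CLEAN_INTERCEPTS	(1u << 0)

struct vmcb_ctrl_sketch {
	uint32_t intercept[SKETCH_INTCPT_WORDS]; /* first 20 bytes of the control area */
	uint32_t vmcb_clean;			 /* set bit: cached state is still valid */
};

/*
 * Enable or disable the intercept identified by (idx, bitmask). The
 * cached intercept state is marked dirty only if the word changed.
 */
static inline void
sketch_set_intercept(struct vmcb_ctrl_sketch *ctrl, int idx,
    uint32_t bitmask, int enable)
{
	uint32_t oldval;

	assert(ctrl != NULL);
	assert(idx >= 0 && idx < SKETCH_INTCPT_WORDS);

	oldval = ctrl->intercept[idx];
	if (enable)
		ctrl->intercept[idx] |= bitmask;
	else
		ctrl->intercept[idx] &= ~bitmask;

	if (ctrl->intercept[idx] != oldval)
		ctrl->vmcb_clean &= ~SKETCH_CLEAN_INTERCEPTS;
}
```

Routing every intercept change through one function is what makes the invalidation reliable: no caller can modify an intercept word and forget to dirty the cache.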
* | Move the VMCB initialization into svm.c in preparation for changes to the interrupt injection logic.
| | Author: neel; Age: 2014-09-10; Files: 3; Lines: -84/+79
| | Discussed with: Anish Gupta (akgupt3@gmail.com)
* | Move the event injection function into svm.c and add KTR logging for every event injection.
| | Author: neel; Age: 2014-09-10; Files: 3; Lines: -41/+66
| | This is in preparation for changes to SVM guest interrupt injection.
| | Discussed with: Anish Gupta (akgupt3@gmail.com)
* | Remove a bogus check that flagged an error if the guest %rip was zero.
| | Author: neel; Age: 2014-09-10; Files: 1; Lines: -5/+0
| | An AP begins execution with %rip set to 0 after a startup IPI.
| | Discussed with: Anish Gupta (akgupt3@gmail.com)
* | Make the KTR tracepoints uniform and ensure that every VM-exit is logged.
| | Author: neel; Age: 2014-09-10; Files: 3; Lines: -50/+61
| | Discussed with: Anish Gupta (akgupt3@gmail.com)
* | Allow guest read access to MSR_EFER without hypervisor intervention.
| | Author: neel; Age: 2014-09-10; Files: 1; Lines: -19/+24
| | Dirty the VMCB_CACHE_CR state cache when MSR_EFER is modified.
* | Remove gratuitous forward declarations.
| | Author: neel; Age: 2014-09-09; Files: 1; Lines: -16/+12
| | Remove tabs on empty lines.
* | Do proper ASID management for guest vcpus.
| | Author: neel; Age: 2014-09-06; Files: 4; Lines: -84/+216
| | Prior to this change an ASID was hard allocated to a guest and shared by all its vcpus. This meant that the number of VMs that could be created was limited to the number of ASIDs supported by the CPU. It was also inefficient because it forced a TLB flush on every VMRUN.
| | With this change the number of guests that can be created is independent of the number of available ASIDs. Also, the TLB is flushed only when a new ASID is allocated.
| | Discussed with: grehan
| | Reviewed by: Anish Gupta (akgupt3@gmail.com)
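One common way to decouple the number of guests from the number of hardware ASIDs is a generation counter, sketched below. The commit message does not spell out its allocation scheme, so this is an assumed design for illustration only: `struct asid_sketch` and `asid_activate()` are hypothetical, the sketch models a single host CPU, and ASID 0 is treated as reserved for the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct asid_sketch {
	uint64_t gen;	/* generation the ASID number belongs to */
	uint32_t num;	/* ASID number; 0 is reserved for the host */
};

/*
 * Make sure 'vcpu' holds an ASID that is valid on this host CPU.
 * 'cpu' tracks the last ASID handed out and the current generation
 * (initialize it to gen = 1, num = 0 so a zeroed vcpu is never valid).
 * 'nasid' is the number of ASIDs the hardware supports.
 *
 * Returns true when a new ASID was allocated, in which case the caller
 * must request a TLB flush on the next VMRUN.
 */
static inline bool
asid_activate(struct asid_sketch *cpu, struct asid_sketch *vcpu,
    uint32_t nasid)
{
	assert(cpu != NULL && vcpu != NULL);
	assert(nasid >= 2);

	if (vcpu->gen == cpu->gen)
		return (false);		/* current ASID is still valid */

	if (++cpu->num >= nasid) {
		/* ASID space exhausted: recycle numbers in a new generation. */
		cpu->num = 1;
		cpu->gen++;
	}
	vcpu->gen = cpu->gen;
	vcpu->num = cpu->num;
	return (true);
}
```

In this scheme a vcpu that keeps running on the same host CPU within one generation never triggers a flush, which matches the behavior the entry describes.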
* | Merge svm_set_vmcb() and svm_init_vmcb() into a single function that is called just once when a vcpu is initialized.
| | Author: neel; Age: 2014-09-05; Files: 3; Lines: -71/+33
| | Discussed with: Anish Gupta (akgupt3@gmail.com)
* | Remove unused header file.
| | Author: neel; Age: 2014-09-04; Files: 1; Lines: -49/+0
| | Discussed with: Anish Gupta (akgupt3@gmail.com)
* | Consolidate the code to restore the host TSS after a #VMEXIT into a single function restore_host_tss().
| | Author: neel; Age: 2014-09-04; Files: 1; Lines: -29/+23
| | Don't bother to restore MSR_KGSBASE after a #VMEXIT since it is not used in the kernel. It will be restored on return to userspace.
| | Discussed with: Anish Gupta (akgupt3@gmail.com)
* | IFC @r269962
| | Author: neel; Age: 2014-09-02; Files: 17; Lines: -571/+1878
| | Submitted by: Anish Gupta (akgupt3@gmail.com)
|\ \
| |/
| * Use the max guest memory address when creating its iommu domain.
| | Author: neel; Age: 2014-08-14; Files: 2; Lines: -1/+21
| | Also, assert that the GPA being mapped in the domain is less than its maxaddr.
| | Reviewed by: grehan
| | Pointed out by: Anish Gupta (akgupt3@gmail.com)
| * Support PCI extended config space in bhyve.
| | Author: neel; Age: 2014-08-08; Files: 1; Lines: -0/+23
| | Add the ACPI MCFG table to advertise the extended config memory window.
| | Introduce a new flag MEM_F_IMMUTABLE for memory ranges that cannot be deleted or moved in the guest's address space. The PCI extended config space is an example of an immutable memory range.
| | Add emulation for the "movzw" instruction. This instruction is used by FreeBSD to read a 16-bit extended config space register.
| | CR: https://phabric.freebsd.org/D505
| | Reviewed by: jhb, grehan
| | Requested by: tychon
| * - Output a summary of optional VT-x features in dmesg similar to CPU features. If bootverbose is enabled, a detailed list is provided; otherwise, a single-line summary is displayed.
| | Author: jhb; Age: 2014-07-30; Files: 3; Lines: -30/+27
| | - Add read-only sysctls for optional VT-x capabilities used by bhyve under a new hw.vmm.vmx.cap node. Move a few existing sysctls that indicate the presence of optional capabilities under this node.
| | CR: https://phabric.freebsd.org/D498
| | Reviewed by: grehan, neel
| | MFC after: 1 week
| * If a vcpu has issued a HLT instruction with interrupts disabled then it sleeps forever in vm_handle_hlt().
| | Author: neel; Age: 2014-07-26; Files: 2; Lines: -2/+10
| | This is usually not an issue as long as one of the other vcpus properly resets or powers off the virtual machine. However, if the bhyve(8) process is killed with a signal the halted vcpu cannot be woken up because its sleep cannot be interrupted.
| | Fix this by waking up periodically and returning from vm_handle_hlt() if TDF_ASTPENDING is set.
| | Reported by: Leon Dang
| | Sponsored by: Nahanni Systems
| * Don't return -1 from the push emulation handler. Negative return values are interpreted specially on return from sys_ioctl() and may cause undesirable side-effects like restarting the system call.
| | Author: neel; Age: 2014-07-26; Files: 1; Lines: -4/+11
| * Fix a couple of issues in the PUSH emulation:
| | Author: neel; Age: 2014-07-24; Files: 1; Lines: -5/+15
| | It is not possible to PUSH a 32-bit operand on the stack in 64-bit mode. The default operand size for PUSH is 64 bits and the operand size override prefix changes that to 16 bits.
| | vm_copy_setup() can return '1' if it encounters a fault when walking the guest page tables. This is a guest issue and is now handled properly by resuming the guest to handle the fault.
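The operand-size rule for PUSH in 64-bit mode stated in this entry fits in a few lines. The helpers below are a hypothetical sketch for illustration, not the emulation code from the commit; `opsize_override` stands for the presence of the 0x66 operand-size override prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Operand size in bytes for PUSH in 64-bit mode: 8 by default, 2 with
 * the operand-size override prefix. A 4-byte PUSH cannot be encoded.
 */
static inline int
push_opsize_long_mode(bool opsize_override)
{
	return (opsize_override ? 2 : 8);
}

/* Stack pointer after the PUSH: it moves down by the operand size. */
static inline uint64_t
push_new_rsp(uint64_t rsp, bool opsize_override)
{
	return (rsp - (uint64_t)push_opsize_long_mode(opsize_override));
}
```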
| * Fix fault injection in bhyve.
| | Author: neel; Age: 2014-07-24; Files: 1; Lines: -57/+15
| | The faulting instruction needs to be restarted when the exception handler is done handling the fault. bhyve now does this correctly by setting 'vmexit[vcpu].inst_length' to zero so the %rip is not advanced.
| | A minor complication is that the fault injection APIs are used by instruction emulation code that is shared by vmm.ko and bhyve. Thus the argument that refers to 'struct vm *' in kernel or 'struct vmctx *' in userspace needs to be loosely typed as a 'void *'.
| * Emulate instructions emitted by OpenBSD/i386 version 5.5:
| | Author: neel; Age: 2014-07-23; Files: 2; Lines: -60/+415
| | - CMP REG, r/m
| | - MOV AX/EAX/RAX, moffset
| | - MOV moffset, AX/EAX/RAX
| | - PUSH r/m
| * Fix build without INVARIANTS defined by getting rid of unused variable 'exc'.
| | Author: neel; Age: 2014-07-20; Files: 1; Lines: -2/+1
| | Reported by: adrian, stefanf