path: root/sys/amd64
Commit message | Author | Age | Files | Lines
* MFC r282680: | kib | 2015-05-12 | 1 | -2/+0
  Remove unused define.
* MFC r281762: | kib | 2015-04-27 | 1 | -9/+0
  Remove duplicate definitions of MWAIT_CX hints. Identical defines in specialreg.h are enough.
* MFC 278325,280866: | jhb | 2015-04-15 | 1 | -6/+24
  278325: Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that the INIT IPIs here are invalid, but other systems follow the MP spec instead.
  While here, fix the IPI wait routine to accept a timeout in microseconds instead of a raw spin count, and don't spin forever during AP startup. Instead, panic if a STARTUP IPI is not delivered after 20 us.
  280866: Wait 100 microseconds for a local APIC to dispatch each startup-related IPI rather than 20. The MP 1.4 specification states in Appendix B.2: "A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions". (Note that this appears to be separate from the 10 millisecond (INIT) and 200 microsecond (STARTUP) waits after the IPIs are dispatched.) The Intel SDM is silent on this issue as far as I can tell. At least some hardware requires 60 microseconds as noted in the PR, so bump this to 100 to be on the safe side.
  PR: 196542, 197756
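The switch from a raw spin count to a microsecond budget can be sketched as a polling loop that checks once per microsecond whether the IPI is still pending and reports a timeout instead of spinning forever. This is a hypothetical, self-contained simulation: `struct fake_apic`, `ipi_pending()`, `delay_us()` and `ipi_wait_us()` are illustrative names, not the kernel's; the real routine roughly polls the local APIC's delivery status and delays with DELAY(9).

```c
#include <stdbool.h>

/* Hypothetical stand-in for a local APIC: the IPI stays "pending" until
 * `busy_us` microseconds of simulated time have passed. */
struct fake_apic {
    int busy_us;     /* how long this APIC takes to dispatch the IPI */
    int elapsed_us;  /* simulated time spent waiting so far */
};

static bool ipi_pending(const struct fake_apic *apic)
{
    return apic->elapsed_us < apic->busy_us;
}

/* Hypothetical microsecond delay: advances simulated time instead of spinning. */
static void delay_us(struct fake_apic *apic, int us)
{
    apic->elapsed_us += us;
}

/*
 * Wait up to `timeout_us` microseconds for the pending IPI to be dispatched.
 * Returns true if it was delivered in time and false on timeout, so the
 * caller can panic instead of spinning forever.
 */
static bool ipi_wait_us(struct fake_apic *apic, int timeout_us)
{
    for (int waited = 0; waited < timeout_us; waited++) {
        if (!ipi_pending(apic))
            return true;
        delay_us(apic, 1);
    }
    return !ipi_pending(apic);
}
```

With the 60 us dispatch time mentioned in the PR, a 20 us budget times out while the 100 us budget chosen in 280866 succeeds.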
* MFC 276724: | jhb | 2015-04-02 | 1 | -1/+1
  On some Intel CPUs with a P-state but not C-state invariant TSC, the TSC may also halt in C2 and not just C3 (it seems that in some cases the BIOS advertises its C3 state as a C2 state in _CST). Just play it safe and disable both C2 and C3 states if a user forces the use of the TSC as the timecounter on such CPUs.
  PR: 192316
* MFC 261790: | jhb | 2015-04-01 | 1 | -0/+3
  Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge I/O windows, the default is to preserve the firmware-assigned resources. PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture defines a PCI_RES_BUS resource type.
  - Add a helper API to create top-level PCI bus resource managers for each PCI domain/segment. Host-PCI bridge drivers use this API to allocate bus numbers from their associated domain.
  - Change the PCI bus and CardBus drivers to allocate a bus resource for their bus number from the parent PCI bridge device.
  - Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the full range of bus numbers from secbus to subbus from their parent bridge. The drivers also always program their primary bus register. The bridge drivers also support growing their bus range by extending the bus resource and updating subbus to match the larger range.
  - Add support for managing PCI bus resources to the Host-PCI bridge drivers used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib).
  - Define a PCI_RES_BUS resource type for amd64 and i386.
  PR: 197076
* MFC r280781: | kib | 2015-03-31 | 1 | -0/+1
  Make it possible for the signal handler to act on #ss. Load the canonical user data segment selector into %ss when calling the handler.
* MFC r280780: | kib | 2015-03-31 | 1 | -2/+0
  The #ss fault handler erroneously does not check whether the fault originated from the return to usermode. #ss must be handled the same as #np.
* Revert accidental(?) change in r280455 and do not compile hwpmc statically into GENERIC by default. | jhb | 2015-03-30 | 1 | -1/+0
  This change is not present in HEAD and was not made in the two commits to HEAD that r280455 merged.
* MFC r280134: | mav | 2015-03-30 | 1 | -0/+6
  Report the ARAT (APIC-Timer-always-running) feature for virtual CPUs. This keeps FreeBSD guests from avoiding the LAPIC timer in favor of HPET out of concern about deep sleep states, which do not exist on virtual CPUs.
  Benchmarks of usleep(1) on guest and host show these extra latencies:
  - 51us for virtual HPET,
  - 22us for virtual LAPIC timer,
  - 22us for host HPET and
  - 3us for host LAPIC timer.
* MFC of r277177 and r279894 with the fixes for the PMC for Haswell. | rrs | 2015-03-24 | 1 | -0/+1
  Sponsored by: Netflix Inc.
* MFC r278655: | markj | 2015-03-19 | 1 | -1/+21
  Add support for decoding multibyte NOPs.
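Decoding a multibyte NOP means walking the same ModRM/SIB/displacement rules as any other instruction with a memory operand, since the `0F 1F /0` form carries an address that is never accessed. A minimal sketch, assuming 64-bit mode and accepting only `0x66` prefixes (so longer forms with segment-override prefixes are rejected); `nop_length()` is a hypothetical helper, not the decoder touched by this commit:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the length in bytes of a multibyte NOP (0F 1F /0) at `insn`,
 * or 0 if the bytes are not such a NOP. Assumes 64-bit mode and only
 * 0x66 operand-size prefixes.
 */
static size_t nop_length(const uint8_t *insn, size_t avail)
{
    size_t i = 0;

    /* Skip 0x66 prefixes (used by the 6- and 9-byte forms). */
    while (i < avail && insn[i] == 0x66)
        i++;
    /* Need at least the 0F 1F opcode and a ModRM byte. */
    if (i + 3 > avail || insn[i] != 0x0f || insn[i + 1] != 0x1f)
        return 0;
    i += 2;

    uint8_t modrm = insn[i++];
    uint8_t mod = modrm >> 6;
    uint8_t reg = (modrm >> 3) & 7;
    uint8_t rm = modrm & 7;
    if (reg != 0)                       /* only the /0 encoding is NOP */
        return 0;

    size_t disp = 0;
    if (mod != 3 && rm == 4) {          /* a SIB byte follows ModRM */
        if (i >= avail)
            return 0;
        uint8_t base = insn[i++] & 7;
        if (mod == 0 && base == 5)      /* SIB without base: disp32 */
            disp = 4;
    }
    if (mod == 1)
        disp = 1;                       /* disp8 */
    else if (mod == 2)
        disp = 4;                       /* disp32 */
    else if (mod == 0 && rm == 5)
        disp = 4;                       /* RIP-relative disp32 */

    if (i + disp > avail)
        return 0;
    return i + disp;
}
```

For example, the 7-byte NOP `0F 1F 80 00 00 00 00` uses mod=2 with a 4-byte displacement, and prefixing the 8-byte form with `0x66` yields the 9-byte form.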
* Merge r263233 from HEAD to stable/10: | rwatson | 2015-03-19 | 2 | -2/+2
  Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h.
  Sponsored by: Google, Inc.
* MFC 277713: | jhb | 2015-03-12 | 1 | -0/+16
  If the boot-time memory test is enabled, output a dot ('.') for each GB of RAM tested so people watching the console can see that the machine is making progress and not hung.
  PR: 196650
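The progress indicator amounts to counting tested bytes and emitting a dot at each gigabyte boundary. A hypothetical sketch: the function name, the assumed 4 KB page size and the output stream are illustrative, and the actual pattern test is elided.

```c
#include <stdint.h>
#include <stdio.h>

#define TEST_PAGE_SIZE 4096ULL          /* assumed page size */
#define ONE_GB         (1ULL << 30)

/*
 * Hypothetical sketch: "test" `total` bytes one page at a time and print
 * a '.' to `out` whenever another full gigabyte has been covered.
 * Returns the number of dots printed.
 */
static unsigned memtest_with_progress(uint64_t total, FILE *out)
{
    unsigned dots = 0;
    uint64_t since_dot = 0;

    for (uint64_t pa = 0; pa < total; pa += TEST_PAGE_SIZE) {
        /* The real test would write and read back patterns at `pa` here. */
        since_dot += TEST_PAGE_SIZE;
        if (since_dot >= ONE_GB) {
            fputc('.', out);
            fflush(out);                /* make progress visible immediately */
            since_dot = 0;
            dots++;
        }
    }
    return dots;
}
```

A partial final gigabyte produces no dot, so 1.5 GB of RAM prints a single '.'.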
* MFC r264007,r264008,r264009,r264011,r264012,r264013 | rstone | 2015-03-01 | 5 | -33/+30
  MFC support for PCI Alternate RID Interpretation. ARI is an optional PCIe feature that allows PCI devices to present up to 256 functions on a bus. This is effectively a prerequisite for PCI SR-IOV support.

  r264007: Add a method to get the PCI RID for a device.
  Reviewed by: kib
  MFC after: 2 months
  Sponsored by: Sandvine Inc.

  r264008: Re-implement the DMAR I/O MMU code in terms of PCI RIDs.
  Under the hood the VT-d spec is really implemented in terms of PCI RIDs instead of bus/slot/function, even though the spec takes pains to convert back to bus/slot/function in examples. However, working with bus/slot/function is not correct when PCI ARI is in use, so convert to using RIDs in most cases. bus/slot/function will only be used when reporting errors to a user.
  Reviewed by: kib
  MFC after: 2 months
  Sponsored by: Sandvine Inc.

  r264009: Re-write bhyve's I/O MMU handling in terms of PCI RID.
  Reviewed by: neel
  MFC after: 2 months
  Sponsored by: Sandvine Inc.

  r264011: Add support for PCIe ARI.
  PCIe Alternate RID Interpretation (ARI) is an optional feature that allows devices to have up to 256 different functions. It is implemented by always setting the PCI slot number to 0 and re-purposing the 5 bits used to encode the slot number to instead contain the function number. Combined with the original 3 bits allocated for the function number, this allows for 256 functions.
  This is enabled by default, but it's expected to be a no-op on currently supported hardware. It's a prerequisite for supporting PCI SR-IOV, and I want the ARI support to go in early to help shake out any bugs in it. ARI can be disabled by setting the tunable hw.pci.enable_ari=0.
  Reviewed by: kib
  MFC after: 2 months
  Sponsored by: Sandvine Inc.

  r264012: Print status of ARI capability in pciconf -c.
  Teach pciconf how to print out the status (enabled/disabled) of the ARI capability on PCI Root Complexes and Downstream Ports.
  MFC after: 2 months
  Sponsored by: Sandvine Inc.

  r264013: Add missing copyright date.
  MFC after: 2 months
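The RID arithmetic behind ARI is easy to state in code. In the conventional layout a 16-bit RID holds 8 bits of bus, 5 bits of slot and 3 bits of function; under ARI the slot is fixed at 0 and all 8 low bits form the function number. The helpers below are hypothetical illustrations, not the method added in r264007:

```c
#include <stdint.h>

/* Conventional RID: bus[15:8] | slot[7:3] | function[2:0]. */
static uint16_t rid_make(uint8_t bus, uint8_t slot, uint8_t func)
{
    return (uint16_t)((bus << 8) | ((slot & 0x1f) << 3) | (func & 0x07));
}

/* ARI RID: the slot is always 0, so the low 8 bits are the function number. */
static uint16_t rid_make_ari(uint8_t bus, uint8_t func)
{
    return (uint16_t)((bus << 8) | func);
}

/* Split a RID back into bus/slot/function, e.g. for error messages. */
static void rid_split(uint16_t rid, int ari,
    uint8_t *bus, uint8_t *slot, uint8_t *func)
{
    *bus = (uint8_t)(rid >> 8);
    if (ari) {
        *slot = 0;
        *func = (uint8_t)(rid & 0xff);
    } else {
        *slot = (uint8_t)((rid >> 3) & 0x1f);
        *func = (uint8_t)(rid & 0x07);
    }
}
```

The same RID 0x0311 decodes as slot 2, function 1 without ARI but as function 0x11 with ARI, which is why code that keys on bus/slot/function breaks once ARI is enabled while code that keys on RIDs does not.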
* MFC 274817,274878,276801,276840,278976: | jhb | 2015-02-23 | 3 | -16/+72
  Improve support for XSAVE with debuggers.
  - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux.
  - Teach readelf() to recognize NT_X86_XSTATE notes.
  - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state.
  - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0).
* MFC 273800: | jhb | 2015-02-10 | 1 | -0/+2
  Rework virtual machine hypervisor detection.
  - Move the existing code to x86/x86/identcpu.c since it is x86-specific.
  - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl.
  - Merge the VMware detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify VMware and use that in the TSC code to identify VMware.
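The probe described above can be sketched with register values only, which keeps it testable off real hardware: check bit 31 of ECX from CPUID leaf 1 (the hypervisor-present bit that CPUID2_HV names), then assemble the 12-byte vendor ID from EBX, ECX and EDX of leaf 0x40000000 and take the highest hypervisor leaf from EAX. `struct cpuid_regs` and the helper names are hypothetical; executing CPUID itself is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CPUID.01H:ECX bit 31 is set by hypervisors to announce themselves. */
#define HV_PRESENT_BIT 0x80000000u

/* Hypothetical holder for the four registers returned by one CPUID leaf. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

static bool hypervisor_present(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & HV_PRESENT_BIT) != 0;
}

/* Copy the 4 bytes of `reg` in little-endian order, as x86 stores them. */
static void put_reg(char *dst, uint32_t reg)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xff);
}

/*
 * Decode leaf 0x40000000: the vendor ID is EBX, ECX, EDX concatenated.
 * `vendor` needs room for 12 characters plus a terminating NUL.
 * Returns EAX, the highest supported hypervisor CPUID leaf.
 */
static uint32_t hypervisor_vendor(const struct cpuid_regs *r, char vendor[13])
{
    put_reg(vendor, r->ebx);
    put_reg(vendor + 4, r->ecx);
    put_reg(vendor + 8, r->edx);
    vendor[12] = '\0';
    return r->eax;
}

/* Classify a guest by its vendor string, e.g. "VMwareVMware". */
static bool vendor_is(const char vendor[13], const char *name)
{
    return strcmp(vendor, name) == 0;
}
```

With EBX=0x61774D56, ECX=0x4D566572 and EDX=0x65726177, the helper reconstructs "VMwareVMware", the signature VMware's hypervisor reports.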
* MFC r278001: | kib | 2015-02-07 | 2 | -5/+5
  Do not qualify the mcontext_t *mcp argument for set_mcontext(9) as const.
* MFC r277055: | kib | 2015-01-19 | 2 | -9/+1
  Revert r263475: TDP_DEVMEMIO no longer needed.
* MFC r277051: | kib | 2015-01-19 | 1 | -47/+55
  Fix several issues with /dev/mem and /dev/kmem devices on amd64.
* MFC r277047: | kib | 2015-01-19 | 2 | -0/+2
  For x86, read MAXPHYADDR into variable cpu_maxphyaddr.
* MFC r265329: | nwhitehorn | 2015-01-11 | 1 | -0/+2
  Disable ACPI and P4TCC throttling by default, following discussion on freebsd-current. These CPU speed control techniques are usually unhelpful at best. For now, continue building the relevant code into GENERIC so that it can trivially be re-enabled at runtime if anyone wants it.
  Relnotes: yes
* MFC r276523: | kib | 2015-01-09 | 1 | -7/+7
  Restore access to the page at zero through /dev/mem after r263475.
* MFC r276522: | kib | 2015-01-09 | 1 | -4/+1
  Actually remove GIANT_REQUIRED, declared but not done in r263475.
  Style.
* Regen for r276810. | dchagin | 2015-01-08 | 5 | -7/+7
* MFC r276508, r276509: | dchagin | 2015-01-08 | 1 | -1/+1
  Correct the status argument of the wait4 syscall for the Linuxulator.
* MFC r276322: | kib | 2015-01-03 | 2 | -57/+25
  Change the way the lcall $7,$0 is reflected to usermode. Instead of setting up a call gate, which must be 64-bit, put a code segment descriptor into LDT slot 0.
* MFC r270961 | alc | 2015-01-02 | 1 | -1/+1
  Update a comment to reflect the changes in r213408.
* MFC r273701, r274556 | alc | 2015-01-02 | 1 | -2/+11
  By the time that pmap_init() runs, vm_phys_segs[] has been initialized. Obtaining the end of memory address from vm_phys_segs[] is a little easier than obtaining it from phys_avail[].
  Enable the use of VM_PHYSSEG_SPARSE on amd64 and i386, making it the default on i386 PAE. (The use of VM_PHYSSEG_SPARSE on i386 PAE saves us some precious kernel virtual address space that would have been wasted on unused vm_page structures.)
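The difference between the two sources of the end-of-memory address can be sketched as follows. A phys_avail[]-style array is a flat list of start/end pairs terminated by a zero pair, so finding the end means scanning for the terminator; with segments kept in ascending address order (an assumption of this sketch), the answer is simply the end of the last segment. The types and function names are simplified, hypothetical stand-ins for the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sim_paddr_t;       /* hypothetical stand-in for vm_paddr_t */

/* Simplified segment; the kernel's structure carries more fields. */
struct phys_seg {
    sim_paddr_t start;
    sim_paddr_t end;
};

/*
 * phys_avail[]-style array: { start0, end0, start1, end1, ..., 0, 0 }.
 * Only the end of each pair is tested, so a first start of 0 is fine.
 * Returns the end of the last pair before the terminator.
 */
static sim_paddr_t end_from_avail(const sim_paddr_t *avail)
{
    size_t i = 0;

    while (avail[i + 1] != 0)
        i += 2;
    return i == 0 ? 0 : avail[i - 1];
}

/*
 * Segment-array style: assuming the segments are sorted by address, the
 * end of memory is the end of the last one, with no scan needed.
 */
static sim_paddr_t end_from_segs(const struct phys_seg *segs, int nsegs)
{
    return nsegs > 0 ? segs[nsegs - 1].end : 0;
}
```

Both calls return 0x7fff0000 for a layout with chunks at 0x1000-0x9f000 and 0x100000-0x7fff0000.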
* MFC r276323 | neel | 2014-12-31 | 1 | -4/+25
  Implement "special mask mode" in vatpic.
* MFC r273683 | neel | 2014-12-30 | 11 | -39/+266
  Move the ACPI PM timer emulation into vmm.ko.
  MFC r273706: Change the type of the first argument to the I/O emulation handlers to 'struct vm *'.
  MFC r273710: Add a comment explaining the intent behind the I/O reservation [0x72-0x77].
  MFC r273744: Add foo_genassym.c files to DPSRCS so dependencies for them are generated. This ensures these objects are rebuilt to generate an updated header of assembly constants if needed.
  MFC r274045: If the start bit, PxCMD.ST, is cleared and nothing is in-flight, then PxCI, PxSACT, PxCMD.CCS and PxCMD.CR should be 0.
  MFC r274076: Improve the ability to cancel an in-flight request by using an interrupt, via SIGCONT, to force the read or write system call to return prematurely.
  MFC r274330: To allow a request to be submitted from within the callback routine of a completing one, increase the total by 1 but don't advertise it.
  MFC r274931: Change the lower bound for guest vmspace allocation to 0 instead of using the VM_MIN_ADDRESS constant.
  MFC r275817: For level-triggered interrupts, clear the PIC IRR bit when the interrupt pin is deasserted.
  MFC r275850: Fix 8259 IRQ priority resolver.
  MFC r275952: Various 8259 device model improvements.
  MFC r275965: Emulate writes to the IA32_MISC_ENABLE MSR.
* MFC r273375 | neel | 2014-12-30 | 21 | -175/+3782
  Add support for AMD processors with the SVM/AMD-V hardware extensions.
  MFC r273749: Remove bhyve SVM feature printf's now that they are available in the general CPU feature detection code.
  MFC r273766: Add missing 'break' pointed out by Coverity CID 1249760.
  MFC r276098: Allow ktr(4) tracing of all guest exceptions via the tunable "hw.vmm.trace_guest_exceptions".
  MFC r276392: Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT' on an AMD/SVM host.
  MFC r276402: Remove "svn:mergeinfo" property that was dragged along when these files were svn copied in r273375.
* MFC 261321 | neel | 2014-12-30 | 2 | -6/+25
  Rename the AMD MSR_PERFCTR[0-3] so the Pentium Pro MSR_PERFCTR[0-1] aren't redefined.
  MFC r273214: Fix build to not bogusly always rebuild vmm.ko.
  MFC r273338: Add support for AMD's nested page tables in pmap.c:
  - Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit) for a pmap of type PT_RVI.
  - Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of type PT_EPT or PT_RVI.
  Add CPU_SET_ATOMIC_ACQ(num, cpuset): This is used when activating a vcpu in the nested pmap. Using the 'acquire' variant guarantees that the load of the 'pm_eptgen' will happen only after the vcpu is activated in 'pm_active'.
  Add defines for various AMD-specific MSRs.
  Discussed with: kib (r261321)
* MFC r270326 | neel | 2014-12-28 | 14 | -525/+633
  Fix a recursive lock acquisition in vi_reset_dev().
  MFC r270434: Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot find any unmasked pin with an interrupt asserted.
  MFC r270436: Fix a bug in the emulation of CPUID leaf 0x4.
  MFC r270437: Add "hw.vmm.topology.threads_per_core" and "hw.vmm.topology.cores_per_package" tunables to modify the default cpu topology advertised by bhyve.
  MFC r270855: Set the 'inst_length' to '0' early on before any error conditions are detected in the emulation of the task switch. If any exceptions are triggered, then the guest %rip should point to the instruction that caused the task switch as opposed to the one after it.
  MFC r270857: The "SUB" instruction used in getcc() actually does 'x -= y', so use the proper constraint for 'x'. The "+r" constraint indicates that 'x' is an input and output register operand. While here, generate code for different variants of getcc() using a macro GETCC(sz) where 'sz' indicates the operand size. Update the status bits in %rflags when emulating AND and OR opcodes.
  MFC r271439: Initialize 'bc_rdonly' to the right value.
  MFC r271451: Optimize the common case of injecting an interrupt into a vcpu after a HLT by explicitly moving it out of the interrupt shadow.
  MFC r271888: Restructure the MSR handling so it is entirely handled by processor-specific code.
  MFC r271890: MSR_KGSBASE is no longer saved and restored from the guest MSR save area. This behavior was changed in r271888, so update the comment block to reflect this.
  MFC r271891: Add some more KTR events to help debugging.
  MFC r272197: mmap(2) requires either MAP_PRIVATE or MAP_SHARED for non-anonymous mappings.
  MFC r272395: Get rid of code that dealt with the hardware not being able to save/restore the PAT MSR on guest exit/entry. This workaround was done for a beta release of VMware Fusion 5 but is no longer needed in later versions. All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT in the VM exit and entry controls.
  MFC r272670: Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'.
  MFC r272710: Implement the FLUSH operation in the virtio-block emulation.
  MFC r272838: iasl(8) expects integer fields in data tables to be specified as hexadecimal values. Therefore the bit width of the "PM Timer Block" was actually being interpreted as 50 bits instead of the expected 32 bits. This eliminates an error message emitted by a Linux 3.17 guest during boot: "Invalid length for FADT/PmTimerBlock: 50, using default 32"
  MFC r272839: Support Intel-specific MSRs that are accessed when booting Linux in bhyve: MSR_PLATFORM_INFO, MSR_TURBO_RATIO_LIMITx, MSR_RAPL_POWER_UNIT.
  MFC r273108: Emulate "POP r/m". This is needed to boot an OpenBSD/i386 MP kernel in bhyve.
  MFC r273212: Support stopping and restarting the AHCI command list via toggling PxCMD.ST from '1' to '0' and back. This allows the driver a chance to recover if, for instance, a timeout occurred due to activity on the host.
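For reference, the flags that a SUB leaves in %rflags, which getcc() obtains by executing the instruction through inline assembly, can also be computed portably. This is a hypothetical alternative for illustration, not bhyve's code; the `FLAG_*` names are ours, placed at the architectural bit positions (CF bit 0, PF 2, AF 4, ZF 6, SF 7, OF 11), and the operand size is given in bytes.

```c
#include <stdint.h>

/* Architectural RFLAGS status bits (names are ours). */
#define FLAG_CF 0x0001u
#define FLAG_PF 0x0004u
#define FLAG_AF 0x0010u
#define FLAG_ZF 0x0040u
#define FLAG_SF 0x0080u
#define FLAG_OF 0x0800u

/*
 * Status flags produced by "x -= y" (SUB) on an operand of `size` bytes
 * (1, 2, 4 or 8). Only the six arithmetic status flags are modeled.
 */
static uint64_t sub_flags(uint64_t x, uint64_t y, int size)
{
    int bits = size * 8;
    uint64_t mask = (bits == 64) ? UINT64_MAX : ((1ULL << bits) - 1);
    uint64_t sign = 1ULL << (bits - 1);
    uint64_t rflags = 0;

    x &= mask;
    y &= mask;
    uint64_t res = (x - y) & mask;      /* the value SUB writes back to x */

    if (x < y)                          /* unsigned borrow */
        rflags |= FLAG_CF;

    uint8_t p = (uint8_t)res;           /* PF looks at the low byte only */
    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;
    if ((p & 1) == 0)                   /* even number of set bits */
        rflags |= FLAG_PF;

    if ((x ^ y ^ res) & 0x10)           /* borrow out of the low nibble */
        rflags |= FLAG_AF;
    if (res == 0)
        rflags |= FLAG_ZF;
    if (res & sign)
        rflags |= FLAG_SF;
    if ((x ^ y) & (x ^ res) & sign)     /* operand signs differ, result sign flipped */
        rflags |= FLAG_OF;
    return rflags;
}
```

For example, a one-byte `0x80 - 1` sets only AF and OF, since -128 - 1 overflows a signed byte and the low nibble needs a borrow.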
* MFC 273988,273989,273995,274057: | jhb | 2014-12-22 | 3 | -4/+2
  MFamd64: Add support for extended FPU states on i386. This includes support for AVX on i386.
* MFC 271405,271408,271409,272658: | jhb | 2014-12-22 | 1 | -1/+1
  MFamd64: Use initializecpu() to set various model-specific registers on AP startup and AP resume (it was already used for BSP startup and BSP resume).
* MFC 260557,271076,271077,271082,271083,271098: | jhb | 2014-12-22 | 3 | -927/+3
  - Remove spaces from boot messages when we print the CPU ID/Family/Stepping.
  - Move prototypes for various functions out of C files and into <machine/md_var.h>.
  - Reduce diffs between i386 and amd64 initcpu.c and identcpu.c files.
  - Move blacklists of broken TSCs out of the printcpuinfo() function and into the TSC probe routine.
  - Merge the amd64 and i386 identcpu.c into a single x86 implementation.
* MFC r275833: | kib | 2014-12-19 | 1 | -2/+7
  The iret instruction may generate #np and #ss faults in addition to #gp. When returning to usermode, the handlers for these exceptions are also executed with the wrong gs base. Handle all three possible faults in the same way, checking for an iret fault and performing a full iret.
* MFC r273515, r274055, r274063, r274215, r274065, r274502: | bryanv | 2014-11-29 | 1 | -0/+1
  Add VirtIO console driver.
* MFC r274555: | kib | 2014-11-22 | 1 | -2/+2
  Fix END()s for fueword and fueword64, match the name in END() with entry.
* MFC r274489: | scottl | 2014-11-22 | 2 | -0/+61
  Add frame pointers to ASM functions in support.S
  Obtained from: Netflix
* Merge the fueword(9) and casueword(9). In particular, | kib | 2014-11-18 | 2 | -35/+54
  MFC r273783: Add fueword(9) and casueword(9) functions.
  MFC note: ia64 is handled like arm, with NO_FUEWORD define.
  MFC r273784: Replace some calls to fuword() by fueword() with proper error checking.
  MFC r273785: Convert kern_umtx.c to use fueword() and casueword().
  MFC note: the sys__umtx_lock and sys__umtx_unlock syscalls are not converted; they are removed from HEAD and not used. The do_sem2*() family is not yet merged to stable/10; the corresponding chunk will be merged after do_sem2* are committed.
  MFC r273788 (by jkim): Actually install casuword(9) to fix build.
  MFC r273911: Add type qualifier volatile to the base (userspace) address argument of fuword(9) and suword(9).
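The point of fueword(9) is that fuword(9) returns -1 both on a fault and when the user word legitimately holds -1, whereas fueword() reports success or failure separately and returns the fetched value through a pointer. A userland simulation of that contract, with a bounded array standing in for user memory (`struct uspace`, `sim_fuword()` and `sim_fueword()` are hypothetical):

```c
#include <stddef.h>

/* Hypothetical stand-in for a process's user memory, addressed by word index. */
struct uspace {
    const long *words;
    size_t nwords;
};

/* fuword(9)-style fetch: the word on success, -1 on a fault. */
static long sim_fuword(const struct uspace *us, size_t idx)
{
    if (idx >= us->nwords)
        return -1;              /* "fault": indistinguishable from a stored -1 */
    return us->words[idx];
}

/* fueword(9)-style fetch: returns 0 and sets *val on success, -1 on a fault. */
static int sim_fueword(const struct uspace *us, size_t idx, long *val)
{
    if (idx >= us->nwords)
        return -1;              /* this sketch leaves *val untouched on a fault */
    *val = us->words[idx];
    return 0;
}
```

Converted callers test the return value rather than the fetched word to decide whether the access faulted, so a stored -1 is no longer mistaken for an error.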
* MFC r273666. | neel | 2014-10-29 | 1 | -21/+27
  Don't pass the 'error' return from an I/O port handler directly to vm_run().
* MFC r263710, r273377, r273378, r273423 and r273455: | hselasky | 2014-10-27 | 1 | -1/+1
  - De-vnet hash sizes and hash masks.
  - Fix multiple issues related to arguments passed to SYSCTL macros.
  Sponsored by: Mellanox Technologies
* MFC r273356: | neel | 2014-10-24 | 1 | -2/+12
  Fix a race in pmap_emulate_accessed_dirty() that could trigger an EPT misconfiguration VM-exit.
* MFC r272761: | kib | 2014-10-15 | 2 | -10/+15
  Add an argument to the x86 pmap_invalidate_cache_range() to request forced invalidation of the cache range regardless of the presence of self-snoop feature.
  MFC r272943: MFi386 r272761.
* MFC 270828,271487,271495: | jhb | 2014-10-10 | 2 | -0/+64
  Add sysctls to export the BIOS SMAP and EFI memory maps along with handlers in the sysctl(8) binary to format them.
* MFC r271747: | kib | 2014-10-04 | 2 | -17/+21
  - Use NULL instead of 0 for fpcurthread.
  - Note the quirk with the interrupt enabled state of the dna handler.
  - Use just panic() instead of printf() and panic(). Print tid instead of pid; the fpu state is per-thread.
  MFC r271924: Update and clarify comments. Remove the useless counter for a situation that should be impossible but has been seen in the wild (on buggy hypervisors).
* MFC r272193 | grehan | 2014-10-01 | 1 | -13/+12
  Allow the PIC's IMR register to be read before ICW initialisation.
  As of git commit e179f6914152eca9, the Linux kernel does a simple probe of the PIC by writing a pattern to the IMR and then reading it back, prior to the init sequence of ICW words.
  The bhyve PIC emulation wasn't allowing the IMR to be read until the ICW sequence was complete. This limitation isn't required, so relax the test.
  With this change, Linux kernels 3.15-rc2 and later won't hang on boot when calibrating the local APIC.
  Approved by: re (gjb)
* MFC 271745,271834,271899,271900,271913,272022,272023: | bz | 2014-09-30 | 2 | -0/+6
  Revert changes to shared code of the ixl and ixlv drivers to allow for easier long-term maintainability.
  Restrict the drivers to building on amd64 for now as it is only tested on that 64bit architecture.
  Just depend on PCI and neither INET nor INET6; also make sure we can build individual drivers and they do not depend on each other anymore.
  Reviewed by: gnn, eric.joyner intel.com
  PR: 193824
  Approved by: re (gjb)
* This is a direct commit rather than an MFC of r271744. | bz | 2014-09-23 | 5 | -19/+109
  Re-gen after r272020 (r271743 in head) implementing most of timer_{create,settime,gettime,getoverrun,delete}.
  Approved by: re (gjb)
  Sponsored by: DARPA/AFRL