summaryrefslogtreecommitdiffstats
path: root/sys/x86
Commit message (Collapse)AuthorAgeFilesLines
* MFC r283735:kib2015-06-052-9/+3
| | | | Remove several write-only variables.
* MFC r283692:kib2015-06-051-0/+2
| | | | Explicitely enable queued invalidation completion interrupt.
* MFC 281887:jhb2015-06-024-4/+4
| | | | | Reassign copyright statements on several files from Advanced Computing Technologies LLC to Hudson River Trading LLC.
* MFC 281266:jhb2015-06-021-0/+1
| | | | | | | | | | | | Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h> and export them to userland. - Define __HAVE_REG32 on platforms that define a reg32 structure and check for this in <sys/procfs.h> to control when to export prstatus32, etc. - Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures. libbfd looks for these types, and having them fixes 'gcore' in gdb of a 32-bit process on a 64-bit platform. - Use the structure definitions from <sys/procfs.h> in gcore's elf32 core dump code instead of duplicating the definitions.
* MFC r282120:hselasky2015-05-051-2/+2
| | | | | | | | | | | | The add_bounce_page() function can be called when loading physical pages which pass a NULL virtual address. If the BUS_DMA_KEEP_PG_OFFSET flag is set, use the physical address to compute the page offset instead. The physical address should always be valid when adding bounce pages and should contain the same page offset like the virtual address. Submitted by: Svatopluk Kraus <onwahe@gmail.com> Reviewed by: jhb@
* MFC r281495:kib2015-04-271-1/+1
| | | | | | | | | | | | Add config option PAE_TABLES for the i386 kernel. It switches pmap to use PAE format for the page tables, but does not incur other consequences of the full PAE config. In particular, vm_paddr_t and bus_addr_t are left 32bit, and max supported memory is still limited by 4GB. The option allows to have nx permissions for memory mappings on i386 kernel, while keeping the usual i386 KBI and avoiding the kernel data sizing problems typical for the PAE config.
* MFC: r281396, r281475jkim2015-04-181-1/+1
| | | | | | Merge ACPICA 20150410. Relnotes: yes
* MFC 278325,280866:jhb2015-04-151-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that 278325: Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that the INIT IPIs here are invalid, but other systems follow the MP spec instead. While here, fix the IPI wait routine to accept a timeout in microseconds instead of a raw spin count, and don't spin forever during AP startup. Instead, panic if a STARTUP IPI is not delivered after 20 us. 280866: Wait 100 microseconds for a local APIC to dispatch each startup-related IPI rather than 20. The MP 1.4 specification states in Appendix B.2: "A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions". (Note that this appears to be separate from the 10 millisecond (INIT) and 200 microsecond (STARTUP) waits after the IPIs are dispatched.) The Intel SDM is silent on this issue as far as I can tell. At least some hardware requires 60 microseconds as noted in the PR, so bump this to 100 to be on the safe side. PR: 196542, 197756
* MFC r281254:kib2015-04-153-21/+32
| | | | | Account for the offset of the page run when allocating the dmar_map_entry.
* MFC 276724:jhb2015-04-021-6/+6
| | | | | | | | | | On some Intel CPUs with a P-state but not C-state invariant TSC the TSC may also halt in C2 and not just C3 (it seems that in some cases the BIOS advertises its C3 state as a C2 state in _CST). Just play it safe and disable both C2 and C3 states if a user forces the use of the TSC as the timecounter on such CPUs. PR: 192316
* MFC 261790:jhb2015-04-014-3/+72
| | | | | | | | | | | | | | | | | | | | | | Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge I/O windows, the default is to preserve the firmware-assigned resources. PCI bus numbers are only managed if NEW_PCIB is enabled and the architecture defines a PCI_RES_BUS resource type. - Add a helper API to create top-level PCI bus resource managers for each PCI domain/segment. Host-PCI bridge drivers use this API to allocate bus numbers from their associated domain. - Change the PCI bus and CardBus drivers to allocate a bus resource for their bus number from the parent PCI bridge device. - Change the PCI-PCI and PCI-CardBus bridge drivers to allocate the full range of bus numbers from secbus to subbus from their parent bridge. The drivers also always program their primary bus register. The bridge drivers also support growing their bus range by extending the bus resource and updating subbus to match the larger range. - Add support for managing PCI bus resources to the Host-PCI bridge drivers used for amd64 and i386 (acpi_pcib, mptable_pcib, legacy_pcib, and qpi_pcib). - Define a PCI_RES_BUS resource type for amd64 and i386. PR: 197076
* MFC 260973:jhb2015-04-014-67/+14
| | | | | | - Reuse legacy_pcib_(read|write)_config() methods in the QPI pcib driver. - Reuse legacy_pcib_alloc_msi{,x}() methods in the QPI and mptable pcib drivers.
* MFC r280435:kib2015-03-311-1/+1
| | | | | | When mapping an allocated entry, use the entry size, instead of the requested size. If tag restrictions caused split entry, its size is less then requsted.
* MFC r280434:kib2015-03-311-0/+1
| | | | Assert that the mapping loop makes progress.
* MFC r280254:kib2015-03-261-6/+21
| | | | | Provide definitions for all descriptors types in the DMAR invalidation queue.
* MFC r280196:kib2015-03-241-2/+4
| | | | | Recheck that boundary is not crossed after the move to satisfy boundary restriction.
* MFC r280195:kib2015-03-241-1/+2
| | | | | | When inserting new entry into the address map, ensure that not only next entry does not intersect with the tail of the new entry, but also that previous entry is also before new entry start.
* MFC r280253:kib2015-03-221-1/+1
| | | | Fix syntax error.
* MFC r271889, 272799, 272800, 274976scottl2015-03-121-8/+28
| | | | | | | | This brings in bus_get_domain() and the related reporting via devinfo, dmesg, and sysctl. Obtained from: adrian, jhb Sponsored by: Netflix, Inc.
* MFC r276949:kib2015-03-011-26/+48
| | | | | | | | (only to ease merging of r279117). MFC r279117: Revert r276949 and redo the fix for PCIe/PCI bridges, which do not follow specification and do not provide PCIe capability.
* MFC r276948:kib2015-03-012-4/+5
| | | | | Print rid when announcing DMAR context creation. Print sid when fault occurs.
* MFC r276867:kib2015-03-011-1/+1
| | | | | | | Fix DMAR context allocations for the devices behind PCIe->PCI bridges after dmar driver was converted to use rids. The bus component to calculate context page must be taken from the requestor rid, which is a bridge, and not from the device bus number.
* MFC r264007,r264008,r264009,r264011,r264012,r264013rstone2015-03-016-47/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MFC support for PCI Alternate RID Interpretation. ARI is an optional PCIe feature that allows PCI devices to present up to 256 functions on a bus. This is effectively a prerequisite for PCI SR-IOV support. r264007: Add a method to get the PCI RID for a device. Reviewed by: kib MFC after: 2 months Sponsored by: Sandvine Inc. r264008: Re-implement the DMAR I/O MMU code in terms of PCI RIDs Under the hood the VT-d spec is really implemented in terms of PCI RIDs instead of bus/slot/function, even though the spec makes pains to convert back to bus/slot/function in examples. However working with bus/slot/function is not correct when PCI ARI is in use, so convert to using RIDs in most cases. bus/slot/function will only be used when reporting errors to a user. Reviewed by: kib MFC after: 2 months Sponsored by: Sandvine Inc. r264009: Re-write bhyve's I/O MMU handling in terms of PCI RID. Reviewed by: neel MFC after: 2 months Sponsored by: Sandvine Inc. r264011: Add support for PCIe ARI PCIe Alternate RID Interpretation (ARI) is an optional feature that allows devices to have up to 256 different functions. It is implemented by always setting the PCI slot number to 0 and re-purposing the 5 bits used to encode the slot number to instead contain the function number. Combined with the original 3 bits allocated for the function number, this allows for 256 functions. This is enabled by default, but it's expected to be a no-op on currently supported hardware. It's a prerequisite for supporting PCI SR-IOV, and I want the ARI support to go in early to help shake out any bugs in it. ARI can be disabled by setting the tunable hw.pci.enable_ari=0. Reviewed by: kib MFC after: 2 months Sponsored by: Sandvine Inc. r264012: Print status of ARI capability in pciconf -c Teach pciconf how to print out the status (enabled/disabled) of the ARI capability on PCI Root Complexes and Downstream Ports. MFC after: 2 months Sponsored by: Sandvine Inc. r264013: Add missing copyright date. MFC after: 2 months
* MFC 274817,274878,276801,276840,278976:jhb2015-02-232-12/+22
| | | | | | | | | | | | | | | | Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0).
* MFC r278606:kib2015-02-182-4/+67
| | | | Registers definitions for the new capabilities.
* MFC r278605:kib2015-02-181-4/+2
| | | | vm_page_lookup() accepts read-locked object.
* MFC 273800:jhb2015-02-103-74/+174
| | | | | | | | | | | | | Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare.
* MFC 272666: Fix build for i386 kernels with out 'I686_CPU'.jhb2015-01-211-1/+1
| | | | Reported by: Mike Tancsa <mike@sentex.net>
* MFC r277047:kib2015-01-191-0/+3
| | | | For x86, read MAXPHYADDR into variable cpu_maxphyaddr.
* MFC r277023:kib2015-01-184-21/+54
| | | | | Avoid excessive flushing and do missed neccessary flushing in the IOMMU page table update code.
* MFC r273748neel2014-12-311-0/+65
| | | | | | | | Output a summary of optional SVM features in dmesg similar to CPU features. If bootverbose is enabled, a detailed list is provided; otherwise, a single-line summary is displayed. Requested by: jhb
* MFC 261321neel2014-12-301-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | Rename the AMD MSR_PERFCTR[0-3] so the Pentium Pro MSR_PERFCTR[0-1] aren't redefined. MFC r273214 Fix build to not bogusly always rebuild vmm.ko. MFC r273338 Add support for AMD's nested page tables in pmap.c: - Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit) for a pmap of type PT_RVI. - Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of type PT_EPT or PT_RVI. Add CPU_SET_ATOMIC_ACQ(num, cpuset): This is used when activating a vcpu in the nested pmap. Using the 'acquire' variant guarantees that the load of the 'pm_eptgen' will happen only after the vcpu is activated in 'pm_active'. Add defines for various AMD-specific MSRs. Discussed with: kib (r261321)
* MFC r270326neel2014-12-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a recursive lock acquisition in vi_reset_dev(). MFC r270434 Return the spurious interrupt vector (IRQ7 or IRQ15) if the atpic cannot find any unmasked pin with an interrupt asserted. MFC r270436 Fix a bug in the emulation of CPUID leaf 0x4. MFC r270437 Add "hw.vmm.topology.threads_per_core" and "hw.vmm.topology.cores_per_package" tunables to modify the default cpu topology advertised by bhyve. MFC r270855 Set the 'inst_length' to '0' early on before any error conditions are detected in the emulation of the task switch. If any exceptions are triggered then the guest %rip should point to instruction that caused the task switch as opposed to the one after it. MFC r270857 The "SUB" instruction used in getcc() actually does 'x -= y' so use the proper constraint for 'x'. The "+r" constraint indicates that 'x' is an input and output register operand. While here generate code for different variants of getcc() using a macro GETCC(sz) where 'sz' indicates the operand size. Update the status bits in %rflags when emulating AND and OR opcodes. MFC r271439 Initialize 'bc_rdonly' to the right value. MFC r271451 Optimize the common case of injecting an interrupt into a vcpu after a HLT by explicitly moving it out of the interrupt shadow. MFC r271888 Restructure the MSR handling so it is entirely handled by processor-specific code. MFC r271890 MSR_KGSBASE is no longer saved and restored from the guest MSR save area. This behavior was changed in r271888 so update the comment block to reflect this. MFC r271891 Add some more KTR events to help debugging. MFC r272197 mmap(2) requires either MAP_PRIVATE or MAP_SHARED for non-anonymous mappings. MFC r272395 Get rid of code that dealt with the hardware not being able to save/restore the PAT MSR on guest exit/entry. This workaround was done for a beta release of VMware Fusion 5 but is no longer needed in later versions. All Intel CPUs since Nehalem have supported saving and restoring MSR_PAT in the VM exit and entry controls. MFC r272670 Inject #UD into the guest when it executes either 'MONITOR' or 'MWAIT'. MFC r272710 Implement the FLUSH operation in the virtio-block emulation. MFC r272838 iasl(8) expects integer fields in data tables to be specified as hexadecimal values. Therefore the bit width of the "PM Timer Block" was actually being interpreted as 50-bits instead of the expected 32-bit. This eliminates an error message emitted by a Linux 3.17 guest during boot: "Invalid length for FADT/PmTimerBlock: 50, using default 32" MFC r272839 Support Intel-specific MSRs that are accessed when booting up a linux in bhyve: - MSR_PLATFORM_INFO - MSR_TURBO_RATIO_LIMITx - MSR_RAPL_POWER_UNIT MFC r273108 Emulate "POP r/m". This is needed to boot OpenBSD/i386 MP kernel in bhyve. MFC r273212 Support stopping and restarting the AHCI command list via toggling PxCMD.ST from '1' to '0' and back. This allows the driver a chance to recover if for instance a timeout occurred due to activity on the host.
* MFC r271208:kib2014-12-231-0/+2
| | | | Add a define for index of IA32_XSS MSR.
* MFC r271206:kib2014-12-231-1/+3
| | | | Adjust the definition of struct xstate_hdr according to SDM rev. 50.
* MFC r271197:kib2014-12-232-0/+16
| | | | | Add more bits for the XSAVE features from CPUID 0xd, sub-function 1 %eax report. Print the XSAVE features 0xd/1 in the boot banner.
* MFC 273988,273989,273995,274057:jhb2014-12-221-4/+2
| | | | | MFamd64: Add support for extended FPU states on i386. This includes support for AVX on i386.
* MFC 271405,271408,271409,272658:jhb2014-12-222-29/+1
| | | | | | MFamd64: Use initializecpu() to set various model-specific registers on AP startup and AP resume (it was already used for BSP startup and BSP resume).
* MFC 260557,271076,271077,271082,271083,271098:jhb2014-12-222-0/+2022
| | | | | | | | | | - Remove spaces from boot messages when we print the CPU ID/Family/Stepping - Move prototypes for various functions into out of C files and into <machine/md_var.h>. - Reduce diffs between i386 and amd64 initcpu.c and identcpu.c files. - Move blacklists of broken TSCs out of the printcpuinfo() function and into the TSC probe routine. - Merge the amd64 and i386 identcpu.c into a single x86 implementation.
* MFC r263710, r273377, r273378, r273423 and r273455:hselasky2014-10-271-8/+1
| | | | | | | - De-vnet hash sizes and hash masks. - Fix multiple issues related to arguments passed to SYSCTL macros. Sponsored by: Mellanox Technologies
* MFC 270850,271053,271192,271717:jhb2014-09-221-15/+28
| | | | | | | | | | | | Save and restore FPU state across suspend and resume on i386. - Create a separate structure for per-CPU state saved across suspend and resume that is a superset of a pcb. - Store the FPU state for suspend and resume in the new structure (for amd64, this moves it out of the PCB) - On both i386 and amd64, all of the FPU suspend/resume handling is now done in C. Approved by: re (hrs)
* MFC r263859:akiyama2014-08-311-10/+8
| | | | | | | | | | Change default logic to CONFORM because this routine is shared with SCI polarity setting. Reviewed by: jhb MFC r269184: Add missing newline to output dmesg properly.
* MFC r267921, r267934, r267949, r267959, r267966, r268202, r268276,grehan2014-08-191-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | r268427, r268428, r268521, r268638, r268639, r268701, r268777, r268889, r268922, r269008, r269042, r269043, r269080, r269094, r269108, r269109, r269281, r269317, r269700, r269896, r269962, r269989. Catch bhyve up to CURRENT. Lightly tested with FreeBSD i386/amd64, Linux i386/amd64, and OpenBSD/amd64. Still resolving an issue with OpenBSD/i386. Many thanks to jhb@ for all the hard work on the prior MFCs ! r267921 - support the "mov r/m8, imm8" instruction r267934 - document options r267949 - set DMI vers/date to fixed values r267959 - doc: sort cmd flags r267966 - EPT misconf post-mortem info r268202 - use correct flag for event index r268276 - 64-bit virtio capability api r268427 - invalidate guest TLB when cr3 is updated, needed for TSS r268428 - identify vcpu's operating mode r268521 - use correct offset in guest logical-to-linear translation r268638 - chs value r268639 - chs fake values r268701 - instr emul operand/address size override prefix support r268777 - emulation for legacy x86 task switching r268889 - nested exception support r268922 - fix INVARIANTS build r269008 - emulate instructions found in the OpenBSD/i386 5.5 kernel r269042 - fix fault injection r269043 - Reduce VMEXIT_RESTARTs in task_switch.c r269080 - fix issues in PUSH emulation r269094 - simplify return values from the inout handlers r269108 - don't return -1 from the push emulation handler r269109 - avoid permanent sleep in vm_handle_hlt() r269281 - list VT-x features in base kernel dmesg r269317 - Mark AHCI fatal errors as not completed r269700 - Support PCI extended config space in bhyve r269896 - Minor cleanup r269962 - use max guest memory when creating IOMMU domain r269989 - fix interrupt mode names
* MFC: r260457marius2014-08-051-7/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The changes in r233781 attempted to make logging during a machine check exception more readable. In practice they prevented all logging during a machine check exception on at least some systems. Specifically, when an uncorrected ECC error is detected in a DIMM on a Nehalem/Westmere class machine, all CPUs receive a machine check exception, but only CPUs on the same package as the memory controller for the erroring DIMM log an error. The CPUs on the other package would complete the scan of their machine check banks and panic before the first set of CPUs could log an error. The end result was a clearer display during the panic (no interleaved messages), but a crashdump without any useful info about the error that occurred. To handle this case, make all CPUs spin in the machine check handler once they have completed their scan of their machine check banks until at least one machine check error is logged. I tried using a DELAY() instead so that the CPUs would not potentially hang forever, but that was not reliable in testing. While here, don't clear MCIP from MSR_MCG_STATUS before invoking panic. Only clear it if the machine check handler does not panic and returns to the interrupted thread. MFC: r263113 Correct type for malloc(). Submitted by: "Conrad Meyer" <conrad.meyer@isilon.com> MFC: r269052, r269239, r269242 Intel desktop Haswell CPUs may report benign corrected parity errors (see HSD131 erratum in [1]) at a considerable rate. So filter these (default), unless logging is enabled. Unfortunately, there really is no better way to reasonably implement suppressing these errors than to just skipping them in mca_log(). Given that they are reported for bank 0, they'd need to be masked in MSR_MC0_CTL. However, P6 family processors require that register to be set to either all 0s or all 1s, disabling way more than the one error in question when using all 0s there. Alternatively, it could be masked for the corresponding CMCI, but that still wouldn't keep the periodic scanner from detecting these spurious errors. Apart from that, register contents of MSR_MC0_CTL{,2} don't seem to be publicly documented, neither in the Intel Architectures Developer's Manual nor in the Haswell datasheets. Note that while HSD131 actually is only about C0-stepping as of revision 014 of the Intel desktop 4th generation processor family specification update, these corrected errors also have been observed with D0-stepping aka "Haswell Refresh". 1: http://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf Reviewed by: jhb Sponsored by: Bally Wulff Games & Entertainment GmbH
* Merge r266746, 266775:scottl2014-07-011-30/+35
| | | | | | | | | | | | | Now that there are separate back-end implementations of busdma, the bounce implementation shouldn't steal flags from the common front-end. Move those flags to the back-end. Eliminate the fake contig_dmamap and replace it with a new flag, BUS_DMA_KMEM_ALLOC. They serve the same purpose, but using the flag means that the map can be NULL again, which in turn enables significant optimizations for the common case of no bouncing. Obtained from: Netflix, Inc.
* MFC r263795:rodrigc2014-06-231-4/+8
| | | | | | | | | | | | | Strict value checking will cause problem. Bay trail DN2820FYKH is supported on Linux but does not work on FreeBSD. This behaviour is bug-compatible with Linux-3.13.5. References: http://d.hatena.ne.jp/syuu1228/20140326 http://lxr.linux.no/linux+v3.13.5/arch/x86/kernel/acpi/boot.c#L1094 Submitted by: syuu PR: 187966
* Undo bad merge.rodrigc2014-06-231-8/+4
|
* MFC r263795:rodrigc2014-06-231-4/+8
| | | | | | | | | | | | | Strict value checking will cause problem. Bay trail DN2820FYKH is supported on Linux but does not work on FreeBSD. This behaviour is bug-compatible with Linux-3.13.5. References: http://d.hatena.ne.jp/syuu1228/20140326 http://lxr.linux.no/linux+v3.13.5/arch/x86/kernel/acpi/boot.c#L1094 Submitted by: syuu PR: 187966
* MFC 266263,266551,266552:jhb2014-06-121-6/+25
| | | | | | - Add definitions for more structured extended features as well as XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions). - Don't permit users to request a subset of the AVX512 or MPX xsave masks.
* MFC 263772: Fix build without SMP.jhb2014-06-041-1/+5
| | | | PR: 187854
OpenPOWER on IntegriCloud