FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message	Author	Age	Files	Lines
*	Fix a KTR_BUSDMA format string.	rpaulo	2013-06-18	1	-1/+1
*	Add basic support for FDT to i386 & amd64. This change includes:	marcel	2013-05-21	3	-0/+167
	1. Common headers for fdt.h and ofw_machdep.h under x86/include with indirections under i386/include and amd64/include.
	2. New modinfo for loader provided FDT blob.
	3. Common x86_init_fdt() called from hammer_time() on amd64 and init386() on i386.
	4. Split-off FDT specific low-level console functions from FDT bus methods for the uart(4) driver. The low-level console logic has been moved to uart_cpu_fdt.c and is used for arm, mips & powerpc only. The FDT bus methods are shared across all architectures.
	5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the fdt_pic_table[] arrays. Both are empty right now.
	FDT addresses are I/O ports on x86. Since the core FDT code does not handle different address spaces, adding support for both I/O ports and memory addresses requires some thought and discussion. It may be better to use a compile-time option that controls this.
	Obtained from: Juniper Networks, Inc.
*	o Add accessor functions to add and remove pages from a specific	attilio	2013-05-13	1	-8/+12
	freelist.
	o Split the pool of free page queues by domain instead of relying on the definition of VM_RAW_NFREELIST.
	o For MAXMEMDOM > 1, wrap the RR allocation logic into a specific function that is called when calculating the allocation domain. The RR counter is currently kept per-thread. In the future this function is expected to evolve into a real policy decision referee, based on specific information retrieved from per-thread and per-vm_object attributes.
	o Add the concept of "probed domains" in the form of vm_ndomains. It is the responsibility of every architecture willing to support multiple memory domains to correctly probe vm_ndomains along with the mem_affinity segment attributes. These two values are supposed to always remain consistent. Please also note that vm_ndomains and td_dom_rr_idx are both int because segments already store domains as int. Ideally u_int would make much more sense. This should probably be cleaned up in the future.
	o Apply RR domain selection also to vm_phys_zero_pages_idle().
	Sponsored by: EMC / Isilon storage division
	Partly obtained from: jeff
	Reviewed by: alc
	Tested by: jeff
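The per-thread round-robin selection described in this commit can be pictured with a minimal sketch. The names `thread_sketch` and `rr_pick_domain` are hypothetical stand-ins and not the kernel's actual code; only `vm_ndomains` and `td_dom_rr_idx` come from the commit message.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-thread state mentioned in the commit. */
struct thread_sketch {
	int td_dom_rr_idx;	/* per-thread round-robin cursor */
};

/* Number of probed memory domains; each architecture would set this. */
static int vm_ndomains = 1;

/* Return the domain to try first and advance the per-thread cursor. */
static int
rr_pick_domain(struct thread_sketch *td)
{
	int dom;

	assert(vm_ndomains > 0);
	dom = td->td_dom_rr_idx % vm_ndomains;
	td->td_dom_rr_idx = (td->td_dom_rr_idx + 1) % vm_ndomains;
	return (dom);
}
```

Keeping the cursor in the thread avoids a shared counter, at the cost of each thread cycling through domains independently.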
*	Fix several typos	eadler	2013-05-12	1	-1/+1
	PR: kern/176054
	Submitted by: Christoph Mallon <christoph.mallon@gmx.de>
	MFC after: 3 days
*	Adding a detach method to p4tcc driver.	hiren	2013-05-10	1	-0/+20
	PR: 118739
	Submitted by: Dan Lukes <dan@obluda.cz> (earlier version)
	Reviewed by: jhb
	Approved by: sbruno (mentor)
	MFC after: 1 week
*	Revert r250339 as apparently it is more clutter than help.	attilio	2013-05-08	1	-42/+0
	Sponsored by: EMC / Isilon storage division
	Requested by: jhb
*	Add functions to do ACPI System Locality Information Table parsing	attilio	2013-05-07	1	-0/+42
	and printing at boot. For details on the table's contents and purpose, please see the ACPI specification.
	Sponsored by: EMC / Isilon storage division
	Obtained from: jeff
	Reviewed by: jhb (earlier version)
*	Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in	attilio	2013-05-07	1	-5/+3
	order to match the MAXCPU concept. The change should also be useful for consolidation and consistency.
	Sponsored by: EMC / Isilon storage division
	Obtained from: jeff
	Reviewed by: alc
*	Introduce kern.timecounter.smp_tsc_adjust tunable (disabled by default) and	mav	2013-04-18	1	-3/+60
	respective functionality, which synchronizes the TSC on APs to match the BSP's during boot. It may be unsafe in the general case because of a theoretical chance of later drift if CPUs use different clock rates or sources, but it allows the TSC to be used in some cases where the difference is caused by an initialization bug while the TSCs are known to increment synchronously.
	Reviewed by: jimharris, kib
	MFC after: 1 month
*	Move the previously added CPUID7 macros to CPUID_STDEXT.	rpaulo	2013-04-18	1	-17/+11
*	Add the most current CPUID7_* definitions.	rpaulo	2013-04-18	1	-0/+17
*	Make the code to check if VMX is enabled more readable by using macros	neel	2013-04-11	1	-0/+5
	instead of magic numbers.
	Discussed with: Chris Torek
*	Unsynchronized TSCs on the host require special handling in bhyve:	neel	2013-04-10	1	-1/+1
	- use clock_gettime(2) as the time base for the emulated ACPI timer instead of directly using rdtsc().
	- don't advertise the invariant TSC capability to the guest to discourage it from using the TSC as its time base.
	Discussed with: jhb@ (about making 'smp_tsc' a global)
	Reported by: Dan Mack on freebsd-virtualization@
	Obtained from: NetApp
*	Record the correct error in the trace.	kib	2013-04-01	1	-1/+1
	Sponsored by: The FreeBSD Foundation
	MFC after: 3 days
*	MFcalloutng:	mav	2013-02-28	3	-40/+19
	Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this, no driver supported the full dynamic range of struct bintime even in theory, let alone in practice. This change legitimizes the status quo and cleans up the code.
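For readers unfamiliar with the two time representations, here is a minimal sketch of the conversion. It assumes the usual layouts: struct bintime holds whole seconds plus a 64-bit binary fraction, and sbintime_t is a signed 64-bit value in 32.32 fixed point. The names `bintime_sketch` and `bt_to_sbt` are illustrative only, not the kernel's helpers.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t sbintime_t;	/* 32.32 fixed point: seconds in the upper half */

struct bintime_sketch {		/* mirrors the shape of struct bintime */
	int64_t  sec;		/* whole seconds */
	uint64_t frac;		/* fraction of a second in units of 2^-64 */
};

/* Drop the low 32 fraction bits and pack seconds above the rest. */
static sbintime_t
bt_to_sbt(struct bintime_sketch bt)
{
	uint64_t hi = (uint64_t)bt.sec << 32;

	return ((sbintime_t)(hi + (bt.frac >> 32)));
}

/* Recover whole seconds from the fixed-point value. */
static int64_t
sbt_seconds(sbintime_t sbt)
{
	return (sbt >> 32);
}
```

The narrower type still resolves about a quarter of a nanosecond (2^-32 s), which is well below what the timer hardware in question can deliver.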
*	Use critical_enter/critical_exit around the time sensitive part of	imp	2013-02-21	1	-4/+12
	this code to depessimize the worst case we've lived with silently and uneventfully for the past 12 years. Add a comment about a refinement for those needing more assurance of accuracy.
	Fix ddb's show rtc command deadlock potential when debugging rtc code by not taking the lock if we're in the debugger. If you need a thumb to count the number of people that have encountered this, I'd be surprised.
	Submitted by: bde
*	Correct comment about use of pmtimer, and the real reason it isn't	imp	2013-02-21	1	-3/+4
	used or desirable for amd64.
*	Fix broken usage of splhigh() by removing it.	imp	2013-02-21	1	-6/+2
*	Convert machine/elf.h, machine/frame.h, machine/sigframe.h,	kib	2013-02-20	5	-0/+767
	machine/signal.h and machine/ucontext.h into common x86 includes, copying from amd64 and merging with i386.
	Kernel-only compat definitions are kept in the i386/include/sigframe.h and i386/include/signal.h, to reduce amd64 kernel namespace pollution. The amd64 compat uses its own definitions so far.
	The _MACHINE_ELF_WANT_32BIT definition is to allow the sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions on the amd64 compile host. The same hack could be usefully abused by other code too.
*	Fixup r246916 in case gcc is used to build.	davide	2013-02-19	1	-0/+2
	Reported by: attilio, simon
*	MFcalloutng:	mav	2013-02-17	1	-4/+21
	Microoptimize i8254 one-shot operation mode (disabled by default to allow timecounter functionality) by not writing to mode and MSB registers when it is not required. This saves several microseconds of CPU time per call, reducing the minimum measured interrupt interval to 19.5us.
*	Make VM_NDOMAIN a kernel option so that it can be enabled from a kernel	jhb	2013-02-14	1	-0/+2
	config file.
	Requested by: phk (ages ago)
	MFC after: 1 month
*	Reform the busdma API so that new types may be added without modifying	kib	2013-02-12	1	-260/+221
	every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback.
	The cam changes unify the bus_dmamap_load* handling in cam drivers.
	The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map.
	Submitted by: jeff (sponsored by EMC/Isilon)
	Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes)
	Discussed with: ian (arm changes)
	Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
*	x86 suspend/resume: suspend pics and pseudo-pics in reverse order	avg	2013-02-02	2	-8/+15
	- change 'pics' from STAILQ to TAILQ
	- ensure that Local APIC is always first in 'pics'
	Reviewed by: jhb
	Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>, KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
	MFC after: 12 days
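The switch from STAILQ to TAILQ matters because only a doubly linked tail queue can be walked backwards cheaply. The following is a rough sketch using the `<sys/queue.h>` macros; `pic_sketch`, `pic_list`, and `suspend_order` are hypothetical names, and the real interrupt code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, reduced stand-in for a registered interrupt controller. */
struct pic_sketch {
	const char *name;
	TAILQ_ENTRY(pic_sketch) link;
};
TAILQ_HEAD(pic_list, pic_sketch);

/*
 * Record PIC names in suspend order, i.e. from the tail of the list to
 * its head, so that the entry registered first is suspended last.
 */
static size_t
suspend_order(struct pic_list *pics, const char **order, size_t max)
{
	struct pic_sketch *p;
	size_t n = 0;

	TAILQ_FOREACH_REVERSE(p, pics, pic_list, link) {
		if (n < max)
			order[n++] = p->name;
	}
	return (n);
}
```

With the Local APIC kept at the head of the list, a reverse walk suspends it after every other controller, and a forward walk on resume brings it back first.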
*	The change to reduce default smp_tsc_shift caused tsc shift to become	kib	2013-02-01	1	-35/+43
	zero on slower machines, which caused the fenced get_timecount methods not to be used even though they were needed. Remove the (shift > 0) condition when selecting the get_timecount() implementation.
	Rename smp_tsc_shift to tsc_shift, and apply it for the UP case too. Allow shift to reach value of 31 instead of 30, as it was previously (should be nop).
	Reorganize the tc quality calculation to remove the conditionally compiled block. Rename test_smp_tsc() to test_tsc() and provide separate versions for SMP and UP builds. The check for virtualized hardware is more natural to perform in the smp version of the test_tsc(), since it is only done for smp case.
	Noted and reviewed by: bde (previous version)
	MFC after: 12 days
*	Reduce default shift used to calculate the max frequency for the TSC	kib	2013-01-30	1	-1/+7
	timecounter to 1, and correspondingly increase the precision of the gettimeofday(2) and related functions in the default configuration.
	The motivation for the TSC-low timecounter, as described in r222866, seems to be to provide a workaround for the non-serializing behaviour of RDTSC on some Intel hardware. Tests demonstrate that even with the pre-shift of 8, the cross-core non-monotonicity of RDTSC is still observed reliably, e.g. on the Nehalems. r238755 and r238973 implemented the proper fix for the issue.
	The pre-shift of 1 is applied to keep the TSC from overflowing for hardclock frequencies down to 2 sec/intr. The pre-shift is made a tunable to allow easy debugging of issues users could see with the shift being too low.
	Reviewed by: bde
	MFC after: 2 weeks
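To make the shift mechanics concrete, here is a small sketch of how a shift could be chosen and applied. The loop shape follows the one visible in the tsc.c diff elsewhere in this log; treating `max_freq` as a plain parameter and the function names `sketch_tsc_shift` and `sketch_tsc_low` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the smallest shift that brings the counter frequency down to
 * max_freq or below, giving up once the shift reaches 31.
 */
static int
sketch_tsc_shift(uint64_t tsc_freq, uint64_t max_freq)
{
	int shift;

	for (shift = 0; shift < 31 && (tsc_freq >> shift) > max_freq; shift++)
		;
	return (shift);
}

/* The "TSC-low" reading: discard the low bits, keep the next 32. */
static uint32_t
sketch_tsc_low(uint64_t tsc, int shift)
{
	assert(shift >= 0 && shift < 32);
	return ((uint32_t)(tsc >> shift));
}
```

Each extra bit of shift halves the timecounter's resolution and doubles the time before the 32-bit value wraps, which is the trade-off this commit rebalances toward precision.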
*	Don't attempt to use clflush on the local APIC register window. Various	jhb	2013-01-17	1	-1/+1
	CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on Pentium-M, and trashing of the local APIC registers on a VIA C7). The local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't necessary anyway.
	MFC after: 2 weeks
*	Add macros required to enable VMX operation on Intel processors.	neel	2013-01-05	1	-0/+2
	Obtained from: NetApp
*	Add bus_space_read_8 and bus_space_write_8 for amd64.	jimharris	2012-12-13	1	-4/+34
	Rather than trying to KASSERT for callers that invoke this on IO tags, either do nothing (for write_8) or return ~0 (for read_8). Using KASSERT here just makes bus.h too messy, both by polluting bus.h with systm.h (for any number of drivers that include bus.h without first including systm.h) and for ports that use bus.h directly (i.e. libpciaccess), as reported by zeising@.
	Also don't try to implement all of the other bus_space functions for 8 byte access since realistically only these two are needed for some devices that expose 64-bit memory-mapped registers.
	Put the amd64-specific functions here rather than sys/amd64/include/bus.h so that we can keep this header unified for x86, as requested by mdf@ and tijl@.
	Submitted by: Carl Delsey <carl.r.delsey@intel.com>
	MFC after: 3 days
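The behaviour described for I/O tags can be sketched as follows. This is a simplified model with hypothetical tag constants and function names (`SKETCH_BUS_SPACE_IO`, `sketch_read_8`, `sketch_write_8`); it does not reproduce the real bus_space signatures, which take a tag, a handle, and an offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values standing in for the x86 bus space tags. */
#define SKETCH_BUS_SPACE_IO	0
#define SKETCH_BUS_SPACE_MEM	1

/* 8-byte read: I/O port space has no 64-bit access, so return all ones. */
static uint64_t
sketch_read_8(int tag, const volatile uint64_t *reg)
{
	if (tag == SKETCH_BUS_SPACE_IO)
		return (~(uint64_t)0);
	return (*reg);
}

/* 8-byte write: silently ignored for I/O port space. */
static void
sketch_write_8(int tag, volatile uint64_t *reg, uint64_t value)
{
	if (tag == SKETCH_BUS_SPACE_IO)
		return;
	*reg = value;
}
```

Returning ~0 mirrors what a read from an absent device typically yields, so a misuse stays visible without dragging KASSERT and systm.h into the header.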
*	Revert r243960 based on feedback regarding keeping x86 headers unified	jimharris	2012-12-13	1	-0/+38
	(mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@). Alternate implementation will be made in a separate commit.
*	Add amd64 implementations for 8-byte bus_space routines.	jimharris	2012-12-06	1	-38/+0
	Submitted by: Carl Delsey <carl.r.delsey@intel.com>
	Discussed with: jhb, rwatson
	Reviewed by: jimharris
	MFC after: 1 week
*	ioapic_program_intpin: program high bits before low bits	avg	2012-12-01	1	-2/+2
	Programming the low bits has a side-effect of unmasking the pin if it is not disabled. So if an interrupt was pending then it would be delivered with the correct new vector but to the incorrect old LAPIC.
	This fix could be made clearer by preserving the mask bit while programming the low bits and then explicitly resetting the mask bit after all the programming is done.
	The probability of tripping over the fixed bug could be increased by bootverbose because printing of the interrupt information in ioapic_assign_cpu lengthened the time window during which an interrupt could arrive while a pin is masked.
	Reported by: Andreas Longwitz <longwitz@incore.de>
	Tested by: Andreas Longwitz <longwitz@incore.de>
	MFC after: 12 days
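The ordering requirement can be illustrated with a toy model that logs register writes. Everything here (`ioapic_pin_sketch`, `sketch_write`, `sketch_program_intpin`) is hypothetical; the real driver writes through the I/O APIC's indirect register window, which is not modelled.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_REG_LOW = 0, SKETCH_REG_HIGH = 1 };

/* Toy redirection entry that remembers the order of register writes. */
struct ioapic_pin_sketch {
	uint32_t reg[2];	/* low and high dwords of the entry */
	int	 write_log[4];	/* which register each write touched */
	int	 nwrites;
};

static void
sketch_write(struct ioapic_pin_sketch *pin, int which, uint32_t value)
{
	assert(pin->nwrites < 4);
	pin->reg[which] = value;
	pin->write_log[pin->nwrites++] = which;
}

/*
 * Write the high dword (destination) first. The low dword carries the
 * mask bit, so writing it last means the pin can only become unmasked
 * once the new destination is already in place.
 */
static void
sketch_program_intpin(struct ioapic_pin_sketch *pin, uint32_t low,
    uint32_t high)
{
	sketch_write(pin, SKETCH_REG_HIGH, high);
	sketch_write(pin, SKETCH_REG_LOW, low);
}
```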
*	Provide the reading and display of the Standard Extended Features,	kib	2012-11-01	1	-0/+11
	introduced with the IvyBridge CPUs. Provide the definitions for new bits in CR3 and CR4 registers.
	Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
	MFC after: 2 weeks
*	This isn't functionally identical. In some cases a hint to disable	eadler	2012-10-22	3	-0/+9
	unit 0 would in fact disable all units.
	This reverts r241856
	Approved by: cperciva (implicit)
*	Now that device disabling is generic, remove extraneous code from the	eadler	2012-10-22	3	-9/+0
	device drivers that used to provide this feature.
	Reviewed by: des
	Approved by: cperciva
	MFC after: 1 week
*	Add a unified macro to prevent the compiler from reordering	attilio	2012-10-09	1	-1/+1
	instruction loads/stores at will. The macro __compiler_membar() is currently supported for both gcc and clang; kernel compilation will fail with any other compiler.
	Reviewed by: bde, kib
	Discussed with: dim, theraven
	MFC after: 2 weeks
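A compiler-only barrier of this kind is conventionally written as an empty inline assembly statement with a "memory" clobber. The sketch below assumes gcc or clang; the macro name `sketch_compiler_membar` and the `publish` example are illustrative and are not taken from the tree.

```c
#include <assert.h>

/*
 * Empty asm with a "memory" clobber: emits no instruction, but the
 * compiler must assume all memory may have changed, so it cannot move
 * loads or stores across this point.
 */
#define sketch_compiler_membar()	__asm__ __volatile__(" " : : : "memory")

static int shared_data;
static int shared_flag;

/* Store the payload, then the flag, without compiler reordering. */
static void
publish(int value)
{
	shared_data = value;
	sketch_compiler_membar();
	shared_flag = 1;
}
```

This constrains only the compiler. It is not a CPU fence, so ordering as seen by other processors still depends on the architecture's memory model or on explicit fence instructions.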
*	Reverts r234074,234105,234564,234723,234989,235231-235232 and part of	attilio	2012-10-09	1	-8/+1
	r234247. Use, instead, the static initializer introduced in r239923 for x86 and sparc64 intr_cpus, unwinding the code to the initial version.
	Reviewed by: marius
*	Add missing header needed by free(9).	kevlo	2012-09-30	1	-0/+1
	Spotted by: David Wolfskill <david at catwhisker dot org>
*	Free result of device_get_children(9).	kevlo	2012-09-30	1	-0/+1
*	- Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific	jhb	2012-09-28	1	-38/+8
	bits under #ifdef _KERNEL but leave definitions for various structures defined by standards ($PIR table, SMAP entries, etc.) available to userland.
	- Consolidate duplicate SMBIOS table structure definitions in ipmi(4) and smbios(4) in <machine/pc/bios.h> and make them available to userland.
	MFC after: 2 weeks
*	Allow static DMA allocations that allow for enough segments to do page-sized	jhb	2012-08-17	1	-6/+7
	segments for the entire allocation to use kmem_alloc_attr() to allocate KVM rather than using kmem_alloc_contig(). This avoids requiring a single physically contiguous chunk in this case.
	Submitted by: Peter Jeremy (original version)
	MFC after: 1 month
*	Merge ACPICA 20120816.	jkim	2012-08-16	1	-1/+1
*	During TSC synchronization test, use rdtsc() rather than rdtsc32(), to	jimharris	2012-08-07	1	-5/+5
	protect against 32-bit TSC overflow while the sync test is running.
	On dual-socket Xeon E5-2600 (SNB) systems with up to 32 threads, there is a non-trivial chance (2-3%) that the TSC synchronization test fails due to 32-bit TSC overflow while the synchronization test is running.
	Sponsored by: Intel
	Reviewed by: jkim
	Discussed with: jkim, kib
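The failure mode is easy to reproduce with plain integers. In this sketch the two helper names are hypothetical; they compare a "before" and an "after" counter sample the way a synchronization test might, once truncated to 32 bits and once at full width.

```c
#include <assert.h>
#include <stdint.h>

/* Compare two samples after truncating them to 32 bits, as rdtsc32() would. */
static int
looks_backwards_32(uint64_t before, uint64_t after)
{
	return ((uint32_t)after < (uint32_t)before);
}

/* Compare the full 64-bit samples, as rdtsc() would deliver them. */
static int
looks_backwards_64(uint64_t before, uint64_t after)
{
	return (after < before);
}
```

If the low 32 bits wrap between the two samples, the truncated comparison reports time going backwards even though the counter only advanced. As a rough figure, assuming a 3 GHz counter, the low 32 bits wrap about every 1.4 seconds, so a test that samples repeatedly across many threads has a realistic chance of straddling a wrap.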
*	Correct function name in comment.	jhb	2012-08-03	1	-1/+1
	Submitted by: alc
*	Microoptimize LAPIC timer routines to avoid reading from hardware during	mav	2012-08-03	1	-19/+24
	programming using earlier cached values. This makes the respective routines disappear from PMC top and reduces the total number of active CPU cycles on an idle 24-core system by 10%.
*	Improve the handling of static DMA buffers that use non-default memory	jhb	2012-08-03	1	-20/+23
	attributes (currently just BUS_DMA_NOCACHE):
	- Don't call pmap_change_attr() on the returned address, instead use kmem_alloc_contig() to ask the VM system for memory with the requested attribute.
	- As a result, always use kmem_alloc_contig() for non-default memory attributes, even for sub-page allocations. This requires adjusting bus_dmamem_free()'s logic for determining which free routine to use.
	- For x86, add a new dummy bus_dmamap that is used for static DMA buffers allocated via kmem_alloc_contig(). bus_dmamem_free() can then use the map pointer to determine which free routine to use.
	- For powerpc, add a new flag to the allocated map (bus_dmamem_alloc() always creates a real map on powerpc) to indicate which free routine should be used. Note that the BUS_DMA_NOCACHE handling in powerpc is currently #ifdef'd out. I have left it disabled but updated it to match x86.
	Reviewed by: scottl
	MFC after: 1 month
*	Do a trivial reformatting of the comment, to record the proper commit	kib	2012-08-01	1	-2/+1
	message for r238973:
	The rdtsc instruction is not synchronized; it seems that on some Intel cores it can bypass even locked instructions. As a result, rdtsc executed on different cores may return unordered TSC values even when the rdtsc appearance in the instruction sequences is provably ordered.
	Similarly to what has been done in r238755 for the TSC synchronization test, add explicit fences right before rdtsc in the timecounters 'get' functions. Intel recommends using LFENCE, while AMD refers to MFENCE. For VIA, follow what Linux does and use LFENCE. With this change, I see no reordered reads of TSC on Nehalem.
	Change the rmb() to inlined CPUID in the SMP TSC synchronization test. On i386, a locked instruction is used for rmb(), and as noted earlier, it is not enough. Since an i386 machine may not support SSE2, do the simplest possible synchronization with CPUID.
	MFC after: 1 week
	Discussed with: avg, bde, jkim
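The vendor-dependent fence choice described in this message can be summarized as a small decision function. This is a portable sketch with hypothetical enum names; it only models the selection (MFENCE for AMD, LFENCE for Intel and VIA, nothing without SSE2) and does not execute any fence or rdtsc instruction.

```c
#include <assert.h>

/* Hypothetical vendor and fence identifiers for illustration only. */
enum sketch_vendor {
	SKETCH_VENDOR_AMD,
	SKETCH_VENDOR_INTEL,
	SKETCH_VENDOR_CENTAUR,	/* VIA */
	SKETCH_VENDOR_OTHER
};

enum sketch_fence {
	SKETCH_FENCE_NONE,
	SKETCH_FENCE_LFENCE,
	SKETCH_FENCE_MFENCE
};

/* Choose which fence to issue right before reading the TSC. */
static enum sketch_fence
sketch_pick_fence(enum sketch_vendor vendor, int has_sse2)
{
	if (!has_sse2)
		return (SKETCH_FENCE_NONE);	/* fences need SSE2 */
	switch (vendor) {
	case SKETCH_VENDOR_AMD:
		return (SKETCH_FENCE_MFENCE);
	case SKETCH_VENDOR_INTEL:
	case SKETCH_VENDOR_CENTAUR:
		return (SKETCH_FENCE_LFENCE);
	default:
		return (SKETCH_FENCE_NONE);
	}
}
```

Making the choice once at probe time and storing a function pointer keeps vendor checks out of the hot path of every timecounter read.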
*	diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c	kib	2012-08-01	1	-14/+86
	index c253a96..3d8bd30 100644
	--- a/sys/x86/x86/tsc.c
	+++ b/sys/x86/x86/tsc.c
	@@ -82,7 +82,11 @@ static void tsc_freq_changed(void *arg, const struct cf_level *level,
	 static void tsc_freq_changing(void *arg, const struct cf_level *level,
	     int *status);
	 static unsigned tsc_get_timecount(struct timecounter *tc);
	-static unsigned tsc_get_timecount_low(struct timecounter *tc);
	+static inline unsigned tsc_get_timecount_low(struct timecounter *tc);
	+static unsigned tsc_get_timecount_lfence(struct timecounter *tc);
	+static unsigned tsc_get_timecount_low_lfence(struct timecounter *tc);
	+static unsigned tsc_get_timecount_mfence(struct timecounter *tc);
	+static unsigned tsc_get_timecount_low_mfence(struct timecounter *tc);
	 static void tsc_levels_changed(void *arg, int unit);
	 
	 static struct timecounter tsc_timecounter = {
	@@ -262,6 +266,10 @@ probe_tsc_freq(void)
	 		    (vm_guest == VM_GUEST_NO &&
	 		    CPUID_TO_FAMILY(cpu_id) >= 0x10))
	 			tsc_is_invariant = 1;
	+		if (cpu_feature & CPUID_SSE2) {
	+			tsc_timecounter.tc_get_timecount =
	+			    tsc_get_timecount_mfence;
	+		}
	 		break;
	 	case CPU_VENDOR_INTEL:
	 		if ((amd_pminfo & AMDPM_TSC_INVARIANT) != 0 ||
	@@ -271,6 +279,10 @@ probe_tsc_freq(void)
	 		    (CPUID_TO_FAMILY(cpu_id) == 0xf &&
	 		    CPUID_TO_MODEL(cpu_id) >= 0x3))))
	 			tsc_is_invariant = 1;
	+		if (cpu_feature & CPUID_SSE2) {
	+			tsc_timecounter.tc_get_timecount =
	+			    tsc_get_timecount_lfence;
	+		}
	 		break;
	 	case CPU_VENDOR_CENTAUR:
	 		if (vm_guest == VM_GUEST_NO &&
	@@ -278,6 +290,10 @@ probe_tsc_freq(void)
	 		    CPUID_TO_MODEL(cpu_id) >= 0xf &&
	 		    (rdmsr(0x1203) & 0x100000000ULL) == 0)
	 			tsc_is_invariant = 1;
	+		if (cpu_feature & CPUID_SSE2) {
	+			tsc_timecounter.tc_get_timecount =
	+			    tsc_get_timecount_lfence;
	+		}
	 		break;
	 	}
	 
	@@ -328,16 +344,31 @@ init_TSC(void)
	 
	 #ifdef SMP
	 
	-/* rmb is required here because rdtsc is not a serializing instruction. */
	-#define	TSC_READ(x)			\
	-static void				\
	-tsc_read_##x(void *arg)			\
	-{					\
	-	uint32_t *tsc = arg;		\
	-	u_int cpu = PCPU_GET(cpuid);	\
	-					\
	-	rmb();				\
	-	tsc[cpu * 3 + x] = rdtsc32();	\
	+/*
	+ * RDTSC is not a serializing instruction, and does not drain
	+ * instruction stream, so we need to drain the stream before executing
	+ * it. It could be fixed by use of RDTSCP, except the instruction is
	+ * not available everywhere.
	+ *
	+ * Use CPUID for draining in the boot-time SMP constistency test. The
	+ * timecounters use MFENCE for AMD CPUs, and LFENCE for others (Intel
	+ * and VIA) when SSE2 is present, and nothing on older machines which
	+ * also do not issue RDTSC prematurely. There, testing for SSE2 and
	+ * vendor is too cumbersome, and we learn about TSC presence from
	+ * CPUID.
	+ *
	+ * Do not use do_cpuid(), since we do not need CPUID results, which
	+ * have to be written into memory with do_cpuid().
	+ */
	+#define	TSC_READ(x)							\
	+static void								\
	+tsc_read_##x(void *arg)							\
	+{									\
	+	uint32_t *tsc = arg;						\
	+	u_int cpu = PCPU_GET(cpuid);					\
	+									\
	+	__asm __volatile("cpuid" : : : "eax", "ebx", "ecx", "edx");	\
	+	tsc[cpu * 3 + x] = rdtsc32();					\
	 }
	 TSC_READ(0)
	 TSC_READ(1)
	@@ -487,7 +518,16 @@ init:
	 	for (shift = 0; shift < 31 && (tsc_freq >> shift) > max_freq; shift++)
	 		;
	 	if (shift > 0) {
	-		tsc_timecounter.tc_get_timecount = tsc_get_timecount_low;
	+		if (cpu_feature & CPUID_SSE2) {
	+			if (cpu_vendor_id == CPU_VENDOR_AMD) {
	+				tsc_timecounter.tc_get_timecount =
	+				    tsc_get_timecount_low_mfence;
	+			} else {
	+				tsc_timecounter.tc_get_timecount =
	+				    tsc_get_timecount_low_lfence;
	+			}
	+		} else
	+			tsc_timecounter.tc_get_timecount = tsc_get_timecount_low;
	 		tsc_timecounter.tc_name = "TSC-low";
	 		if (bootverbose)
	 			printf("TSC timecounter discards lower %d bit(s)\n",
	@@ -599,16 +639,48 @@ tsc_get_timecount(struct timecounter *tc __unused)
	 	return (rdtsc32());
	 }
	 
	-static u_int
	+static inline u_int
	 tsc_get_timecount_low(struct timecounter *tc)
	 {
	 	uint32_t rv;
	 
	 	__asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
	-	: "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
	+	    : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
	 	return (rv);
	 }
	 
	+static u_int
	+tsc_get_timecount_lfence(struct timecounter *tc __unused)
	+{
	+
	+	lfence();
	+	return (rdtsc32());
	+}
	+
	+static u_int
	+tsc_get_timecount_low_lfence(struct timecounter *tc)
	+{
	+
	+	lfence();
	+	return (tsc_get_timecount_low(tc));
	+}
	+
	+static u_int
	+tsc_get_timecount_mfence(struct timecounter *tc __unused)
	+{
	+
	+	mfence();
	+	return (rdtsc32());
	+}
	+
	+static u_int
	+tsc_get_timecount_low_mfence(struct timecounter *tc)
	+{
	+
	+	mfence();
	+	return (tsc_get_timecount_low(tc));
	+}
	+
	 uint32_t
	 cpu_fill_vdso_timehands(struct vdso_timehands *vdso_th)
	 {
*	Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.	jimharris	2012-07-24	1	-0/+2
	The Intel Architecture Manual specifies that the rdtsc instruction is not serialized, so without this change the TSC synchronization test would periodically fail, resulting in use of the HPET timecounter instead of TSC-low. This caused severe performance degradation (40-50%) when running high IO/s workloads due to HPET MMIO reads and GEOM stat collection.
	Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC synchronization fail approximately 20% of the time.
	Sponsored by: Intel
	Reviewed by: kib
	MFC after: 3 days
*	Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage	kib	2012-07-14	1	-0/+5
	mostly meets the guidelines set by the Intel SDM:
	1. We use XRSTOR and XSAVE from the same CPL using the same linear address for the store area.
	2. Contrary to the recommendations, we cannot zero the FPU save area for a new thread, since fork semantics require a copy of the previous state. This advice seemingly contradicts the advice from item 6.
	3. We do use XSAVEOPT in the context switch code only, and the area for XSAVEOPT already always contains the data saved by XSAVE.
	4. We do not modify the save area between XRSTOR, when the area is loaded into FPU context, and XSAVE. We always spit the fpu context into save area and start emulation when directly writing into FPU context.
	5. We do not use segmented addressing to access save area, or rather, always address it using %ds basing.
	6. XSAVEOPT can be only executed in the area which was previously loaded with XRSTOR, since the context switch code checks for FPU use by the outgoing thread before saving, and a thread which stopped emulation forcibly gets its context loaded with XRSTOR.
	7. The PCB cannot be paged out while FPU emulation is turned off, since the stack of the executing thread is never swapped out.
	The context switch code is patched to issue XSAVEOPT instead of XSAVE if supported. This approach eliminates one conditional in the context switch code, which would be needed otherwise.
	For user-visible machine context to have proper data, fpugetregs() checks for unsaved extension blocks and manually copies pristine FPU state into them, according to the description provided by CPUID leaf 0xd.
	MFC after: 1 month