FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message	Author	Age	Files	Lines
*	This isn't functionally identical. In some cases a hint to disable	eadler	2012-10-22	3	-0/+9
	unit 0 would in fact disable all units. This reverts r241856.
	Approved by: cperciva (implicit)
*	Now that device disabling is generic, remove extraneous code from the	eadler	2012-10-22	3	-9/+0
	device drivers that used to provide this feature.
	Reviewed by: des
	Approved by: cperciva
	MFC after: 1 week
*	Add a unified macro to deny ability from the compiler to reorder	attilio	2012-10-09	1	-1/+1
	instruction loads/stores at its will. The macro __compiler_membar() is currently supported for both gcc and clang, but kernel compilation will fail otherwise.
	Reviewed by: bde, kib
	Discussed with: dim, theraven
	MFC after: 2 weeks
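As a rough illustration of what such a macro does, the sketch below defines a compiler-only barrier with GCC/Clang inline assembly. The names compiler_membar_sketch, payload, ready and publish are hypothetical and not taken from the commit; the real __compiler_membar() definition lives in the FreeBSD headers and may differ.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a compiler-only barrier (GCC/Clang syntax).
 * The empty asm with a "memory" clobber emits no instructions; it only
 * forbids the compiler from moving loads/stores across this point.
 * It does NOT order memory accesses as observed by other CPUs.
 */
#define	compiler_membar_sketch()	__asm__ __volatile__("" : : : "memory")

static int payload;
static int ready;

/* Store the payload first, then the flag, in that order in the emitted code. */
static void
publish(int value)
{

	payload = value;
	compiler_membar_sketch();
	ready = 1;
}
```

Without the barrier the compiler would be free to emit the two stores in either order; a hardware fence would still be needed on top of this for cross-CPU ordering.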
*	Reverts r234074,234105,234564,234723,234989,235231-235232 and part of	attilio	2012-10-09	1	-8/+1
	r234247. Use, instead, the static initializer introduced in r239923 for x86 and sparc64 intr_cpus, unwinding the code to the initial version.
	Reviewed by: marius
*	Add missing header needed by free(9).	kevlo	2012-09-30	1	-0/+1
	Spotted by: David Wolfskill <david at catwhisker dot org>
*	Free result of device_get_children(9).	kevlo	2012-09-30	1	-0/+1
*	- Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific	jhb	2012-09-28	1	-38/+8
	bits under #ifdef _KERNEL but leave definitions for various structures defined by standards ($PIR table, SMAP entries, etc.) available to userland.
	- Consolidate duplicate SMBIOS table structure definitions in ipmi(4) and smbios(4) in <machine/pc/bios.h> and make them available to userland.
	MFC after: 2 weeks
*	Allow static DMA allocations that allow for enough segments to do page-sized	jhb	2012-08-17	1	-6/+7
	segments for the entire allocation to use kmem_alloc_attr() to allocate KVM rather than using kmem_alloc_contig(). This avoids requiring a single physically contiguous chunk in this case.
	Submitted by: Peter Jeremy (original version)
	MFC after: 1 month
*	Merge ACPICA 20120816.	jkim	2012-08-16	1	-1/+1
*	During TSC synchronization test, use rdtsc() rather than rdtsc32(), to	jimharris	2012-08-07	1	-5/+5
	protect against 32-bit TSC overflow while the sync test is running. On dual-socket Xeon E5-2600 (SNB) systems with up to 32 threads, there is a non-trivial chance (2-3%) that the TSC synchronization test fails due to 32-bit TSC overflow while the synchronization test is running.
	Sponsored by: Intel
	Reviewed by: jkim
	Discussed with: jkim, kib
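The overflow problem can be illustrated with a small sketch: two counter readings that are correctly ordered as 64-bit values can compare in the wrong order once truncated to 32 bits. The helper names and the sample values are made up for illustration; the actual comparison logic in the synchronization test is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Compare two readings after truncating both to 32 bits (rdtsc32()-like). */
static int
ordered32(uint64_t earlier, uint64_t later)
{

	return ((uint32_t)later > (uint32_t)earlier);
}

/* Compare the full 64-bit readings (rdtsc()-like). */
static int
ordered64(uint64_t earlier, uint64_t later)
{

	return (later > earlier);
}
```

For example, with earlier = 0xfffffff0 and later = 0x100000010 the 64-bit comparison reports the readings as ordered, while the truncated one does not, because the low 32 bits wrapped between the two reads.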
*	Correct function name in comment.	jhb	2012-08-03	1	-1/+1
	Submitted by: alc
*	Microoptimize LAPIC timer routines to avoid reading from hardware during	mav	2012-08-03	1	-19/+24
	programming using earlier cached values. This makes the respective routines disappear from PMC top and reduces the total number of active CPU cycles on an idle 24-core system by 10%.
*	Improve the handling of static DMA buffers that use non-default memory	jhb	2012-08-03	1	-20/+23
	attributes (currently just BUS_DMA_NOCACHE):
	- Don't call pmap_change_attr() on the returned address, instead use kmem_alloc_contig() to ask the VM system for memory with the requested attribute.
	- As a result, always use kmem_alloc_contig() for non-default memory attributes, even for sub-page allocations. This requires adjusting bus_dmamem_free()'s logic for determining which free routine to use.
	- For x86, add a new dummy bus_dmamap that is used for static DMA buffers allocated via kmem_alloc_contig(). bus_dmamem_free() can then use the map pointer to determine which free routine to use.
	- For powerpc, add a new flag to the allocated map (bus_dmamem_alloc() always creates a real map on powerpc) to indicate which free routine should be used. Note that the BUS_DMA_NOCACHE handling in powerpc is currently #ifdef'd out. I have left it disabled but updated it to match x86.
	Reviewed by: scottl
	MFC after: 1 month
*	Do a trivial reformatting of the comment, to record the proper commit	kib	2012-08-01	1	-2/+1
	message for r238973:
	Rdtsc instruction is not synchronized, it seems on some Intel cores it can bypass even the locked instructions. As a result, rdtsc executed on different cores may return unordered TSC values even when the rdtsc appearance in the instruction sequences is provably ordered.
	Similarly to what has been done in r238755 for TSC synchronization test, add explicit fences right before rdtsc in the timecounters 'get' functions. Intel recommends to use LFENCE, while AMD refers to MFENCE. For VIA follow what Linux does and use LFENCE. With this change, I see no reordered reads of TSC on Nehalem.
	Change the rmb() to inlined CPUID in the SMP TSC synchronization test. On i386, locked instruction is used for rmb(), and as noted earlier, it is not enough. Since i386 machine may not support SSE2, do simplest possible synchronization with CPUID.
	MFC after: 1 week
	Discussed with: avg, bde, jkim
*	diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c	kib	2012-08-01	1	-14/+86
	index c253a96..3d8bd30 100644
	--- a/sys/x86/x86/tsc.c
	+++ b/sys/x86/x86/tsc.c
	@@ -82,7 +82,11 @@ static void tsc_freq_changed(void *arg, const struct cf_level *level,
	 static void tsc_freq_changing(void *arg, const struct cf_level *level,
	     int status);
	 static unsigned tsc_get_timecount(struct timecounter *tc);
	-static unsigned tsc_get_timecount_low(struct timecounter *tc);
	+static inline unsigned tsc_get_timecount_low(struct timecounter *tc);
	+static unsigned tsc_get_timecount_lfence(struct timecounter *tc);
	+static unsigned tsc_get_timecount_low_lfence(struct timecounter *tc);
	+static unsigned tsc_get_timecount_mfence(struct timecounter *tc);
	+static unsigned tsc_get_timecount_low_mfence(struct timecounter *tc);
	 static void tsc_levels_changed(void *arg, int unit);
	 
	 static struct timecounter tsc_timecounter = {
	@@ -262,6 +266,10 @@ probe_tsc_freq(void)
	 		    (vm_guest == VM_GUEST_NO &&
	 		    CPUID_TO_FAMILY(cpu_id) >= 0x10))
	 			tsc_is_invariant = 1;
	+		if (cpu_feature & CPUID_SSE2) {
	+			tsc_timecounter.tc_get_timecount =
	+			    tsc_get_timecount_mfence;
	+		}
	 		break;
	 	case CPU_VENDOR_INTEL:
	 		if ((amd_pminfo & AMDPM_TSC_INVARIANT) != 0 ||
	@@ -271,6 +279,10 @@ probe_tsc_freq(void)
	 		    (CPUID_TO_FAMILY(cpu_id) == 0xf &&
	 		    CPUID_TO_MODEL(cpu_id) >= 0x3))))
	 			tsc_is_invariant = 1;
	+		if (cpu_feature & CPUID_SSE2) {
	+			tsc_timecounter.tc_get_timecount =
	+			    tsc_get_timecount_lfence;
	+		}
	 		break;
	 	case CPU_VENDOR_CENTAUR:
	 		if (vm_guest == VM_GUEST_NO &&
	@@ -278,6 +290,10 @@ probe_tsc_freq(void)
	 		    CPUID_TO_MODEL(cpu_id) >= 0xf &&
	 		    (rdmsr(0x1203) & 0x100000000ULL) == 0)
	 			tsc_is_invariant = 1;
	+		if (cpu_feature & CPUID_SSE2) {
	+			tsc_timecounter.tc_get_timecount =
	+			    tsc_get_timecount_lfence;
	+		}
	 		break;
	 	}
	 
	@@ -328,16 +344,31 @@ init_TSC(void)
	 
	 #ifdef SMP
	 
	-/* rmb is required here because rdtsc is not a serializing instruction. */
	-#define	TSC_READ(x)			\
	-static void				\
	-tsc_read_##x(void *arg)			\
	-{					\
	-	uint32_t *tsc = arg;		\
	-	u_int cpu = PCPU_GET(cpuid);	\
	-					\
	-	rmb();				\
	-	tsc[cpu * 3 + x] = rdtsc32();	\
	+/*
	+ * RDTSC is not a serializing instruction, and does not drain
	+ * instruction stream, so we need to drain the stream before executing
	+ * it. It could be fixed by use of RDTSCP, except the instruction is
	+ * not available everywhere.
	+ *
	+ * Use CPUID for draining in the boot-time SMP constistency test. The
	+ * timecounters use MFENCE for AMD CPUs, and LFENCE for others (Intel
	+ * and VIA) when SSE2 is present, and nothing on older machines which
	+ * also do not issue RDTSC prematurely. There, testing for SSE2 and
	+ * vendor is too cumbersome, and we learn about TSC presence from
	+ * CPUID.
	+ *
	+ * Do not use do_cpuid(), since we do not need CPUID results, which
	+ * have to be written into memory with do_cpuid().
	+ */
	+#define	TSC_READ(x)							\
	+static void								\
	+tsc_read_##x(void *arg)							\
	+{									\
	+	uint32_t *tsc = arg;						\
	+	u_int cpu = PCPU_GET(cpuid);					\
	+									\
	+	__asm __volatile("cpuid" : : : "eax", "ebx", "ecx", "edx");	\
	+	tsc[cpu * 3 + x] = rdtsc32();					\
	 }
	 TSC_READ(0)
	 TSC_READ(1)
	@@ -487,7 +518,16 @@ init:
	 	for (shift = 0; shift < 31 && (tsc_freq >> shift) > max_freq; shift++)
	 		;
	 	if (shift > 0) {
	-		tsc_timecounter.tc_get_timecount = tsc_get_timecount_low;
	+		if (cpu_feature & CPUID_SSE2) {
	+			if (cpu_vendor_id == CPU_VENDOR_AMD) {
	+				tsc_timecounter.tc_get_timecount =
	+				    tsc_get_timecount_low_mfence;
	+			} else {
	+				tsc_timecounter.tc_get_timecount =
	+				    tsc_get_timecount_low_lfence;
	+			}
	+		} else
	+			tsc_timecounter.tc_get_timecount = tsc_get_timecount_low;
	 		tsc_timecounter.tc_name = "TSC-low";
	 		if (bootverbose)
	 			printf("TSC timecounter discards lower %d bit(s)\n",
	@@ -599,16 +639,48 @@ tsc_get_timecount(struct timecounter *tc __unused)
	 	return (rdtsc32());
	 }
	 
	-static u_int
	+static inline u_int
	 tsc_get_timecount_low(struct timecounter *tc)
	 {
	 	uint32_t rv;
	 
	 	__asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
	-		: "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
	+	    : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
	 	return (rv);
	 }
	 
	+static u_int
	+tsc_get_timecount_lfence(struct timecounter *tc __unused)
	+{
	+
	+	lfence();
	+	return (rdtsc32());
	+}
	+
	+static u_int
	+tsc_get_timecount_low_lfence(struct timecounter *tc)
	+{
	+
	+	lfence();
	+	return (tsc_get_timecount_low(tc));
	+}
	+
	+static u_int
	+tsc_get_timecount_mfence(struct timecounter *tc __unused)
	+{
	+
	+	mfence();
	+	return (rdtsc32());
	+}
	+
	+static u_int
	+tsc_get_timecount_low_mfence(struct timecounter *tc)
	+{
	+
	+	mfence();
	+	return (tsc_get_timecount_low(tc));
	+}
	+
	 uint32_t
	 cpu_fill_vdso_timehands(struct vdso_timehands *vdso_th)
	 {
*	Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.	jimharris	2012-07-24	1	-0/+2
	Intel Architecture Manual specifies that rdtsc instruction is not serialized, so without this change, TSC synchronization test would periodically fail, resulting in use of HPET timecounter instead of TSC-low. This caused severe performance degradation (40-50%) when running high IO/s workloads due to HPET MMIO reads and GEOM stat collection.
	Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC synchronization fail approximately 20% of the time.
	Sponsored by: Intel
	Reviewed by: kib
	MFC after: 3 days
*	Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage	kib	2012-07-14	1	-0/+5
	mostly meets the guidelines set by the Intel SDM:
	1. We use XRSTOR and XSAVE from the same CPL using the same linear address for the store area
	2. Contrary to the recommendations, we cannot zero the FPU save area for a new thread, since fork semantic requires the copy of the previous state. This advice seemingly contradicts to the advice from the item 6.
	3. We do use XSAVEOPT in the context switch code only, and the area for XSAVEOPT already always contains the data saved by XSAVE.
	4. We do not modify the save area between XRSTOR, when the area is loaded into FPU context, and XSAVE. We always spit the fpu context into save area and start emulation when directly writing into FPU context.
	5. We do not use segmented addressing to access save area, or rather, always address it using %ds basing.
	6. XSAVEOPT can be only executed in the area which was previously loaded with XRSTOR, since context switch code checks for FPU use by outgoing thread before saving, and thread which stopped emulation forcibly get context loaded with XRSTOR.
	7. The PCB cannot be paged out while FPU emulation is turned off, since stack of the executing thread is never swapped out.
	The context switch code is patched to issue XSAVEOPT instead of XSAVE if supported. This approach eliminates one conditional in the context switch code, which would be needed otherwise.
	For user-visible machine context to have proper data, fpugetregs() checks for unsaved extension blocks and manually copies pristine FPU state into them, according to the description provided by CPUID leaf 0xd.
	MFC after: 1 month
*	Make the wchar_t type machine dependent.	andrew	2012-06-24	2	-6/+4
	This is required for ARM EABI. Section 7.1.1 of the Procedure Call Standard for the ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an unsigned short with the former preferred. Because of this requirement we need to move the definition of __wchar_t to a machine dependent header. It also cleans up the macros defining the limits of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine dependent header then using them to define WCHAR_MIN and WCHAR_MAX respectively.
	Discussed with: bde
*	Implement mechanism to export some kernel timekeeping data to	kib	2012-06-22	2	-0/+64
	usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future.
	The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code.
	The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism.
	Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs necessary for non-x86 architectures to still compile are provided.
	Discussed with: bde
	Reviewed by: jhb
	Tested by: flo
	MFC after: 1 month
*	- Remove unused code for CR3 and CR4.	jkim	2012-06-13	1	-12/+11
	- Fix few style(9) nits while I am here.
*	Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c	iwasaki	2012-06-12	1	-43/+1
	as ipi_startup().
*	Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of	iwasaki	2012-06-09	1	-0/+434
	suspend/resume procedures are minimized among them.
	common:
	- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
	- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
	- Add some variables in acpi_wakecode.S in order to minimize the difference among amd64 and i386.
	- Disable load_cr3() because now CR3 is restored in resumectx().
	amd64:
	- Add suspend/resume related members (such as MSR) in PCB.
	- Modify savectx() for above new PCB members.
	- Merge acpi_switch.S into cpu_switch.S as resumectx().
	i386:
	- Merge(and remove) suspendctx() into savectx() in order to match with amd64 code.
	Reviewed by: attilio@, acpi@
*	free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG	avg	2012-06-03	1	-4/+2
	Those calls are useful with hardware watchdog drivers too.
	MFC after: 3 weeks
*	Consistently use "__LP64__".	obrien	2012-05-24	4	-11/+11
	[there are 33 __LP64__'s in the kernel (minus cddl/ and contrib/), and 11 _LP64's]
*	Don't expose i386-only ptrace constants on amd64. This broke gdb with	jhb	2012-05-17	1	-2/+4
	libthread_db on amd64.
	Reported by: avg
*	Revert part of r234723 by re-enabling the SMP protection for	attilio	2012-05-03	1	-8/+0
	intr_bind() on x86. This has been requested by jhb and I strongly disagree with this, but as long as he is the x86 and interrupt subsystem maintainer I will follow his directives. The disagreement comes from what we should really consider as a public KPI. IMHO, if we really need a selection between the kernel functions, we may need an explicit protection like _KERNEL_KPI, which defines which subset of the kernel function might really be considered as part of the KPI (for third-party modules) and which not. As long as we don't have this mechanism I just consider any possible function as usable by third-party code, thus intr_bind() included.
	MFC after: 1 week
*	Clean up the intr* MD KPI from the SMP dependency, removing a cause of	attilio	2012-04-26	1	-0/+15
	discrepancy between modules and kernel, but deal with SMP differences within the functions themselves. As an added bonus this also helps in terms of code readability.
	Requested by: gibbs
	Reviewed by: jhb, marius
	MFC after: 1 week
*	Add x2apic MSR definitions	grehan	2012-04-17	1	-1/+35
	Reviewed by: jhb
	Obtained from: bhyve via Neel via NetApp
*	Trim stray blank line.	jhb	2012-04-11	1	-1/+0
*	Recognize the RDRAND instruction feature.	jhb	2012-04-09	1	-0/+1
	Submitted by: Michael Fuckner michael fuckner net
	MFC after: 3 days
*	Fix interrupt load balancing regression, introduced in revision	gibbs	2012-04-06	1	-3/+0
	222813, that left all un-pinned interrupts assigned to CPU 0.
	sys/x86/x86/intr_machdep.c: In intr_shuffle_irqs(), remove CPU_SETOF() call that initialized the "intr_cpus" cpuset to only contain CPU0. This initialization is too late and nullifies the results of calls to intr_add_cpu() that occur much earlier in the boot process. Since "intr_cpus" is statically initialized to the empty set, and all processors, including the BSP, already add themselves to "intr_cpus" no special initialization for the BSP is necessary.
	MFC after: 3 days
*	Further tweak the changes made in r233709. The kernel doesn't permit	jhb	2012-04-02	1	-14/+25
	sleeping from a swi handler (even though in this case it would be ok), so switch the refill and scanning SWI handlers to being tasks on a fast taskqueue. Also, only schedule the refill task for a CMCI as an MC# can fire at any time, so it should do the minimal amount of work needed and avoid opportunities to deadlock before it panics (such as scheduling a task it won't ever need in practice). To handle the case of an MC# only finding recoverable errors (which should never happen), always try to refill the event free list when the periodic scan executes.
	MFC after: 2 weeks
*	Make machine check exception logging more readable. On newer Intel systems,	jhb	2012-04-02	2	-9/+8
	an uncorrected ECC error tends to fire on all CPUs in a package simultaneously and the current printf hacks are not sufficient to make the messages legible. Instead, use the existing mca_lock spinlock to serialize calls to mca_log() and change the machine check code to panic directly when an unrecoverable error is encountered rather than falling back to a trap_fatal() call in trap() (which adds nearly a screen-full of logging messages that aren't useful for machine checks).
	MFC after: 2 weeks
*	Attempt to make machine check handling a bit more robust:	jhb	2012-03-30	1	-28/+72
	- Don't malloc() new MCA records for machine checks logged due to a CMCI or MC# exception. Instead, use a pre-allocated pool of records. When a CMCI or MC# exception fires, schedule a swi to refill the pool. The pool is sized to hold at least one record per available machine bank, and one record per CPU. This should handle the case of all CPUs triggering a single bank at once as well as the case of a single CPU triggering all of its banks. The periodic scans still use malloc() since they are run from a safe context.
	- Since we have to create an swi to handle refills, make the periodic scan a second swi for the same thread instead of having a separate taskqueue thread for the scans.
	Suggested by: mdf (avoiding malloc())
	MFC after: 2 weeks
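The pool-and-refill split described above can be sketched roughly as follows. Everything here (struct rec_sketch, pool_refill, pool_get, the single-threaded free list with no locking) is a hypothetical simplification for illustration and is not the kernel's MCA code: the point is only that the exception path takes a record without allocating, while a separate safe-context routine tops the pool back up with malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record type; the real MCA record carries much more state. */
struct rec_sketch {
	struct rec_sketch *next;
	int bank;
};

static struct rec_sketch *freelist;
static int freecount;

/* Safe context: may allocate. Tops the pool up to 'target' records. */
static void
pool_refill(int target)
{
	struct rec_sketch *r;

	while (freecount < target) {
		r = malloc(sizeof(*r));
		if (r == NULL)
			break;
		r->next = freelist;
		freelist = r;
		freecount++;
	}
}

/* Exception context: never allocates; returns NULL if the pool is empty. */
static struct rec_sketch *
pool_get(int bank)
{
	struct rec_sketch *r;

	r = freelist;
	if (r == NULL)
		return (NULL);
	freelist = r->next;
	freecount--;
	r->next = NULL;
	r->bank = bank;
	return (r);
}
```

A real implementation would also need a lock (or equivalent) around the free list, since the commit describes records being taken on several CPUs at once.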
*	Move the legacy(4) driver to x86.	jhb	2012-03-30	4	-2/+437
*	Fix an issue introduced in sys/x86/include/endian.h with r232721. In	dim	2012-03-29	1	-2/+2
	that revision, the bswapXX_const() macros were renamed to bswapXX_gen(). Also, bswap64_gen() was implemented as two calls to bswap32(), and similarly, bswap32_gen() as two calls to bswap16(). This mainly helps our base gcc to produce more efficient assembly.
	However, the arguments are not properly masked, which results in the wrong value being calculated in some instances. For example, bswap32(0x12345678) returns 0x7c563412, and bswap64(0x123456789abcdef0) returns 0xfcdefc9a7c563412.
	Fix this by appropriately masking the arguments to bswap16() in bswap32_gen(), and to bswap32() in bswap64_gen(). This should also silence warnings from clang.
	Submitted by: jh
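The masking problem can be reproduced with a small stand-alone sketch. The macro and function names below are simplified stand-ins rather than the actual endian.h definitions; the unmasked variant is written so that bits above bit 15 leak through, which yields the wrong 32-bit value quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Unmasked 16-bit swap: bits above bit 15 of the operand leak through. */
#define	BSWAP16_NOMASK(x)	(((x) << 8) | ((x) >> 8))

/* 32-bit swap built from two unmasked 16-bit swaps (the buggy shape). */
static uint32_t
bswap32_buggy(uint32_t x)
{

	return ((BSWAP16_NOMASK(x) << 16) | BSWAP16_NOMASK(x >> 16));
}

/* 16-bit swap whose argument and result are truncated to 16 bits. */
static uint16_t
bswap16_masked(uint16_t x)
{

	return ((uint16_t)((x << 8) | (x >> 8)));
}

/* 32-bit swap with each half masked before it is swapped. */
static uint32_t
bswap32_fixed(uint32_t x)
{

	return (((uint32_t)bswap16_masked(x & 0xffff) << 16) |
	    bswap16_masked(x >> 16));
}
```

For the input 0x12345678 the unmasked version produces 0x7c563412, while the masked version produces the expected 0x78563412.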
*	Revert sys/x86/include/endian.h to what it was before r233419, as that	dim	2012-03-29	1	-3/+3
	revision has two problems:
	- It can produce worse code with both clang and gcc.
	- It doesn't fix the actual issue introduced in r232721, which will be fixed in the next commit.
	Submitted by: bde, tijl and jh
	Pointy hat to: dim
*	Use a more proper fix for enabling HT MSI mapping windows on Host-PCI	jhb	2012-03-29	2	-14/+20
	bridges. Rather than blindly enabling the windows on all of them, only enable the window when an MSI interrupt is enabled for a device behind the bridge, similar to what already happens for HT PCI-PCI bridges.
	To implement this, each x86 Host-PCI bridge driver has to be able to locate its actual backing device on bus 0. For ACPI, use the _ADR method to find the slot and function of the device. For the non-ACPI case, the legacy(4) driver already scans bus 0 looking for Host-PCI bridge devices. Now it saves the slot and function of each bridge that it finds as ivars that the Host-PCI bridge driver can then use in its pcib_map_msi() method.
	This fixes machines where non-MSI interrupts were broken by the previous round of HT MSI changes.
	Tested by: bapt
	MFC after: 1 week
*	Restore proper use of bounce buffers for ISA DMA. When locking was	jhb	2012-03-29	1	-2/+3
	added, the call to pmap_kextract() was moved up, and as a result the code never updated the physical address to use for DMA if a bounce buffer was used. Restore the earlier location of pmap_kextract() so it takes bounce buffers into account.
	Tested by: kargl
	MFC after: 1 week
*	Allocate the ioapics[] array dynamically since it is only needed for the	jhb	2012-03-28	1	-5/+11
	duration of madt_setup_io(). This avoids having the array take up permanent space in the BSS.
	Inspired by: bde
	MFC after: 2 weeks
*	Move the DTrace return IDT vector back up from 0x20 to 0x92. The 0x20	jhb	2012-03-28	1	-1/+1
	vector is currently dedicated to servicing IRQ 0 from the 8259A's, so it shouldn't be overloaded for DTrace.
	Tested by: rstone
	MFC after: 1 week
*	Fix the following clang warning in sys/dev/dcons/dcons.c, caused by the	dim	2012-03-24	1	-3/+3
	recent changes in sys/x86/include/endian.h:
	  sys/dev/dcons/dcons.c:190:15: error: implicit conversion from '__uint32_t' (aka 'unsigned int') to '__uint16_t' (aka 'unsigned short') changes value from 1684238190 to 28526 [-Werror,-Wconstant-conversion]
	          buf->magic = ntohl(DCONS_MAGIC);
	                       ^~~~~~~~~~~~~~~~~~
	  sys/sys/param.h:306:18: note: expanded from:
	  #define ntohl(x)        __ntohl(x)
	                          ^
	  ./x86/endian.h:128:20: note: expanded from:
	  #define __ntohl(x)      __bswap32(x)
	                          ^
	  ./x86/endian.h:78:20: note: expanded from:
	      __bswap32_gen((__uint32_t)(x)) : __bswap32_var(x))
	                    ^
	  ./x86/endian.h:68:26: note: expanded from:
	          (((__uint32_t)__bswap16(x) << 16) | __bswap16((x) >> 16))
	                        ^
	  ./x86/endian.h:75:53: note: expanded from:
	      __bswap16_gen((__uint16_t)(x)) : __bswap16_var(x)))
	      ~~~~~~~~~~~~~              ^
	This is because the __bswapXX_gen() macros (for x86) call the regular __bswapXX() macros. Since the __bswapXX_gen() variants are only called when their arguments are constant, there is no need to do that constancy check recursively. Also, it causes the above error with clang.
	Fix it by calling __bswap16_gen() from __bswap32_gen(), and similarly, __bswap32_gen() from __bswap64_gen().
	While here, add extra parentheses around the __bswap16_gen() macro expansion, to prevent unexpected side effects.
*	Mark the 'lapics' and 'ioapics' arrays here static since they are	jhb	2012-03-22	1	-2/+2
	private to this file. The 'lapics' array was actually shadowing a completely different 'lapics' array that is private to local_apic.c.
	Reported by: bde
	MFC after: 2 weeks
*	Copy amd64 sysarch.h to x86 and merge with i386 sysarch.h. Replace	tijl	2012-03-19	1	-0/+138
	amd64/i386/pc98 sysarch.h with stubs.
*	Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replace	tijl	2012-03-19	1	-0/+678
	amd64/i386/pc98 specialreg.h with stubs.
*	Copy i386 psl.h to x86 and replace amd64/i386/pc98 psl.h with stubs.	tijl	2012-03-19	1	-0/+84
*	Move userland bits (and some common kernel bits) from amd64 and i386	tijl	2012-03-19	1	-0/+286
	segments.h to a new x86 segments.h. Add __packed attribute to some structs (just to be sure). Also make it clear that i386 GDT and LDT entries are used in ia64 code.
*	Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h.	tijl	2012-03-18	1	-5/+8
	Reviewed by: kib
*	Copy i386 reg.h to x86 and merge with amd64 reg.h. Replace i386/amd64/pc98	tijl	2012-03-18	1	-0/+253
	reg.h with stubs.
	The tREGISTER macros are only made visible on i386. These macros are deprecated and should not be available on amd64.
	The i386 and amd64 versions of struct reg have been renamed to struct __reg32 and struct __reg64. During compilation either __reg32 or __reg64 is defined as reg depending on the machine architecture. On amd64 the i386 struct is also available as struct reg32 which is used in COMPAT_FREEBSD32 code.
	Most of compat/ia32/ia32_reg.h is now IA64 only.
	Reviewed by: kib (previous version)
*	Move userland bits of i386 npx.h and amd64 fpu.h to x86 fpu.h.	tijl	2012-03-16	1	-0/+216
	Remove FPU types from compat/ia32/ia32_reg.h that are no longer needed. Create machine/npx.h on amd64 to allow compiling i386 code that uses this header.
	The original npx.h and fpu.h define struct envxmm differently. Both definitions have been included in the new x86 header as struct __envxmm32 and struct __envxmm64. During compilation either __envxmm32 or __envxmm64 is defined as envxmm depending on machine architecture. On amd64 the i386 struct is also available as struct envxmm32.
	Reviewed by: kib