path: root/sys/amd64/include/cpufunc.h
Commit message (Author, Age, Files, Lines)
* MFC r289824: Add CLFLUSHOPT instruction wrappers.
  MFC r290188: Fix prefix on i386.
  (kib, 2015-10-30, 1 file, -0/+7)
* MFC r261891: provide fast versions of ffsl and flsl for i386; ffsll and
  flsll for amd64
  (avg, 2015-10-23, 1 file, -0/+16)
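The log does not show how the fast versions are written. As a minimal sketch, assuming a GCC or Clang compiler, the same semantics can be expressed with count-zero builtins. The names `sketch_ffsl` and `sketch_flsl` are hypothetical and are not the header's actual definitions.

```c
#include <limits.h>

/*
 * Hypothetical names; the real header may implement these differently.
 * Both return a 1-based bit index, or 0 when no bit is set.
 */
static inline int
sketch_ffsl(long mask)
{
	/* __builtin_ctzl() is undefined for 0, so handle that case first. */
	return (mask == 0 ? 0 : __builtin_ctzl((unsigned long)mask) + 1);
}

static inline int
sketch_flsl(long mask)
{
	/* Index of the highest set bit: word width minus leading zeros. */
	return (mask == 0 ? 0 :
	    (int)(sizeof(mask) * CHAR_BIT) -
	    __builtin_clzl((unsigned long)mask));
}
```

For example, a mask of 0x18 (bits 4 and 5 in 1-based numbering) gives 4 from `sketch_ffsl` and 5 from `sketch_flsl`.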
* Implement PV IPIs for PVHVM guests and further converge PV and HVM
  IPI implementations.

  Submitted by: Roger Pau Monné
  Sponsored by: Citrix Systems R&D
  Submitted by: gibbs (misc cleanup, table driven config)
  Reviewed by: gibbs
  MFC after: 2 weeks

  sys/amd64/include/cpufunc.h:
  sys/amd64/amd64/pmap.c:
    Move invltlb_globpcid() into cpufunc.h so that it can be used by
    the Xen HVM version of tlb shootdown IPI handlers.

  sys/x86/xen/xen_intr.c:
  sys/xen/xen_intr.h:
    Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(), and
    remove the ipi vector parameter. This api allocates an event
    channel port that can be used for ipi services, but knows nothing
    of the actual ipi for which that port will be used. Removing the
    unused argument and cleaning up the comments surrounding its
    declaration helps clarify its actual role.

  sys/amd64/amd64/mp_machdep.c:
  sys/amd64/include/cpu.h:
  sys/i386/i386/mp_machdep.c:
  sys/i386/include/cpu.h:
    Implement a generic framework for amd64 and i386 that allows the
    implementation of certain CPU management functions to be selected
    at runtime. Currently this is only used for the ipi send function,
    which we optimize for Xen when running on a Xen hypervisor, but
    can easily be expanded to support more operations.

  sys/x86/xen/hvm.c:
    Implement Xen PV IPI handlers and operations, replacing native
    send IPI.

  sys/amd64/include/pcpu.h:
  sys/i386/include/pcpu.h:
  sys/i386/include/smp.h:
    Remove NR_VIRQS and NR_IPIS from FreeBSD headers. NR_VIRQS is
    defined already for us in the xen interface files. NR_IPIS is only
    needed in one file per Xen platform and is easily inferred by the
    IPI vector table that is defined in those files.

  sys/i386/xen/mp_machdep.c:
    Restructure to more closely match the HVM implementation by
    performing table driven IPI setup.

  (gibbs, 2013-09-06, 1 file, -0/+28)
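The runtime-selectable framework described in this commit can be pictured as a table of function pointers that is patched once at boot. The following is only a sketch under that assumption: `sketch_cpu_ops`, `sketch_cpu_ops_init`, and the two backend functions are hypothetical names, and the `last_backend` variable exists purely so the effect is observable.

```c
#include <stdbool.h>

/* Hypothetical types and names; a sketch of runtime selection only. */
typedef void (*sketch_ipi_send_fn)(unsigned int vector, int dest_cpu);

struct sketch_cpu_ops {
	sketch_ipi_send_fn ipi_send;
};

/* Records which backend ran last: 1 = native, 2 = Xen PV. */
static int last_backend;

static void
sketch_native_ipi_send(unsigned int vector, int dest_cpu)
{
	(void)vector;
	(void)dest_cpu;
	last_backend = 1;	/* stand-in for the native send path */
}

static void
sketch_xen_ipi_send(unsigned int vector, int dest_cpu)
{
	(void)vector;
	(void)dest_cpu;
	last_backend = 2;	/* stand-in for an event-channel send */
}

/* Default to the native implementation. */
static struct sketch_cpu_ops sketch_ops = {
	.ipi_send = sketch_native_ipi_send
};

/* Called once at startup; swaps in the Xen variant when appropriate. */
static void
sketch_cpu_ops_init(bool running_on_xen)
{
	if (running_on_xen)
		sketch_ops.ipi_send = sketch_xen_ipi_send;
}

/* Callers always go through the table and never name a backend. */
static void
sketch_ipi_send(unsigned int vector, int dest_cpu)
{
	sketch_ops.ipi_send(vector, dest_cpu);
}
```

Adding another selectable operation would mean adding another member to the table, which matches the commit's remark that the framework can be expanded.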
* Provide a wrapper for the INVPCID instruction, definition of the
  descriptor and symbolic names for the operation types.
  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc
  Tested by: pho, bf
  (kib, 2013-08-30, 1 file, -0/+20)
* Add lfence().
  MFC after: 1 week
  (kib, 2012-08-01, 1 file, -0/+7)
* Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h>
  on x86 and use that to implement stop_emulating() in the fpu/npx code.
  Reimplement start_emulating() in the non-XEN case by using load_cr0()
  and rcr0() instead of the 'lmsw' and 'smsw' instructions. Intel
  explicitly discourages the use of 'lmsw' and 'smsw' on 80386 and later
  processors in the description of these instructions in Volume 2 of the
  ADM.
  Reviewed by: kib
  MFC after: 1 month
  (jhb, 2012-07-09, 1 file, -0/+10)
* Now that our assembler supports the xsave family of instructions, use them
  natively rather than hand-assembled versions. For xgetbv/xsetbv, add a
  wrapper API to deal with xcr* registers: rxcr() and load_xcr().
  Reviewed by: kib
  MFC after: 1 month
  (jhb, 2012-07-05, 1 file, -0/+19)
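The xgetbv and xsetbv instructions move a 64-bit register value through the EDX:EAX pair, so a wrapper API such as rxcr() and load_xcr() has to join or split two 32-bit halves. The sketch below shows only that arithmetic in portable C; the helper names are hypothetical and the instructions themselves are not issued.

```c
#include <stdint.h>

/* Hypothetical helpers; they model the EDX:EAX convention only. */

/* Split a 64-bit value into the low (EAX) and high (EDX) halves. */
static inline void
sketch_split64(uint64_t val, uint32_t *low, uint32_t *high)
{
	*low = (uint32_t)val;
	*high = (uint32_t)(val >> 32);
}

/* Recombine the two halves into the 64-bit value a reader would return. */
static inline uint64_t
sketch_join64(uint32_t low, uint32_t high)
{
	return ((uint64_t)low | ((uint64_t)high << 32));
}
```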
* Optimize reserve_pv_entries() using the popcnt instruction.
  (alc, 2012-06-30, 1 file, -0/+9)
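The log does not say how reserve_pv_entries() uses popcnt. A plausible use, shown here as a sketch only, is counting set bits in bitmap words in one step per word instead of looping bit by bit. The function name and the convention that a set bit means a free slot are assumptions, and the GCC/Clang builtin stands in for the header's own wrapper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the set bits across an array of bitmap
 * words.  Assumes a set bit marks a free slot.
 */
static inline int
sketch_count_free(const uint64_t *map, size_t nwords)
{
	int nfree = 0;

	for (size_t i = 0; i < nwords; i++)
		nfree += __builtin_popcountll(map[i]);
	return (nfree);
}
```

Built with popcnt support enabled (for example `-mpopcnt` on x86), the builtin typically compiles to a single instruction per word.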
* Correct function prototype for read_rflags().
  (jhb, 2012-02-27, 1 file, -1/+1)
* Move xrstor/xsave/xsetbv into fpu.c and reorder them.
  Requested by: bde
  MFC after: 1 month
  (kib, 2012-01-30, 1 file, -38/+0)
* Order newly added functions alphabetically.
  Requested by: bde
  MFC after: 3 days
  (kib, 2012-01-25, 1 file, -12/+12)
* Implement xsetbv(), xsave() and xrstor() providing C access to the
  similarly named CPU instructions. Since our in-tree binutils gas is not
  aware of the instructions, and I have to use the byte-sequence to encode
  them, hardcode the r/m operand as (%rdi). This way, first argument of
  the pseudo-function is already placed into proper register.
  MFC after: 1 week
  (kib, 2012-01-17, 1 file, -0/+38)
* Correct cpu_monitor() and cpu_mwait() for amd64. These instructions take
  %rcx as "extensions" in long mode. If any unused bit is set in %rcx,
  these instructions cause a general protection fault. Fix style nits and
  synchronize i386 with amd64.
  (jkim, 2011-07-05, 1 file, -5/+7)
* Add a function rdtsc32() to read lower 32 bits from TSC and discard upper
  32 bits. Sometimes the compiler inserts unnecessary instructions to
  preserve the unused upper 32 bits even when the result is cast to a
  32-bit value. It reduces such compiler mistakes where every cycle counts.
  (jkim, 2011-04-14, 1 file, -0/+9)
* Consistently use __volatile as the rest of this file.
  (jkim, 2011-04-14, 1 file, -6/+6)
* Prefer C99 standard integers to reduce diff from i386 version.
  (jkim, 2011-04-14, 1 file, -63/+63)
* Change the parameter passed to the inline assembly to u_short
  as we are dealing with 16bit segment registers. Change mov to movw.
  Approved by: rpaulo (mentor)
  Reviewed by: kib, rink
  (rdivacky, 2010-09-03, 1 file, -23/+23)
* Quiet variable "shadows" warning:
    sys/vmmeter.h: warning: shadowed declaration is here
    machine/cpufunc.h: In function 'insw':
    machine/cpufunc.h: warning: declaration of 'cnt' shadows a global declaration
    ..snip..
  (obrien, 2010-01-01, 1 file, -18/+18)
* cpufunc.h: unify/correct style of c extension names
  i386 and amd64 archs only.
  inline => __inline. [1]
  __asm__ => __asm. [2]
  Reviewed by: kib, jhb [1]
  Suggested by: kib [2]
  MFC after: 1 week
  (avg, 2009-09-30, 1 file, -3/+3)
* When the page caching attributes are changed, after new mapping is
  established, OS shall flush the caches on all processors that may have
  used the mapping previously. This operation is not needed if processors
  support self-snooping. If not, but clflush instruction is implemented
  on the CPU, series of the clflush can be used on the mapping region.
  Otherwise, we have to flush the whole cache. The latter operation is
  very expensive, and AMD-made CPUs do not have self-snooping. Implement
  cache flush for remapped region by using clflush for amd64, when
  supported by CPU.
  Proposed and reviewed by: alc
  Approved by: re (kensmith)
  (kib, 2009-07-22, 1 file, -0/+14)
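The three-way decision in this commit message (self-snoop, clflush over the region, or whole-cache flush) can be sketched in portable C. This is an illustration of the decision and of the line count for a region, not the kernel's code: all names are hypothetical, the feature flags are passed in as booleans instead of being read from CPUID, and the line size is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; models the decision only, flushes nothing. */
enum sketch_flush {
	SKETCH_FLUSH_NONE,	/* self-snooping CPU: nothing to do */
	SKETCH_FLUSH_CLFLUSH,	/* clflush each line of the region */
	SKETCH_FLUSH_WHOLE	/* fall back to a full cache flush */
};

static enum sketch_flush
sketch_pick_flush(bool self_snoop, bool has_clflush)
{
	if (self_snoop)
		return (SKETCH_FLUSH_NONE);
	if (has_clflush)
		return (SKETCH_FLUSH_CLFLUSH);
	return (SKETCH_FLUSH_WHOLE);
}

/*
 * Number of cache lines touched by [start, start + len).
 * Assumes line_size is a power of two.
 */
static size_t
sketch_lines_to_flush(uintptr_t start, size_t len, size_t line_size)
{
	uintptr_t first, last;

	if (len == 0)
		return (0);
	first = start & ~(uintptr_t)(line_size - 1);
	last = (start + len - 1) & ~(uintptr_t)(line_size - 1);
	return ((size_t)((last - first) / line_size) + 1);
}
```

With 64-byte lines, a 4096-byte page needs 64 clflush operations, which is the cost being traded against a full cache flush.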
* Simplify in/out functions (for i386 and AMD64).
  Remove a hack to generate more efficient code for port numbers below
  0x100, which has been obsolete for at least ten years, because GCC has
  an asm constraint to specify that.
  Submitted by: Christoph Mallon <christoph mallon gmx de>
  (ed, 2009-04-11, 1 file, -79/+8)
* Don't explicitly force ecx to be used for MSR_FSBASE/MSR_GSBASE.
  Because the "c" input constraint is used, the compiler will already
  place the MSR_FSBASE/MSR_GSBASE constants in ecx. Using __asm("ecx")
  makes LLVM crash. Even though this is also an LLVM bug, we'd better
  remove the unnecessary GCCism as well.
  Submitted by: Christoph Mallon <christoph.mallon@gmx.de>
  (ed, 2009-04-07, 1 file, -10/+4)
* Change some movl's to mov's. Newer GAS no longer accepts 'movl' instructions
  for moving between a segment register and a 32-bit memory location.
  Looked at by: jhb
  (obrien, 2009-01-31, 1 file, -9/+9)
* - Add cpuctl(4) pseudo-device driver to provide access to some low-level
    features of CPUs like reading/writing machine-specific registers,
    retrieving cpuid data, and updating microcode.
  - Add cpucontrol(8) utility, that provides userland access to
    the features of cpuctl(4).
  - Add subsequent manpages.

  The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX
  is created for each cpu present in the system. The pseudo-device minor
  number corresponds to the cpu number in the system. The cpuctl(4)
  pseudo-device allows a number of ioctls to be performed, namely
  RDMSR/WRMSR/CPUID and UPDATE. The first pair allows the caller to
  read/write machine-specific registers from the corresponding CPU. cpuid
  data could be retrieved using the CPUID call, and microcode updates are
  applied via UPDATE.

  The permissions are enforced based on the pseudo-device file permissions.
  RDMSR/CPUID will be allowed when the caller has read access to the device
  node, while WRMSR/UPDATE will be granted only when the node is opened for
  writing. There're also a number of priv(9) checks.

  The cpucontrol(8) utility is intended to provide userland access to
  the cpuctl(4) device features. The utility also allows one to apply
  cpu microcode updates.

  Currently only Intel and AMD cpus are supported and were tested.

  Approved by: kib
  Reviewed by: rpaulo, cokane, Peter Jeremy
  MFC after: 1 month
  (stas, 2008-08-08, 1 file, -0/+5)
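The permission rule stated in this commit message maps naturally onto a small check. The sketch below encodes only that rule: the enum and function names are hypothetical (they are not the driver's ioctl constants), and the additional priv(9) checks mentioned in the message are left out.

```c
#include <stdbool.h>

/* Hypothetical operation names; not the driver's actual ioctl codes. */
enum sketch_cpuctl_op {
	SKETCH_OP_RDMSR,
	SKETCH_OP_WRMSR,
	SKETCH_OP_CPUID,
	SKETCH_OP_UPDATE
};

/*
 * RDMSR/CPUID need read access to the device node; WRMSR/UPDATE need
 * the node to be opened for writing.  priv(9) checks are omitted.
 */
static bool
sketch_cpuctl_allowed(enum sketch_cpuctl_op op, bool can_read,
    bool opened_for_write)
{
	switch (op) {
	case SKETCH_OP_RDMSR:
	case SKETCH_OP_CPUID:
		return (can_read);
	case SKETCH_OP_WRMSR:
	case SKETCH_OP_UPDATE:
		return (opened_for_write);
	}
	return (false);
}
```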
* - Add inlines for the monitor and mwait instructions.
  Sponsored by: Nokia
  (jeff, 2008-04-18, 1 file, -0/+13)
* Add a knob for disabling/enabling HTT, "machdep.hyperthreading_allowed".
  Default off due to information disclosure on multi-user systems.
  Submitted by: cperciva
  Reviewed by: jhb
  (nectar, 2005-05-13, 1 file, -0/+8)
* Remove diffs to i386 version that came in via the compiler support ifdefs.
  This changes things like whitespace, inconsistent use of #ifndef vs
  #if !defined(), different macro argument orders, mismatched comments, etc.
  (peter, 2005-03-11, 1 file, -2/+2)
* netchild's mega-patch to isolate compiler dependencies into a central
  place.

  This moves the dependency on GCC's and other compiler's features into
  the central sys/cdefs.h file, while the individual source files can
  then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
  refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

  By now, GCC and ICC (the Intel compiler) have been actively tested on
  IA32 platforms by netchild. Extension to other compilers is supposed
  to be possible, of course.

  Submitted by: netchild
  Reviewed by: various developers on arch@, some time ago
  (joerg, 2005-03-02, 1 file, -6/+10)
* MFia64:
  Fix -O builds with gcc 3.4 by defining ffs as __builtin_ffs instead of
  creating an inline function that just calls __builtin_ffs.
  (ps, 2004-07-30, 1 file, -17/+1)
* MFi386: move rss() from db_interface.c to cpufunc.h
  (peter, 2004-04-07, 1 file, -0/+8)
* Remove advertising clause from University of California Regent's license,
  per letter dated July 22, 1999 and email from Peter Wemm.
  Approved by: core, peter
  (imp, 2004-04-05, 1 file, -4/+0)
* Don't implement anything in the ffs family in <machine/cpufunc.h>
  in the non-_KERNEL case. This "fixes" applications that include
  this "kernel-only" header and also include <strings.h> (or get
  <strings.h> via the default _BSD_VISIBLE pollution in <string.h>).
  In C++ there was a fatal error: the declaration specifies C linkage
  but the implementation gives C++ linkage. In C there was only a
  static/extern mismatch if the headers were included in a certain
  order, and a partially redundant declaration for all include orders;
  gcc emits incomplete or wrong diagnostics for these, but only for
  compiling with -Wsystem-headers and certain other warning options,
  so the problem was usually not seen for C.
  Ports breakage reported by: kris
  (bde, 2004-03-11, 1 file, -0/+4)
* MFi386: re-sort non-gcc function prototypes, trim includes
  (peter, 2004-03-08, 1 file, -44/+30)
* Fix syntax errors and wrong function prototypes in several MD header
  files when using non-GNUC compilers.
  PR: kern/58515
  Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>
  Approved by: grog (mentor), obrien
  (le, 2004-03-05, 1 file, -3/+3)
* Re-add debug register functions
  (peter, 2004-01-28, 1 file, -2/+129)
* Add 64 bit bsf*/ffs* routines. Have the ffs() inline use gcc's builtin
  because it uses the better cmove instructions to avoid branches.
  (peter, 2003-12-06, 1 file, -1/+40)
* Update the graffiti.
  (peter, 2003-11-08, 1 file, -0/+1)
* Collect the nastiness for preserving the kernel MSR_GSBASE around the
  load_gs() calls into a single place that is less likely to go wrong.
  Eliminate the per-process context switching of MSR_GSBASE, because it
  should be constant for a single cpu. Instead, save/restore it during
  the loading of the new %gs selector for the new process.
  Approved by: re (amd64/* blanket)
  (peter, 2003-05-15, 1 file, -0/+36)
* Add BASIC i386 binary support for the amd64 kernel. This is largely
  stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
  redone the MD parts and added and fixed a few essential syscalls. It
  is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
  and p4. The ia64 code has not implemented signal delivery, so I had
  to do that.

  Before you say it, yes, this does need to go in a common place. But
  we're in a freeze at the moment and I didn't want to risk breaking ia64.
  I will sort this out after the freeze so that the common code is in a
  common place.

  On the AMD64 side, this required adding segment selector context switch
  support and some other support infrastructure. The %fs/%gs etc code is
  hairy because loading %gs will clobber the kernel's current MSR_GSBASE
  setting. The segment selectors are not used by the kernel, so they're
  only changed at context switch time or when changing modes. This still
  needs to be optimized.

  Approved by: re (amd64/* blanket)
  (peter, 2003-05-14, 1 file, -1/+12)
* Commit MD parts of a loosely functional AMD64 port. This is based on
  a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to
  attempt to get a stable base to start from. There is a lot missing still.
  Worth noting:
  - The kernel runs at 1GB in order to cheat with the pmap code. pmap uses
    a variation of the PAE code in order to avoid having to worry about 4
    levels of page tables yet.
  - It boots in 64 bit "long mode" with a tiny trampoline embedded in the
    i386 loader. This simplifies locore.s greatly.
  - There are still quite a few fragments of i386-specific code that have
    not been translated yet, and some that I cheated and wrote dumb C
    versions of (bcopy etc).
  - It has both int 0x80 for syscalls (but using registers for argument
    passing, as is native on the amd64 ABI), and the 'syscall' instruction
    for syscalls. int 0x80 preserves all registers, 'syscall' does not.
  - I have tried to minimize looking at the NetBSD code, except in a couple
    of places (eg: to find which register they use to replace the trashed
    %rcx register in the syscall instruction). As a result, there is not a
    lot of similarity. I did look at NetBSD a few times while debugging to
    get some ideas about what I might have done wrong in my first attempt.
  (peter, 2003-05-01, 1 file, -173/+53)
* Backout my last commit.
  Requested by: bde
  (davidxu, 2003-04-20, 1 file, -4/+4)
* Don't return garbage in high 16 bits.
  (davidxu, 2003-04-19, 1 file, -4/+4)
* Create inlines for ltr(sel), lldt(sel), lidt(addr) rather than
  functions that have one instruction.
  (peter, 2002-09-22, 1 file, -1/+26)
* Provide an inline function for the (GNUC) assembler "hlt" instruction.
  (markm, 2002-09-21, 1 file, -0/+7)
* Move SWTCH_OPTIM_STATS related code out of cpufunc.h. (This sort of stat
  gathering is not an x86 cpu feature)
  (peter, 2002-07-21, 1 file, -7/+0)
* Cast to prevent "signed/unsigned comparison" warnings.
  (markm, 2002-07-15, 1 file, -2/+2)
* Revive backed out pmap related changes from Feb 2002. The highlights are:
  - It actually works this time, honest!
  - Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
    so try and optimize things where possible.
  - Introduce ranged shootdowns that can be done as a single IPI.
  - PG_G support for i386
  - Specific-cpu targeted shootdowns. For example, there is no sense in
    globally purging the TLB cache for where we are stealing a page from
    the local unshared process on the local cpu. Use pm_active to track
    this.
  - Add some instrumentation for the tlb shootdown code.
  - Rip out SMP code from <machine/cpufunc.h>
  - Try and fix some very bogus PG_G and PG_PS interactions that were bad
    enough to cause vm86 bios calls to break. vm86 depended on our existing
    bugs and this was the cause of the VESA panics last time.
  - Fix the silly one-line error that caused the 'panic: bad pte' last time.
  - Fix a couple of other silly one-line errors that should have caused
    more pain than they did.

  Some more work is needed:
  - pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
    have a hook in cpu_switch.
  - The IPI handlers need some cleanup. I have a bogus %ds load that can
    be avoided.
  - APTD handling is rather bogus and appears to be a large source of
    global TLB IPI shootdowns for no really good reason.

  I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
  I expect to see a bigger difference when there is significant pageout
  activity or the system otherwise has memory shortages.

  I have backed out a few optimizations that I had been using over the
  last few days in order to be a little more conservative. I'll revisit
  these again over the next few days as the dust settles.

  New option: DISABLE_PG_G - In case I missed something.
  (peter, 2002-07-12, 1 file, -75/+93)
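The idea behind specific-cpu targeted shootdowns can be sketched as a mask computation: only CPUs on which the pmap is active, other than the current one, need an IPI. The sketch assumes pm_active is a bitmask with one bit per CPU; the type and function names are hypothetical and no IPI is actually sent.

```c
#include <stdint.h>

/* Hypothetical type: one bit per CPU, bit n set = pmap active on CPU n. */
typedef uint32_t sketch_cpumask_t;

/*
 * Return the set of remote CPUs that need a shootdown IPI.  The local
 * CPU is excluded because it can invalidate its own TLB directly.
 */
static inline sketch_cpumask_t
sketch_shootdown_targets(sketch_cpumask_t pm_active, int self_cpu)
{
	return (pm_active & ~((sketch_cpumask_t)1 << self_cpu));
}
```

When the result is 0, as in the message's example of a local unshared process on the local cpu, no IPI is needed at all.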
* Rename pause() to ia32_pause() so it doesn't conflict with the pause()
  function defined in <unistd.h>. I didn't #ifdef _KERNEL it because the
  mutex implementation in libpthread will probably need this.
  (jhb, 2002-05-22, 1 file, -2/+2)
* Debug registers aren't selectors, so use saner names for the variables in
  the inline functions for reading and writing the debug registers.
  (jhb, 2002-05-22, 1 file, -24/+24)
* - Sort the pause() inline into the appropriate location.
  - Add many missing prototypes to the non-GCC section.
  (jhb, 2002-05-22, 1 file, -6/+25)