path: root/sys/amd64/include
Commit message (Author, Age, Files, Lines)
* Add svn properties to the recently merged bhyve source files. (neel, 2013-01-20, 2 files, -2/+2)
|   The pre-commit hook will not allow any commits without the svn:keywords property in head.
* Revert changes for x2apic support from projects/bhyve. (neel, 2013-01-06, 1 file, -1/+0)
|   During the early days of bhyve it did not support instruction emulation which necessitated the use of x2apic to access the local apic. This is no longer the case and the dependency on x2apic has gone away. The x2apic patches can be considered independently of bhyve and will be merged into head via projects/x2apic.
|   Discussed with: grehan
* There is no need for 'start_emulating()' and 'stop_emulating()' to be defined (neel, 2013-01-04, 1 file, -17/+0)
|   in <machine/cpufunc.h> so remove them from there.
|   Obtained from: NetApp
* Prefer x2apic mode when running inside a virtual machine. (neel, 2012-12-16, 1 file, -0/+1)
|   Provide a tunable 'machdep.x2apic_desired' to let the administrator override the default behavior.
|   Provide a read-only sysctl 'machdep.x2apic' to let the administrator know whether the kernel is using x2apic or legacy mmio to access local apic.
|   Tested with Parallels Desktop 8 and bhyve hypervisors. Also tested running on bare metal Intel Xeon E5-2658.
|   Obtained from: NetApp
|   Discussed with: jhb, attilio, avg, grehan
* Clean up the user-space paging exit handler now that the unified instruction (neel, 2012-11-28, 1 file, -2/+0)
|   emulation is in place.
|   Obtained from: NetApp
* Revamp the x86 instruction emulation in bhyve. (neel, 2012-11-28, 2 files, -0/+116)
|   On a nested page table fault the hypervisor will:
|   - fetch the instruction using the guest %rip and %cr3
|   - decode the instruction in 'struct vie'
|   - emulate the instruction in host kernel context for local apic accesses
|   - any other type of mmio access is punted up to user-space (e.g. ioapic)
|   The decoded instruction is passed as collateral to the user-space process that is handling the PAGING exit.
|   The emulation code is fleshed out to include more addressing modes (e.g. SIB) and more types of operands (e.g. imm8).
|   The source code is unified into a single file (vmm_instruction_emul.c) that is compiled into vmm.ko as well as /usr/sbin/bhyve.
|   Reviewed by: grehan
|   Obtained from: NetApp
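The ModRM/SIB decoding step mentioned above (more addressing modes such as SIB) can be illustrated with a small standalone sketch. It is not bhyve's 'struct vie' decoder: the struct and function names are hypothetical, and REX prefix bits and displacement handling are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical decoded fields; the real 'struct vie' in bhyve differs. */
struct modrm_sib {
    uint8_t mod, reg, rm;        /* ModRM: bits 7-6, 5-3, 2-0 */
    int has_sib;
    uint8_t scale, index, base;  /* SIB: scale factor, bits 5-3, 2-0 */
};

/* Decode a ModRM byte and, when required, the SIB byte that follows it.
 * In 32/64-bit addressing a SIB byte follows when mod != 3 and rm == 4.
 * REX extension bits and displacement bytes are not handled here. */
static struct modrm_sib decode_modrm_sib(const uint8_t *bytes)
{
    struct modrm_sib d = {0};
    uint8_t modrm = bytes[0];

    d.mod = (modrm >> 6) & 0x3;
    d.reg = (modrm >> 3) & 0x7;
    d.rm  = modrm & 0x7;
    d.has_sib = (d.mod != 3 && d.rm == 4);
    if (d.has_sib) {
        uint8_t sib = bytes[1];
        d.scale = (uint8_t)(1u << ((sib >> 6) & 0x3)); /* 1, 2, 4 or 8 */
        d.index = (sib >> 3) & 0x7;
        d.base  = sib & 0x7;
    }
    return d;
}
```

For example, the bytes 0x04 0x88 decode as mod 0, rm 4 with a SIB byte giving scale 4, index 1 and base 0, i.e. an operand of the form [base + index*4].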
* IFC @ r242684 (neel, 2012-11-11, 8 files, -22/+78)
|\
| * Provide the reading and display of the Standard Extended Features, (kib, 2012-11-01, 1 file, -0/+1)
| |   introduced with the IvyBridge CPUs. Provide the definitions for new bits in CR3 and CR4 registers.
| |   Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
| |   MFC after: 2 weeks
| * Add a unified macro that prevents the compiler from reordering (attilio, 2012-10-09, 1 file, -2/+2)
| |   instruction loads/stores at will. The macro __compiler_membar() is currently supported for both gcc and clang; kernel compilation will fail with other compilers.
| |   Reviewed by: bde, kib
| |   Discussed with: dim, theraven
| |   MFC after: 2 weeks
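As a rough sketch of what a compiler-only barrier does, assuming a GCC-compatible compiler: the macro name compiler_membar_sketch() and the publish() helper are hypothetical and are not the definitions in the FreeBSD headers.

```c
/* Compiler-only barrier (GCC/clang inline asm assumed). The empty asm with
 * a "memory" clobber stops the compiler from moving memory accesses across
 * it; it emits no CPU fence instruction. */
#define compiler_membar_sketch() __asm__ __volatile__("" : : : "memory")

static int data_sketch;
static volatile int ready_sketch;

/* Store the payload, then the flag, without letting the compiler swap them. */
static void publish(int value)
{
    data_sketch = value;
    compiler_membar_sketch();
    ready_sketch = 1;
}
```

Such a barrier only constrains the compiler; whether the CPU may still reorder the two stores depends on the architecture's memory model.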
| * Reverts r234074,234105,234564,234723,234989,235231-235232 and part of (attilio, 2012-10-09, 1 file, -0/+2)
| |   r234247. Use, instead, the static initializer introduced in r239923 for x86 and sparc64 intr_cpus, unwinding the code to the initial version.
| |   Reviewed by: marius
| * - Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific (jhb, 2012-09-28, 1 file, -10/+42)
| |     bits under #ifdef _KERNEL but leave definitions for various structures defined by standards ($PIR table, SMAP entries, etc.) available to userland.
| |   - Consolidate duplicate SMBIOS table structure definitions in ipmi(4) and smbios(4) in <machine/pc/bios.h> and make them available to userland.
| |   MFC after: 2 weeks
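To illustrate why SMAP entries are useful to userland, here is a sketch that sums usable RAM from an E820-style map. The struct mirrors the base/length/type layout of an SMAP entry, but the names bios_smap_sketch and usable_bytes() are hypothetical, and the packed attribute assumes a GCC-compatible compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of an E820/SMAP entry layout (20 bytes, no padding). */
struct bios_smap_sketch {
    uint64_t base;    /* physical start address */
    uint64_t length;  /* size in bytes */
    uint32_t type;    /* 1 = usable RAM */
} __attribute__((__packed__));

#define SMAP_TYPE_MEMORY_SKETCH 1

/* Sum the lengths of all entries marked as usable RAM. */
static uint64_t usable_bytes(const struct bios_smap_sketch *map, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        if (map[i].type == SMAP_TYPE_MEMORY_SKETCH)
            total += map[i].length;
    return total;
}
```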
| * As discussed on -current, remove the hardcoded default maxswzone. (des, 2012-08-14, 1 file, -8/+0)
| |   MFC after: 3 weeks
| * Add lfence(). (kib, 2012-08-01, 1 file, -0/+7)
| |   MFC after: 1 week
| * Forcibly shut up clang warning about NULL pointer dereference. (kib, 2012-07-23, 1 file, -0/+7)
| |   MFC after: 3 weeks
| * Introduce curpcb magic variable, similar to curthread, which is MD (kib, 2012-07-19, 1 file, -1/+14)
| |   for amd64. It is implemented as __pure2 inline with non-volatile asm read from pcpu, which allows a compiler to cache its results. Convert most PCPU_GET(pcb) and curthread->td_pcb accesses into curpcb.
| |   Note that __curthread() uses magic value 0 as an offsetof(struct pcpu, pc_curthread). It seems to be done this way because machine/pcpu.h needs to be processed before sys/pcpu.h, as machine/pcpu.h contributes machine-dependent fields to the struct pcpu definition. As a result, machine/pcpu.h cannot use struct pcpu yet. __curpcb() also uses a magic constant instead of offsetof(struct pcpu, pc_curpcb) for the same reason. The constants are now defined as symbols and CTASSERTs are added to ensure that future KBI changes do not break the code.
| |   Requested and reviewed by: bde
| |   MFC after: 3 weeks
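The magic-offset-plus-assertion approach described above can be sketched in portable C11. The struct below is a hypothetical stand-in for struct pcpu, _Static_assert plays the role of CTASSERT, and read_curpcb() reads through a byte offset the way a %gs-relative load would; none of these names come from the FreeBSD sources.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pcpu; the real layout differs. */
struct pcpu_sketch {
    void *pc_curthread;
    void *pc_curpcb;
    int   pc_cpuid;
};

/* Magic offsets a header could use before struct pcpu is complete. */
#define OFFSETOF_CURTHREAD_SKETCH 0
#define OFFSETOF_CURPCB_SKETCH    sizeof(void *)

/* If the layout changes, the build fails instead of reading the wrong field. */
_Static_assert(offsetof(struct pcpu_sketch, pc_curthread) ==
    OFFSETOF_CURTHREAD_SKETCH, "pc_curthread offset changed");
_Static_assert(offsetof(struct pcpu_sketch, pc_curpcb) ==
    OFFSETOF_CURPCB_SKETCH, "pc_curpcb offset changed");

/* Read pc_curpcb using only the magic byte offset. */
static void *read_curpcb(const struct pcpu_sketch *pc)
{
    return *(void *const *)((const char *)pc + OFFSETOF_CURPCB_SKETCH);
}
```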
| * On AMD64, provide siginfo.si_code for floating point errors when an error (kib, 2012-07-18, 1 file, -1/+2)
| |   occurs using the SSE math processor. Update comments describing the handling of the exception status bits in coprocessors' control words.
| |   Remove GET_FPU_CW and GET_FPU_SW macros which were used only once. Prefer to use curpcb to access pcb_save over the longer path of referencing pcb through the thread structure.
| |   Based on the submission by: Ed Alley <wea llnl gov>
| |   PR: amd64/169927
| |   Reviewed by: bde
| |   MFC after: 3 weeks
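The mapping from SSE exception status to a si_code value can be sketched as follows. The enum values are hypothetical stand-ins for the FPE_* codes in <signal.h>, and the priority order among simultaneously pending exceptions is an illustrative choice, not necessarily the one the kernel uses. The bit positions follow the MXCSR layout: flags in bits 0 to 5, masks in bits 7 to 12.

```c
#include <stdint.h>

/* Hypothetical si_code values standing in for FPE_* from <signal.h>. */
enum fpe_code_sketch { FPE_NONE_S, FPE_FLTINV_S, FPE_FLTDIV_S,
    FPE_FLTOVF_S, FPE_FLTUND_S, FPE_FLTRES_S };

/* Pick a code for the first raised, unmasked exception in MXCSR. */
static enum fpe_code_sketch mxcsr_to_code(uint32_t mxcsr)
{
    uint32_t pending = mxcsr & ~(mxcsr >> 7) & 0x3f; /* raised and unmasked */

    if (pending & 0x01) return FPE_FLTINV_S;  /* IE: invalid operation */
    if (pending & 0x04) return FPE_FLTDIV_S;  /* ZE: divide by zero */
    if (pending & 0x08) return FPE_FLTOVF_S;  /* OE: overflow */
    if (pending & 0x10) return FPE_FLTUND_S;  /* UE: underflow */
    if (pending & 0x20) return FPE_FLTRES_S;  /* PE: inexact result */
    return FPE_NONE_S;                        /* DE is ignored in this sketch */
}
```

With the default MXCSR (0x1f80, all exceptions masked) a raised flag yields no code; clearing the ZE mask bit (0x200) and raising ZE gives the divide-by-zero code.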
| * Add support for using the XSAVEOPT instruction. Our XSAVE/XRSTOR usage (kib, 2012-07-14, 1 file, -0/+1)
| |   mostly meets the guidelines set by the Intel SDM:
| |   1. We use XRSTOR and XSAVE from the same CPL using the same linear address for the store area.
| |   2. Contrary to the recommendations, we cannot zero the FPU save area for a new thread, since fork semantics require a copy of the previous state. This advice seemingly contradicts the advice in item 6.
| |   3. We do use XSAVEOPT in the context switch code only, and the area for XSAVEOPT already always contains the data saved by XSAVE.
| |   4. We do not modify the save area between XRSTOR, when the area is loaded into FPU context, and XSAVE. We always spill the FPU context into the save area and start emulation when directly writing into FPU context.
| |   5. We do not use segmented addressing to access the save area, or rather, we always address it using %ds basing.
| |   6. XSAVEOPT is only executed on an area that was previously loaded with XRSTOR, since the context switch code checks for FPU use by the outgoing thread before saving, and a thread that stopped emulation forcibly gets its context loaded with XRSTOR.
| |   7. The PCB cannot be paged out while FPU emulation is turned off, since the stack of the executing thread is never swapped out.
| |   The context switch code is patched to issue XSAVEOPT instead of XSAVE if supported. This approach eliminates one conditional in the context switch code, which would be needed otherwise.
| |   For user-visible machine context to have proper data, fpugetregs() checks for unsaved extension blocks and manually copies pristine FPU state into them, according to the description provided by CPUID leaf 0xd.
| |   MFC after: 1 month
* | Maintain state regarding NMI delivery to guest vcpu in a VT-x-independent manner. (neel, 2012-10-24, 1 file, -2/+2)
| |   Also add a stats counter to count the number of NMIs delivered per vcpu.
| |   Obtained from: NetApp
* | Add the guest physical address and r/w/x bits to (grehan, 2012-10-12, 1 file, -0/+2)
| |   the paging exit in preparation for a rework of bhyve MMIO handling.
| |   Reviewed by: neel
| |   Obtained from: NetApp
* | Provide per-vcpu locks instead of relying on a single big lock. (neel, 2012-10-12, 1 file, -7/+12)
| |   This also gets rid of all the witness.watch warnings related to calling malloc(M_WAITOK) while holding a mutex.
| |   Reviewed by: grehan
* | Fix warnings generated by 'debug.witness.watch' during VM creation and (neel, 2012-10-11, 1 file, -1/+1)
| |   destruction for calling malloc() with M_WAITOK while holding a mutex.
| |   Do not allow vmm.ko to be unloaded until all virtual machines are destroyed.
* | Change vm_malloc() to map pages in the guest physical address space in 4KB (neel, 2012-10-04, 1 file, -1/+0)
| |   chunks. This breaks the assumption that the entire memory segment is contiguously allocated in the host physical address space.
| |   This also paves the way to satisfy the 4KB page allocations by requesting free pages from the VM subsystem as opposed to hard-partitioning host memory at boot time.
* | Get rid of assumptions in the hypervisor that the host physical memory (neel, 2012-10-03, 1 file, -4/+7)
| |   associated with guest physical memory is contiguous.
| |   Rewrite vm_gpa2hpa() to get the GPA to HPA mapping by querying the nested page tables.
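A nested page table walk starts by splitting the guest physical address into per-level indices. The sketch below assumes 4-level tables, 4KB leaf pages, and a frame field in bits 51:12 of the leaf entry; npt_index() and leaf_to_hpa() are hypothetical helpers, not the vmm API, and a real walk would also check present/permission bits and large-page entries.

```c
#include <stdint.h>

/* With 4-level tables and 4KB pages, a 48-bit address splits into four
 * 9-bit indices plus a 12-bit page offset.
 * Level 3 = PML4, 2 = PDPT, 1 = PD, 0 = PT. */
static unsigned npt_index(uint64_t gpa, int level)
{
    return (unsigned)((gpa >> (12 + 9 * level)) & 0x1ff);
}

/* Combine the 4KB frame from a leaf entry with the page offset of the GPA. */
static uint64_t leaf_to_hpa(uint64_t pte, uint64_t gpa)
{
    const uint64_t frame_mask = 0x000ffffffffff000ULL; /* bits 51:12 */
    return (pte & frame_mask) | (gpa & 0xfff);
}
```

For example, 0x40201123 is 1 GB + 2 MB + 4 KB + 0x123, so its indices are 0, 1, 1, 1 from the top level down, and the low 0x123 is carried into the host physical address.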
* | Get rid of assumptions in the hypervisor that the host physical memory (neel, 2012-09-29, 1 file, -1/+1)
| |   associated with guest physical memory is contiguous.
| |   In this case vm_malloc() was using vm_gpa2hpa() to indirectly infer whether or not the address range had already been allocated.
| |   Replace this instead with an explicit API 'vm_gpa_available()' that returns TRUE if a page is available for allocation in guest physical address space.
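The idea behind an explicit availability query can be sketched over a hypothetical list of already-allocated segments. This mirrors the intent described for vm_gpa_available(), not its actual implementation; the struct and function names are made up for illustration, and rejecting unaligned addresses is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096ULL

/* Hypothetical record of a guest memory segment that is already allocated. */
struct mem_seg_sketch {
    uint64_t gpa;  /* start of segment in guest physical space */
    uint64_t len;  /* length in bytes */
};

/* True if 'gpa' is page-aligned and not covered by any existing segment. */
static bool gpa_available_sketch(const struct mem_seg_sketch *segs,
    size_t nsegs, uint64_t gpa)
{
    if (gpa & (PAGE_SIZE_SKETCH - 1))
        return false;
    for (size_t i = 0; i < nsegs; i++)
        if (gpa >= segs[i].gpa && gpa - segs[i].gpa < segs[i].len)
            return false;
    return true;
}
```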
* | Add ioctls to control the X2APIC capability exposed by the virtual machine to (neel, 2012-09-25, 2 files, -0/+22)
| |   the guest.
| |   At the moment this simply sets the state in the 'vcpu' instance but there is no code that acts upon these settings.
* | Add an explicit exit code 'SPINUP_AP' to tell the controlling process that an (neel, 2012-09-25, 1 file, -0/+5)
| |   AP needs to be activated by spinning up an execution context for it.
| |   The local apic emulation is now completely done in the hypervisor and it will detect writes to the ICR_LO register that try to bring up the AP. In response to such writes it will return to userspace with an exit code of SPINUP_AP.
| |   Reviewed by: grehan
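Detecting a startup IPI in an emulated ICR_LO write might look roughly like this. The field layout (vector in bits 7:0, delivery mode in bits 10:8, STARTUP = 110b) follows the local APIC ICR format; the exit-code enum and icr_lo_write() are hypothetical.

```c
#include <stdint.h>

/* Local APIC ICR_LO fields: vector in bits 7:0, delivery mode in 10:8. */
#define ICR_VECTOR(icr)      ((icr) & 0xffu)
#define ICR_DELMODE(icr)     (((icr) >> 8) & 0x7u)
#define ICR_DELMODE_INIT     0x5u   /* 101b */
#define ICR_DELMODE_STARTUP  0x6u   /* 110b */

/* Hypothetical exit codes standing in for the hypervisor's real ones. */
enum exitcode_sketch { EXIT_NONE_S, EXIT_SPINUP_AP_S };

/* On a guest write to ICR_LO, request an AP spin-up for a STARTUP IPI.
 * The AP starts in real mode at physical address vector * 4KB. */
static enum exitcode_sketch icr_lo_write(uint32_t icr, uint64_t *ap_rip)
{
    if (ICR_DELMODE(icr) == ICR_DELMODE_STARTUP) {
        *ap_rip = (uint64_t)ICR_VECTOR(icr) << 12;
        return EXIT_SPINUP_AP_S;
    }
    return EXIT_NONE_S;
}
```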
* | Stash the 'vm_exit' information in each 'struct vcpu'. (neel, 2012-09-24, 1 file, -2/+2)
| |   There is no functional change at this time but this paves the way for vm exit handler functions to easily modify the exit reason going forward.
* | IFC @ r238370 (grehan, 2012-07-11, 10 files, -43/+104)
|\ \
| |/
| * Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h> (jhb, 2012-07-09, 1 file, -0/+10)
| |   on x86 and use that to implement stop_emulating() in the fpu/npx code. Reimplement start_emulating() in the non-XEN case by using load_cr0() and rcr0() instead of the 'lmsw' and 'smsw' instructions. Intel explicitly discourages the use of 'lmsw' and 'smsw' on 80386 and later processors in the description of these instructions in Volume 2 of the SDM.
| |   Reviewed by: kib
| |   MFC after: 1 month
| * Now that our assembler supports the xsave family of instructions, use them (jhb, 2012-07-05, 1 file, -0/+19)
| |   natively rather than hand-assembled versions. For xgetbv/xsetbv, add a wrapper API to deal with xcr* registers: rxcr() and load_xcr().
| |   Reviewed by: kib
| |   MFC after: 1 month
| * Optimize reserve_pv_entries() using the popcnt instruction. (alc, 2012-06-30, 1 file, -0/+9)
| |
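Counting free pv entries with a population count can be sketched as follows. The chunk layout (three 64-bit bitmap words, set bit = free entry) is an assumption for illustration; __builtin_popcountll is a GCC/clang builtin that becomes a single POPCNT instruction when the target enables it (for example with -mpopcnt).

```c
#include <stdint.h>

#define NPCM_SKETCH 3  /* bitmap words per chunk (assumed) */

/* Hypothetical pv chunk: each set bit in pc_map marks a free pv entry. */
struct pv_chunk_sketch {
    uint64_t pc_map[NPCM_SKETCH];
};

/* Count free entries with one popcount per bitmap word. */
static int free_entries(const struct pv_chunk_sketch *pc)
{
    int n = 0;
    for (int i = 0; i < NPCM_SKETCH; i++)
        n += __builtin_popcountll(pc->pc_map[i]);
    return n;
}
```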
| * Implement a mechanism to export some kernel timekeeping data to (kib, 2012-06-22, 1 file, -0/+6)
| |   usermode, using a shared page. The structures and functions have a vdso prefix, to indicate the intended location of the code in the future.
| |   The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code.
| |   The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows turning off the mechanism.
| |   Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed.
| |   Minimal stubs necessary for non-x86 architectures to still compile are provided.
| |   Discussed with: bde
| |   Reviewed by: jhb
| |   Tested by: flo
| |   MFC after: 1 month
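Lockless reading of shared timekeeping data is typically done with a generation counter. The sketch below assumes that a generation of 0 marks an update in progress; struct timehands_sketch, th_update() and th_read() are hypothetical and only the retry pattern is meant to carry over.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared-page record; th_gen == 0 means "update in progress". */
struct timehands_sketch {
    _Atomic uint32_t th_gen;
    _Atomic uint64_t th_offset;  /* stand-in for the time data */
};

/* Writer: invalidate, update the data, then publish a new nonzero generation. */
static void th_update(struct timehands_sketch *th, uint64_t offset)
{
    uint32_t gen = atomic_load_explicit(&th->th_gen, memory_order_relaxed);
    atomic_store_explicit(&th->th_gen, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&th->th_offset, offset, memory_order_relaxed);
    if (++gen == 0)
        gen = 1;  /* never publish 0 */
    atomic_store_explicit(&th->th_gen, gen, memory_order_release);
}

/* Reader: retry until the same nonzero generation brackets the data read. */
static uint64_t th_read(struct timehands_sketch *th)
{
    for (;;) {
        uint32_t gen = atomic_load_explicit(&th->th_gen, memory_order_acquire);
        if (gen == 0)
            continue;
        uint64_t off = atomic_load_explicit(&th->th_offset,
            memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        if (atomic_load_explicit(&th->th_gen, memory_order_relaxed) == gen)
            return off;
    }
}
```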
| * Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to (kib, 2012-06-22, 1 file, -0/+1)
| |   timekeeping information.
| |   MFC after: 1 week
| * The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap (alc, 2012-06-16, 1 file, -0/+1)
| |   layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer.
| |   Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer.
| |   As an added bonus, tidy up some nearby comments concerning page flags.
| |   Reviewed by: kib
| |   MFC after: 6 weeks
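The encapsulation amounts to an accessor that the MI layer calls instead of testing the flag itself. In this sketch the struct, flag value and function name are hypothetical stand-ins for struct vm_page, PGA_WRITEABLE and pmap_page_is_write_mapped().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a page structure and its write-mapped flag bit. */
#define PGA_WRITEABLE_SKETCH 0x04u
struct vm_page_sketch {
    uint8_t aflags;
};

/* MI code calls this instead of testing the flag, so the pmap layer can
 * later track write mappings another way without touching its callers. */
static inline bool
pmap_page_is_write_mapped_sketch(const struct vm_page_sketch *m)
{
    return (m->aflags & PGA_WRITEABLE_SKETCH) != 0;
}
```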
| * - Remove unused code for CR3 and CR4. (jkim, 2012-06-13, 1 file, -1/+1)
| |   - Fix a few style(9) nits while I am here.
| * - Fix resumectx() prototypes to reflect reality. (jkim, 2012-06-13, 1 file, -2/+2)
| |   - For i386, simply jump to resumectx() with PCB in %ecx.
| |   - Fix a style(9) nit while I am here.
| * Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c (iwasaki, 2012-06-12, 1 file, -0/+1)
| |   as ipi_startup().
| * Add x86/acpica/acpi_wakeup.c for amd64 and i386. Differences in (iwasaki, 2012-06-09, 1 file, -1/+13)
| |   suspend/resume procedures between them are minimized.
| |   common:
| |   - Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
| |   - Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
| |   - Add some variables in acpi_wakecode.S in order to minimize the difference between amd64 and i386.
| |   - Disable load_cr3() because now CR3 is restored in resumectx().
| |   amd64:
| |   - Add suspend/resume related members (such as MSR) in PCB.
| |   - Modify savectx() for above new PCB members.
| |   - Merge acpi_switch.S into cpu_switch.S as resumectx().
| |   i386:
| |   - Merge (and remove) suspendctx() into savectx() in order to match with amd64 code.
| |   Reviewed by: attilio@, acpi@
| * Use a plain store for atomic_store_rel on x86, instead of the implicitly (kib, 2012-06-02, 1 file, -37/+37)
| |   locked xchg instruction. The IA32 memory model guarantees that a store has release semantics, since stores cannot pass loads or stores.
| |   Reviewed by: bde, jhb
| |   Tested by: pho
| |   MFC after: 2 weeks
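The reasoning above can be sketched in C. Both helpers are hypothetical: store_rel_int() uses a C11 release store, which x86-64 compilers typically emit as a plain MOV, while store_rel_int_x86() spells out the x86-only idea of a compiler barrier followed by an ordinary store (GCC-style inline asm assumed).

```c
#include <stdatomic.h>

/* Portable release store: earlier loads/stores may not move after it. */
static void store_rel_int(_Atomic int *p, int v)
{
    atomic_store_explicit(p, v, memory_order_release);
}

/* x86-only idea: stop compiler reordering, then do an ordinary store and
 * rely on the hardware not reordering it with earlier loads or stores.
 * This is not a valid release store on weaker memory models. */
static void store_rel_int_x86(volatile int *p, int v)
{
    __asm__ __volatile__("" : : : "memory");
    *p = v;
}
```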
| * MFp4 bz_ipv6_fast: (bz, 2012-05-24, 1 file, -0/+4)
| |   in_cksum.h required ip.h to be included for struct ip. To be able to use some general checksum functions like in_addword() in a non-IPv4 context, limit the (also exported to user space) IPv4-specific functions to cases where the ip.h header is present and IPVERSION is defined (to 4).
| |   We should consider more general checksum (updating) functions to also allow easier incremental checksum updates in the L3/4 stack and firewalls, as well as ponder further requirements by certain NIC drivers needing slightly different pseudo values in offloading cases. Thinking in terms of a better "library".
| |   Sponsored by: The FreeBSD Foundation
| |   Sponsored by: iXsystems
| |   Reviewed by: gnn (as part of the whole)
| |   MFC After: 3 days
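A general helper such as in_addword() is built on ones'-complement addition. The sketch below shows that arithmetic (as described in RFC 1071); ones_add16() and cksum16() are hypothetical names, not the functions in in_cksum.h.

```c
#include <stdint.h>

/* 16-bit ones'-complement addition with end-around carry. */
static uint16_t ones_add16(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    sum = (sum & 0xffff) + (sum >> 16);  /* fold the carry back in */
    return (uint16_t)sum;
}

/* Internet-style checksum: ones'-complement sum of words, then complement. */
static uint16_t cksum16(const uint16_t *w, int n)
{
    uint16_t sum = 0;
    for (int i = 0; i < n; i++)
        sum = ones_add16(sum, w[i]);
    return (uint16_t)~sum;
}
```

Using the worked example from RFC 1071, the words 0x0001, 0xf203, 0xf4f5, 0xf6f7 sum to 0xddf2, giving a checksum of 0x220d.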
| * Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no (alc, 2012-05-18, 1 file, -1/+1)
| |   longer uses the active and inactive paging queues. Instead, the pmap now maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses this list to select pv entries for reclamation.
| |   Note: The old pmap_collect() tried to avoid reclaiming mappings for pages that have either a hold_count or a busy field that is non-zero. However, this isn't necessary for correctness, and the locking in pmap_collect() was insufficient to guarantee that such mappings weren't reclaimed. The new pmap_pv_reclaim() doesn't even try.
| |   Reviewed by: kib
| |   MFC after: 6 weeks
| * Revert part of r234723 by re-enabling the SMP protection for (attilio, 2012-05-03, 1 file, -0/+2)
| |   intr_bind() on x86. This has been requested by jhb and I strongly disagree with this, but as long as he is the x86 and interrupt subsystem maintainer I will follow his directives.
| |   The disagreement comes from what we should really consider as a public KPI. IMHO, if we really need a selection between the kernel functions, we may need an explicit protection like _KERNEL_KPI, which defines which subset of the kernel functions might really be considered part of the KPI (for third-party modules) and which not. As long as we don't have this mechanism I just consider any possible function as usable by third-party code, thus intr_bind() included.
| |   MFC after: 1 week
| * Add a convenience macro for the returns_twice attribute, and apply it to (dim, 2012-04-29, 1 file, -1/+1)
| |   the prototypes of the appropriate functions (getcontext, savectx, setjmp, sigsetjmp and vfork).
| |   MFC after: 2 weeks
| * Increase DFLDSIZ from 128 MiB to 32 GiB. On amd64 there's plenty of virtual (rmh, 2012-04-27, 1 file, -1/+1)
| |   memory available, so there is no need to be so conservative about it.
| |   Reviewed by: arch
| * Clean up the intr* MD KPI from the SMP dependency, removing a cause of (attilio, 2012-04-26, 1 file, -4/+0)
| |   discrepancy between modules and kernel, but deal with SMP differences within the functions themselves. As an added bonus this also helps in terms of code readability.
| |   Requested by: gibbs
| |   Reviewed by: jhb, marius
| |   MFC after: 1 week
* | MSI-x interrupt support for PCI pass-thru devices. (grehan, 2012-04-28, 2 files, -1/+19)
| |   Includes instruction emulation for memory r/w access. This opens the door for io-apic, local apic, hpet timer, and legacy device emulation.
| |   Submitted by: ryan dot berryhill at sandvine dot com
| |   Reviewed by: grehan
| |   Obtained from: Sandvine
* | IFC @ r234692 (grehan, 2012-04-26, 28 files, -2031/+129)
|\ \
| |/
| |     sys/amd64/include/cpufunc.h
| |     sys/amd64/include/fpu.h
| |     sys/amd64/amd64/fpu.c
| |     sys/amd64/vmm/vmm.c
| |   - Add API to allow vmm FPU state init/save/restore.
| |   FP stuff discussed with: kib
| * bump INTRCNT_COUNT values to reflect actual numbers of IPI counters (avg, 2012-04-13, 1 file, -2/+2)
| |   Maybe the numbers should be conditionalized on COUNT_IPIS
| |   Reviewed by: jhb
| |   MFC after: 1 week
| * Move the legacy(4) driver to x86. (jhb, 2012-03-30, 1 file, -63/+0)
| |
| * Use a more proper fix for enabling HT MSI mapping windows on Host-PCI (jhb, 2012-03-29, 1 file, -1/+7)
| |   bridges. Rather than blindly enabling the windows on all of them, only enable the window when an MSI interrupt is enabled for a device behind the bridge, similar to what already happens for HT PCI-PCI bridges.
| |   To implement this, each x86 Host-PCI bridge driver has to be able to locate its actual backing device on bus 0. For ACPI, use the _ADR method to find the slot and function of the device. For the non-ACPI case, the legacy(4) driver already scans bus 0 looking for Host-PCI bridge devices. Now it saves the slot and function of each bridge that it finds as ivars that the Host-PCI bridge driver can then use in its pcib_map_msi() method.
| |   This fixes machines where non-MSI interrupts were broken by the previous round of HT MSI changes.
| |   Tested by: bapt
| |   MFC after: 1 week