summaryrefslogtreecommitdiffstats
path: root/sys/amd64/include
Commit message (Collapse)AuthorAgeFilesLines
* MFC 258860,260167,260238,260397:jhb2014-04-171-2/+11
| | | | | | | | | | | | | | | | - Restructure the VMX code to enter and exit the guest. In large part this change hides the setjmp/longjmp semantics of VM enter/exit. vmx_enter_guest() is used to enter guest context and vmx_exit_guest() is used to transition back into host context. Fix a longstanding race where a vcpu interrupt notification might be ignored if it happens after vmx_inject_interrupts() but before host interrupts are disabled in vmx_resume/vmx_launch. We now call vmx_inject_interrupts() with host interrupts disabled to prevent this. - The 'protection' field in the VM exit collateral for the PAGING exit is not used - get rid of it. Reviewed by: grehan
* MFC r263001royger2014-04-041-0/+6
| | | | | | | | Move asm IPIs handlers to C code, so both Xen and native IPI handlers share the same code. Approved by: gibbs Sponsored by: Citrix Systems R&D
* MFC r263475:kib2014-03-281-0/+1
| | | | | | | | | | | | | | | | Fix two issues with /dev/mem access on amd64, both causing kernel page faults. First, for accesses to direct map region should check for the limit by which direct map is instantiated. Second, for accesses to the kernel map, use a new thread private flag TDP_DEVMEMIO, which instructs vm_fault() to return error when fault happens on the MAP_ENTRY_NOFAULT entry, instead of panicing. MFC r263498: Add change forgotten in r263475. Make dmaplimit accessible outside amd64/pmap.c.
* MFC: r262746, r262748, r262750, r262752jkim2014-03-102-1/+1
| | | | Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c.
* MFC 258859,259081,259085,259205,259213,259275,259482,259537,259702,259779:jhb2014-02-232-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several changes to the local APIC support in bhyve: - Rename 'vm_interrupt_hostcpu()' to 'vcpu_notify_event()'. - If a vcpu disables its local apic and then executes a 'HLT' then spin down the vcpu and destroy its thread context. Also modify the 'HLT' processing to ignore pending interrupts in the IRR if interrupts have been disabled by the guest. The interrupt cannot be injected into the guest in any case so resuming it is futile. - Use callout(9) to drive the vlapic timer instead of clocking it on each VM exit. - When the guest is bringing up the APs in the x2APIC mode a write to the ICR register will now trigger a return to userspace with an exitcode of VM_EXITCODE_SPINUP_AP. - Change the vlapic timer lock to be a spinlock because the vlapic can be accessed from within a critical section (vm run loop) when guest is using x2apic mode. - Fix the vlapic version register. - Add a command to bhyvectl to inject an NMI on a specific vcpu. - Add an API to deliver message signalled interrupts to vcpus. This allows callers to treat the MSI 'addr' and 'data' fields as opaque and also lets bhyve implement multiple destination modes: physical, flat and clustered. - Rename the ambiguously named 'vm_setup_msi()' and 'vm_setup_msix()' to 'vm_setup_pptdev_msi()' and 'vm_setup_pptdev_msix()' respectively. - Consolidate the virtual apic initialization in a single function: vlapic_reset() - Add a generic routine to trigger an LVT interrupt that supports both fixed and NMI delivery modes. - Add an ioctl and bhyvectl command to trigger local interrupts inside a guest. In particular, a global NMI similar to that raised by SERR# or PERR# can be simulated by asserting LINT1 on all vCPUs. - Extend the LVT table in the vCPU local APIC to support CMCI. - Flesh out the local APIC error reporting a bit to cache errors and report them via ESR when ESR is written to. Add support for asserting the error LVT when an error occurs. Raise illegal vector errors when attempting to signal an invalid vector for an interrupt or when sending an IPI. - Export table entries in the MADT and MP Table advertising the stock x86 config of LINT0 set to ExtInt and LINT1 wired to NMI.
* MFC 261517,261520:jhb2014-02-181-3/+0
| | | | | Convert the license on files where I am the sole copyright holder to 2 clause BSD licenses.
* MFC 259140:jhb2014-02-181-9/+0
| | | | | Move constants for indices in the local APIC's local vector table from apicvar.h to apicreg.h.
* MFC 259782:jhb2014-01-292-0/+5
| | | | | | Add a resume hook for bhyve that runs a function on all CPUs during resume. For Intel CPUs, invoke vmxon for CPUs that were in VMX mode at the time of suspend.
* MFC 257422,257661,258075,258476,258494,258579,258609,258699:jhb2014-01-232-0/+25
| | | | | | | | | | Several enhancements to the I/O APIC support in bhyve including: - Move the I/O APIC device model from userspace into vmm.ko and add ioctls to assert and deassert I/O APIC pins. - Add HPET device emulation including a single timer block with 8 timers. - Remove the 'vdev' abstraction. Approved by: neel
* MFC r258660:kib2013-12-032-2/+4
| | | | | | Fix sys/sysctl.h use for cc -m32 on amd64. Approved by: re (gjb)
* MFC r256645.neel2013-10-221-0/+1
| | | | | | | | | | | | | | Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose 'invpcid' instruction to the guest. Currently bhyve will try to enable this capability unconditionally if it is available. Consolidate code in bhyve to set the capabilities so it is no longer duplicated in BSP and AP bringup. Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid' instruction is available. Approved by: re (hrs)
* Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_idgibbs2013-10-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | field. Perform vcpu enumeration for Xen PV and HVM environments and convert all Xen drivers to use vcpu_id instead of a hard coded assumption of the mapping algorithm (acpi or apic ID) in use. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) amd64/include/pcpu.h: i386/include/pcpu.h: Add vcpu_id to the amd64 and i386 pcpu structures. dev/xen/timer/timer.c x86/xen/xen_intr.c Use new vcpu_id instead of assuming acpi_id == vcpu_id. i386/xen/mp_machdep.c: i386/xen/mptable.c x86/xen/hvm.c: Perform Xen HVM and Xen full PV vcpu_id mapping. x86/xen/hvm.c: x86/acpica/madt.c Change SYSINIT ordering of acpi CPU enumeration so that it is guaranteed to be available at the time of Xen HVM vcpu id mapping.
* Merge projects/bhyve_npt_pmap into head.neel2013-10-054-40/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the amd64/pmap code aware of nested page table mappings used by bhyve guests. This allows bhyve to associate each guest with its own vmspace and deal with nested page faults in the context of that vmspace. This also enables features like accessed/dirty bit tracking, swapping to disk and transparent superpage promotions of guest memory. Guest vmspace: Each bhyve guest has a unique vmspace to represent the physical memory allocated to the guest. Each memory segment allocated by the guest is mapped into the guest's address space via the 'vmspace->vm_map' and is backed by an object of type OBJT_DEFAULT. pmap types: The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT. The PT_X86 pmap type is used by the vmspace associated with the host kernel as well as user processes executing on the host. The PT_EPT pmap is used by the vmspace associated with a bhyve guest. Page Table Entries: The EPT page table entries as mostly similar in functionality to regular page table entries although there are some differences in terms of what bits are used to express that functionality. For e.g. the dirty bit is represented by bit 9 in the nested PTE as opposed to bit 6 in the regular x86 PTE. Therefore the bitmask representing the dirty bit is now computed at runtime based on the type of the pmap. Thus PG_M that was previously a macro now becomes a local variable that is initialized at runtime using 'pmap_modified_bit(pmap)'. An additional wrinkle associated with EPT mappings is that older Intel processors don't have hardware support for tracking accessed/dirty bits in the PTE. This means that the amd64/pmap code needs to emulate these bits to provide proper accounting to the VM subsystem. This is achieved by using the following mapping for EPT entries that need emulation of A/D bits: Bit Position Interpreted By PG_V 52 software (accessed bit emulation handler) PG_RW 53 software (dirty bit emulation handler) PG_A 0 hardware (aka EPT_PG_RD) PG_M 1 hardware (aka EPT_PG_WR) The idea to use the mapping listed above for A/D bit emulation came from Alan Cox (alc@). The final difference with respect to x86 PTEs is that some EPT implementations do not support superpage mappings. This is recorded in the 'pm_flags' field of the pmap. TLB invalidation: The amd64/pmap code has a number of ways to do invalidation of mappings that may be cached in the TLB: single page, multiple pages in a range or the entire TLB. All of these funnel into a single EPT invalidation routine called 'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and sends an IPI to the host cpus that are executing the guest's vcpus. On a subsequent entry into the guest it will detect that the EPT has changed and invalidate the mappings from the TLB. Guest memory access: Since the guest memory is no longer wired we need to hold the host physical page that backs the guest physical page before we can access it. The helper functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose. PCI passthru: Guest's with PCI passthru devices will wire the entire guest physical address space. The MMIO BAR associated with the passthru device is backed by a vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that have one or more PCI passthru devices attached to them. Limitations: There isn't a way to map a guest physical page without execute permissions. This is because the amd64/pmap code interprets the guest physical mappings as user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U shares the same bit position as EPT_PG_EXECUTE all guest mappings become automatically executable. Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews as well as their support and encouragement. Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing object for pci passthru mmio regions. Special thanks to Peter Holm for testing the patch on short notice. Approved by: re Discussed with: grehan Reviewed by: alc, kib Tested by: pho
* Reorder/regroup the vmm ioctl api definitions to allow somegrehan2013-09-211-21/+35
| | | | | | | | | semblance of API stability and growth during the 10.* timeframe. Userland/kernel bhyve will have to be recompiled after this. Reviewed by: neel Approved by: re@ (blanket)
* Merge Xen PVHVM support into the GENERIC kernel config for bothgibbs2013-09-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | amd64 and i386. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: - Introduce two new CPU hooks for initialization and resume purposes. This allows us to get rid of the XENHVM ifdefs in mp_machdep, and also sets some hooks into common code that can be used by other hypervisor implementations. sys/amd64/conf/XENHVM: sys/i386/conf/XENHVM: - Remove these configs now that GENERIC has builtin support for Xen HVM. sys/kern/subr_smp.c: - Make sure there are no pending IPIs when suspending a system. sys/x86/xen/hvm.c: - Add cpu init and resume vectors that are called from mp_machdep using the new hooks. - Only clear the vcpu_info mapping data on resume. It is already clear for the BSP on a cold boot and is set correctly as APs are started. - Gate xen_hvm_init_cpu only to systems running under Xen. sys/x86/xen/xen_intr.c: - Gate the setup of event channels only to systems running under Xen.
* Merge the following changes from projects/bhyve_npt_pmap:neel2013-09-201-1/+9
| | | | | | | | | | | | - add fields to 'struct pmap' that are required to manage nested page tables. - add a parameter to 'vmspace_alloc()' that can be used to override the default pmap initialization routine 'pmap_pinit()'. These changes are pushed ahead of the remaining changes in 'bhyve_npt_pmap' in anticipation of the upcoming KBI freeze for 10.0. Reviewed by: kib@, alc@ Approved by: re (glebius)
* Add support for suspend/resume/migration operations when running as agibbs2013-09-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xen PVHVM guest. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Reviewed by: gibbs Approved by: re (blanket Xen) MFC after: 2 weeks sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: - Make sure that are no MMU related IPIs pending on migration. - Reset pending IPI_BITMAP on resume. - Init vcpu_info on resume. sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: sys/x86/acpica/acpi_wakeup.c: sys/x86/x86/intr_machdep.c: sys/x86/isa/atpic.c: sys/x86/x86/io_apic.c: sys/x86/x86/local_apic.c: - Add a "suspend_cancelled" parameter to pic_resume(). For the Xen PIC, restoration of interrupt services differs between the aborted suspend and normal resume cases, so we must provide this information. sys/dev/acpica/acpi_timer.c: sys/dev/xen/timer/timer.c: sys/timetc.h: - Don't swap out "suspend safe" timers across a suspend/resume cycle. This includes the Xen PV and ACPI timers. sys/dev/xen/control/control.c: - Perform proper suspend/resume process for PVHVM: - Suspend all APs before going into suspension, this allows us to reset the vcpu_info on resume for each AP. - Reset shared info page and callback on resume. sys/dev/xen/timer/timer.c: - Implement suspend/resume support for the PV timer. Since FreeBSD doesn't perform a per-cpu resume of the timer, we need to call smp_rendezvous in order to correctly resume the timer on each CPU. sys/dev/xen/xenpci/xenpci.c: - Don't reset the PCI interrupt on each suspend/resume. sys/kern/subr_smp.c: - When suspending a PVHVM domain make sure there are no MMU IPIs in-flight, or we will get a lockup on resume due to the fact that pending event channels are not carried over on migration. - Implement a generic version of restart_cpus that can be used by suspended and stopped cpus. sys/x86/xen/hvm.c: - Implement resume support for the hypercall page and shared info. - Clear vcpu_info so it can be reset by APs when resuming from suspension. sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/x86/xen/xen_intr.c: - Support UP kernel configurations. sys/x86/xen/xen_intr.c: - Properly rebind per-cpus VIRQs and IPIs on resume.
* Go way past 11 and bump bhyve's max vCPUs to 16.grehan2013-09-101-1/+1
| | | | | | | | | | | | | This should be sufficient for 10.0 and will do until forthcoming work to avoid limitations in this area is complete. Thanks to Bela Lubkin at tidalscale for the headsup on the apic/cpu id/io apic ASL parameters that are actually hex values and broke when written as decimal when 11 vCPUs were configured. Approved by: re@
* Implement PV IPIs for PVHVM guests and further converge PV and HVMgibbs2013-09-063-9/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPI implmementations. Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D Submitted by: gibbs (misc cleanup, table driven config) Reviewed by: gibbs MFC after: 2 weeks sys/amd64/include/cpufunc.h: sys/amd64/amd64/pmap.c: Move invltlb_globpcid() into cpufunc.h so that it can be used by the Xen HVM version of tlb shootdown IPI handlers. sys/x86/xen/xen_intr.c: sys/xen/xen_intr.h: Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(), and remove the ipi vector parameter. This api allocates an event channel port that can be used for ipi services, but knows nothing of the actual ipi for which that port will be used. Removing the unused argument and cleaning up the comments surrounding its declaration helps clarify its actual role. sys/amd64/amd64/mp_machdep.c: sys/amd64/include/cpu.h: sys/i386/i386/mp_machdep.c: sys/i386/include/cpu.h: Implement a generic framework for amd64 and i386 that allows the implementation of certain CPU management functions to be selected at runtime. Currently this is only used for the ipi send function, which we optimize for Xen when running on a Xen hypervisor, but can easily be expanded to support more operations. sys/x86/xen/hvm.c: Implement Xen PV IPI handlers and operations, replacing native send IPI. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: sys/i386/include/smp.h: Remove NR_VIRQS and NR_IPIS from FreeBSD headers. NR_VIRQS is defined already for us in the xen interface files. NR_IPIS is only needed in one file per Xen platform and is easily inferred by the IPI vector table that is defined in those files. sys/i386/xen/mp_machdep.c: Restructure to more closely match the HVM implementation by performing table driven IPI setup.
* On those machines, where sf_bufs do not represent any real object, makeglebius2013-09-061-0/+12
| | | | | | | | | sf_buf_alloc()/sf_buf_free() inlines, to save two calls to an absolutely empty functions. Reviewed by: alc, kib, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix
* Tidy up some loose ends in the PCID code:kib2013-09-043-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | - Restore the pre-PCID TLB shootdown handlers for whole address space and single page invalidation asm code, and assign the IPI handler to them when PCID is not supported or disabled. Old handlers have linear control flow. But, still use the common return sequence. - Stop using pcpu for INVPCID descriptors in the invlrg handler. It is enough to allocate descriptors on the stack. As result, two SWAPGS instructions are shaved off from the code for Haswell+. - Fix the reverted condition in invlrng for checking of the PCID support [1], also in invlrng check that pmap is kernel pmap before performing other tests. For the kernel pmap, which provides global mappings, the INVLPG must be used for invalidation always. - Save the pre-computed pmap' %CR3 register in the struct pmap. This allows to remove several checks for pm_pcid validity when %CR3 is reloaded [2]. Noted by: gibbs [1] Discussed with: alc [2] Tested by: pho, flo Sponsored by: The FreeBSD Foundation
* Implement support for the process-context identifiers ('PCID') onkib2013-08-303-6/+13
| | | | | | | | | | | | | | | | | | | | | | | Intel CPUs. The feature tags TLB entries with the Id of the address space and allows to avoid TLB invalidation on the context switch, it is available only in the long mode. In the microbenchmarks, using the PCID decreased latency of the context switches by ~30% on SandyBridge class desktop CPUs, measured with the lat_ctx program from lmbench. If available, use INVPCID instruction when a TLB entry in non-current address space needs to be invalidated. The instruction is typically available on the Haswell. If needed, the use of PCID can be turned off with the vm.pmap.pcid_enabled loader tunable set to 0. The state of the feature is reported by the vm.pmap.pcid_enabled sysctl. The sysctl vm.pmap.pcid_save_cnt reports the number of context switches which avoided invalidating the TLB; compare with the total number of context switches, available as sysctl vm.stats.sys.v_swtch. Sponsored by: The FreeBSD Foundation Reviewed by: alc Tested by: pho, bf
* Provide a wrapper for the INVPCID instruction, definition of thekib2013-08-301-0/+20
| | | | | | | | descriptor and symbolic names for the operation types. Sponsored by: The FreeBSD Foundation Reviewed by: alc Tested by: pho, bf
* Implement vector callback for PVHVM and unify event channel implementationsgibbs2013-08-295-210/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-structure Xen HVM support so that: - Xen is detected and hypercalls can be performed very early in system startup. - Xen interrupt services are implemented using FreeBSD's native interrupt delivery infrastructure. - the Xen interrupt service implementation is shared between PV and HVM guests. - Xen interrupt handlers can optionally use a filter handler in order to avoid the overhead of dispatch to an interrupt thread. - interrupt load can be distributed among all available CPUs. - the overhead of accessing the emulated local and I/O apics on HVM is removed for event channel port events. - a similar optimization can eventually, and fairly easily, be used to optimize MSI. Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure, and misc Xen cleanups: Sponsored by: Spectra Logic Corporation Unification of PV & HVM interrupt infrastructure, bug fixes, and misc Xen cleanups: Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D sys/x86/x86/local_apic.c: sys/amd64/include/apicvar.h: sys/i386/include/apicvar.h: sys/amd64/amd64/apic_vector.S: sys/i386/i386/apic_vector.s: sys/amd64/amd64/machdep.c: sys/i386/i386/machdep.c: sys/i386/xen/exception.s: sys/x86/include/segments.h: Reserve IDT vector 0x93 for the Xen event channel upcall interrupt handler. On Hypervisors that support the direct vector callback feature, we can request that this vector be called directly by an injected HVM interrupt event, instead of a simulated PCI interrupt on the Xen platform PCI device. This avoids all of the overhead of dealing with the emulated I/O APIC and local APIC. It also means that the Hypervisor can inject these events on any CPU, allowing upcalls for different ports to be handled in parallel. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: Map Xen per-vcpu area during AP startup. sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: Increase the FreeBSD IRQ vector table to include space for event channel interrupt sources. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: Remove Xen HVM per-cpu variable data. These fields are now allocated via the dynamic per-cpu scheme. See xen_intr.c for details. sys/amd64/include/xen/hypercall.h: sys/dev/xen/blkback/blkback.c: sys/i386/include/xen/xenvar.h: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/xen/gnttab.c: Prefer FreeBSD primatives to Linux ones in Xen support code. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: sys/dev/xen/balloon/balloon.c: sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/console/xencons_ring.c: sys/dev/xen/control/control.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/xenpci/xenpci.c: sys/i386/i386/machdep.c: sys/i386/include/pmap.h: sys/i386/include/xen/xenfunc.h: sys/i386/isa/npx.c: sys/i386/xen/clock.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/xen_rtc.c: sys/xen/evtchn/evtchn_dev.c: sys/xen/features.c: sys/xen/gnttab.c: sys/xen/gnttab.h: sys/xen/hvm.h: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_if.m: sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusvar.h: sys/xen/xenstore/xenstore.c: sys/xen/xenstore/xenstore_dev.c: sys/xen/xenstore/xenstorevar.h: Pull common Xen OS support functions/settings into xen/xen-os.h. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: Remove constants, macros, and functions unused in FreeBSD's Xen support. sys/xen/xen-os.h: sys/i386/xen/xen_machdep.c: sys/x86/xen/hvm.c: Introduce new functions xen_domain(), xen_pv_domain(), and xen_hvm_domain(). These are used in favor of #ifdefs so that FreeBSD can dynamically detect and adapt to the presence of a hypervisor. The goal is to have an HVM optimized GENERIC, but more is necessary before this is possible. sys/amd64/amd64/machdep.c: sys/dev/xen/xenpci/xenpcivar.h: sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/sys/kernel.h: Refactor magic ioport, Hypercall table and Hypervisor shared information page setup, and move it to a dedicated HVM support module. HVM mode initialization is now triggered during the SI_SUB_HYPERVISOR phase of system startup. This currently occurs just after the kernel VM is fully setup which is just enough infrastructure to allow the hypercall table and shared info page to be properly mapped. sys/xen/hvm.h: sys/x86/xen/hvm.c: Add definitions and a method for configuring Hypervisor event delievery via a direct vector callback. sys/amd64/include/xen/xen-os.h: sys/x86/xen/hvm.c: sys/conf/files: sys/conf/files.amd64: sys/conf/files.i386: Adjust kernel build to reflect the refactoring of early Xen startup code and Xen interrupt services. sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: sys/dev/xen/control/control.c: sys/dev/xen/evtchn/evtchn_dev.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/xen/xenstore/xenstore.c: sys/xen/evtchn/evtchn_dev.c: sys/dev/xen/console/console.c: sys/dev/xen/console/xencons_ring.c Adjust drivers to use new xen_intr_*() API. sys/dev/xen/blkback/blkback.c: Since blkback defers all event handling to a taskqueue, convert this task queue to a "fast" taskqueue, and schedule it via an interrupt filter. This avoids an unnecessary ithread context switch. sys/xen/xenstore/xenstore.c: The xenstore driver is MPSAFE. Indicate as much when registering its interrupt handler. sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbusvar.h: Remove unused event channel APIs. sys/xen/evtchn.h: Remove all kernel Xen interrupt service API definitions from this file. It is now only used for structure and ioctl definitions related to the event channel userland device driver. Update the definitions in this file to match those from NetBSD. Implementing this interface will be necessary for Dom0 support. sys/xen/evtchn/evtchnvar.h: Add a header file for implemenation internal APIs related to managing event channels event delivery. This is used to allow, for example, the event channel userland device driver to access low-level routines that typical kernel consumers of event channel services should never access. sys/xen/interface/event_channel.h: sys/xen/xen_intr.h: Standardize on the evtchn_port_t type for referring to an event channel port id. In order to prevent low-level event channel APIs from leaking to kernel consumers who should not have access to this data, the type is defined twice: Once in the Xen provided event_channel.h, and again in xen/xen_intr.h. The double declaration is protected by __XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared twice within a given compilation unit. sys/xen/xen_intr.h: sys/xen/evtchn/evtchn.c: sys/x86/xen/xen_intr.c: sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/xenpci/xenpcivar.h: New implementation of Xen interrupt services. This is similar in many respects to the i386 PV implementation with the exception that events for bound to event channel ports (i.e. not IPI, virtual IRQ, or physical IRQ) are further optimized to avoid mask/unmask operations that aren't necessary for these edge triggered events. Stubs exist for supporting physical IRQ binding, but will need additional work before this implementation can be fully shared between PV and HVM. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: sys/i386/xen/mp_machdep.c sys/x86/xen/hvm.c: Add support for placing vcpu_info into an arbritary memory page instead of using HYPERVISOR_shared_info->vcpu_info. This allows the creation of domains with more than 32 vcpus. sys/i386/i386/machdep.c: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/exception.s: Add support for new event channle implementation.
* Reimplement atomic operations on PDEs and PTEs in pmap.h. This changejkim2013-08-211-35/+8
| | | | | | significantly reduces duplicate code and make it easier to read. Reviewed by: alc, bde
* Remove empty lines before return statements for style consistency.jkim2013-08-211-5/+0
|
* Implement atomic_swap() and atomic_testandset().jkim2013-08-211-15/+57
| | | | Reviewed by: arch, bde, jilles, kib
* - Remove the "a" constraint from main output operand for atomic_cmpset().jkim2013-08-211-10/+10
| | | | - Use "+" modifier for the "expect" because it is also an output (unused).
* Use '+' modifier for a memory operand that is both an input and an output.jkim2013-08-211-23/+16
| | | | | | It was actually done in r86301 but reverted in r150182 because GCC 3.x was not able to handle it for a memory operand. Apparently, this problem was fixed in GCC 4.1+ and several contrib sources already rely on this feature.
* Remove bogus labels. No functional change.jkim2013-08-211-2/+0
|
* Use consistent style. No functional change.jkim2013-08-211-12/+12
|
* Bump up the maximum addressable memory on amd64 systems from 1TB to 4TB.neel2013-08-172-13/+35
| | | | | | | | | | | | | | | | | | | | | | | | Bump up the KVA size proportionally from 512GB to 2TB. The number of page table pages used by the direct map is now calculated at run time based on 'Maxmem'. This means the small memory systems will not see any additional tax in terms of page table pages for the direct map. However all amd64 systems, regardless of the memory size, will use 3 more pages to accomodate the bump in the KVA size. More details available here: http://lists.freebsd.org/pipermail/freebsd-hackers/2013-June/043015.html http://lists.freebsd.org/pipermail/freebsd-current/2013-July/043143.html Tested with the following configurations: - Sandybridge server with 64GB of memory. - bhyve VM with 64MB of memory. - bhyve VM with a 8GB of memory with the memory segment above 4GB cuddling right up against the 4TB maximum memory limit. Discussed on: hackers@, current@ Submitted by: Chris Torek (torek@torek.net)
* libc: Access _logname_valid more efficiently.jilles2013-08-171-1/+0
| | | | | | | | | | | The variable _logname_valid is not exported via the version script; therefore, change C and i386/amd64 assembler code to remove indirection (which allowed interposition). This makes the code slightly smaller and faster. Also, remove #define PIC_GOT from i386/amd64 in !PIC mode. Without PIC, there is no place containing the address of each variable, so there is no possible definition for PIC_GOT.
* Merge acpica_machdep.h for amd64 and i386 and move to x86. In fact, thesejkim2013-08-131-77/+3
| | | | two files were functionally identical.
* Tidy up global locks for ACPICA. There is no functional change.jkim2013-08-131-3/+3
|
* Change the pmap_ts_referenced() method of amd64 pmap to use sharedkib2013-08-071-0/+1
| | | | | | | | | | | | | | | | | | pvh_global_lock. This allows the method to be executed in parallel, avoiding undue contention on the pvh_global_lock for the multithreaded pagedaemon. The pmap_ts_referenced() function has to inspect the page mappings for several pmaps, which need to be locked while pv list lock is owned. This contradicts to the lock order, where pmap lock is before pv list lock. Introduce the generation count for the pv list of the page or superpage, which indicate any change in the pv list, and, as usual, perform restart of the iteration if generation changed while pv lock was dropped for blocking acquire of a pmap lock. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation
* Revert r253748,253749avg2013-07-281-2/+1
| | | | | | This WIP should not have been committed yet. Pointyhat to: avg
* put contents of cpu.h under _KERNELavg2013-07-281-1/+2
| | | | | | no userland-serviceable parts inside MFC after: 20 days
* x86: detect mwait capabilities and extensions, when presentavg2013-07-281-0/+3
| | | | | Reviewed by: kib (earlier amd64-only version) MFC after: 2 weeks
* Fix issues with zeroing and fetching the counters, on x86 and ppc64.kib2013-07-011-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issues were noted by Bruce Evans and are present on all architectures. On i386, a counter fetch should use atomic read of 64bit value, otherwise carry from the increment on other CPU could be lost for the given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not available on the machine, it cannot be SMP and it is enough to disable preemption around read to avoid the split read. On x86 the counter increment is not atomic on purpose, which makes it possible for the store of the incremented result to override just zeroed per-cpu slot. The effect would be a counter going off by arbitrary value after zeroing. Perform the counter zeroing on the same processor which does the increments, making the operations mutually exclusive. On i386, same as for the fetching, if the cmpxchg8b is not available, machine is not SMP and we disable preemption for zeroing. PowerPC64 is treated the same as amd64. For other architectures, the changes made to allow the compilation to succeed, without fixing the issues with zeroing or fetching. It should be possible to handle them by using the 64bit loads and stores atomic WRT preemption (assuming the architectures also converted from using critical sections to proper asm). If architecture does not provide the facility, using global (spin) mutex would be non-optimal but working solution. Noted by: bde Sponsored by: The FreeBSD Foundation
* Move definitions required by userland applications out of acpica_machdep.h.jkim2013-06-271-6/+2
|
* Allow immediate operand.kib2013-06-201-1/+1
| | | | Sponsored by: The FreeBSD Foundation
* Upgrade Xen interface headers to Xen 4.2.1.gibbs2013-06-141-15/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move FreeBSD from interface version 0x00030204 to 0x00030208. Updates are required to our grant table implementation before we can bump this further. sys/xen/hvm.h: Replace the implementation of hvm_get_parameter(), formerly located in sys/xen/interface/hvm/params.h. Linux has a similar file which primarily stores this function. sys/xen/xenstore/xenstore.c: Include new xen/hvm.h header file to get hvm_get_parameter(). sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: Correctly protect function definition and variables from being included into assembly files in xen-os.h Xen memory barriers are now prefixed with "xen_" to avoid conflicts with OS native primatives. Define Xen memory barriers in terms of the native FreeBSD primatives. Sponsored by: Spectra Logic Corporation Reviewed by: Roger Pau Monné Tested by: Roger Pau Monné Obtained from: Roger Pau Monné (bug fixes)
* Remove unused macros PTESHIFT, PDESHIFT, PDPESHIFT and PML4ESHIFT.neel2013-06-141-5/+0
| | | | Reviewed by: alc
* Fix the hardware watchpoints on SMP amd64. Load the updated %drkib2013-05-212-1/+7
| | | | | | | | | | | registers also on other CPUs, besides the CPU which happens to execute the ddb. The debugging registers are stored in the pcpu area, together with the command which is executed by the IPI stop handler upon resume. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Add basic support for FDT to i386 & amd64. This change includes:marcel2013-05-213-0/+13
| | | | | | | | | | | | | | | | | | | | | | 1. Common headers for fdt.h and ofw_machdep.h under x86/include with indirections under i386/include and amd64/include. 2. New modinfo for loader provided FDT blob. 3. Common x86_init_fdt() called from hammer_time() on amd64 and init386() on i386. 4. Split-off FDT specific low-level console functions from FDT bus methods for the uart(4) driver. The low-level console logic has been moved to uart_cpu_fdt.c and is used for arm, mips & powerpc only. The FDT bus methods are shared across all architectures. 5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the fdt_pic_table[] arrays. Both are empty right now. FDT addresses are I/O ports on x86. Since the core FDT code does not handle different address spaces, adding support for both I/O ports and memory addresses requires some thought and discussion. It may be better to use a compile-time option that controls this. Obtained from: Juniper Networks, Inc.
* Retire write-only PCB_GS32BIT pcb flag on amd64.dchagin2013-05-091-1/+0
|
* Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h inattilio2013-05-072-7/+4
| | | | | | | | | order to match the MAXCPU concept. The change should also be useful for consolidation and consistency. Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc
* Add RIP-relative addressing to the instruction decoder.grehan2013-04-251-3/+3
| | | | | | | | | | | | Rework the guest register fetch code to allow the RIP to be extracted from the VMCS while the kernel decoder is functioning. Hit by the OpenBSD local-apic code. Submitted by: neel Reviewed by: grehan Obtained from: NetApp
* - Correct spelling in commentsgabor2013-04-171-1/+3
| | | | Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
OpenPOWER on IntegriCloud