summaryrefslogtreecommitdiffstats
path: root/sys/amd64/include
Commit message (Collapse)AuthorAgeFilesLines
...
* Export various helper variables describing the layout and size ofjhb2015-11-121-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | certain kernel structures for use by debuggers. This mostly aids in examining cores from a kernel without debug symbols as a debugger can infer these values if debug symbols are available. One set of variables describes the layout of 'struct linker_file' to walk the list of loaded kernel modules. A second set of variables describes the layout of 'struct proc' and 'struct thread' to walk the list of processes in the kernel and the threads in each process. The 'pcb_size' variable is used to index into the stoppcbs[] array. The 'vm_maxuser_address' is used to distinguish kernel virtual addresses from user addresses. This doesn't have to be perfect, and 'vm_maxuser_address' is a cheap and simple way to differentiate kernel pointers from simple values like TIDs and PIDs. While here, annotate the fields in struct pcb used by kgdb on amd64 and i386 to note that their ABI should be preserved. Annotations for other platforms will be added in the future. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3773
* Add CLFLUSHOPT instruction wrappers.kib2015-10-231-0/+7
| | | | | Sponsored by: The FreeBSD Foundation MFC after: 1 week
* x86/xen: Consolidate xen-os.h in a single placeroyger2015-10-211-130/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | amd64 and i386 platform code contain very similar xen/xen-os.h The only differences are: - Functions/variables/types which were unused in i386/xen/xen-os.h: * xen_xchg * __xchg_dummy * __xg * __xchg * atomic_t * atomic_inc * rdtscll The functions/variables/types unused in xen-os.h can be dropped and there is no more differences betwen amd64 and i386. The new header is placed in x86/include/xen and each platform will have dummy headers include x86/xen/*.h. This is to be able to include machine/xen/*.h in the PV drivers. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3880 Sponsored by: Citrix Systems R&D
* xen/console: Introduce a new console driver for Xen guestroyger2015-10-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current Xen console driver is crashing very quickly when using it on an ARM guest. This is because the console lock is recursive and it may lead to recursion on the tty lock and/or corrupt the ring pointer. Furthermore, the console lock is not always taken where it should be and has to be released too early because of the way the console has been designed. Over the years, code has been modified to support various new features but the driver has not been reworked. This new driver has been rewritten with the idea of only having a small set of specific function to write either via the shared ring or the hypercall interface. Note that HVM support has been left aside for now because it requires additional features which are not yet supported. A follow-up patch will be sent with HVM guest support. List of items that may be good to have but not mandatory: - Avoid to flush for each character written when using the tty - Support multiple consoles Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3698 Sponsored by: Citrix Systems R&D
* Update Xen headers from 4.2 to 4.6royger2015-10-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull the latest headers for Xen which allow us to add support for ARM and use new features in FreeBSD. This is a verbatim copy of the xen/include/public so every headers which don't exits anymore in the Xen repositories have been dropped. Note the interface version hasn't been bumped, it will be done in a follow-up. Although, it requires fix in the code to get it compiled: - sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so drop it. - {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on xen/interface/event_channel.h, so include it. - {amd64,i386}/{amd64,i386}/support.S: It's not neccessary to include machine/intr_machdep.h. This is also fixing build compilation with the new headers. - dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segmenthas been dropped. So directly use struct blkif_request_segment Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if __XEN_INTERFACE_VERSION__ is not set. This is allow us to catch any file where xen/xen-os.h is not correctly included. Submitted by: Julien Grall <julien.grall@citrix.com> Reviewed by: royger Differential Revision: https://reviews.freebsd.org/D3805 Sponsored by: Citrix Systems R&D
* amd64: plug redundant bootAP declarationmjg2015-09-221-1/+0
| | | | Reported by: gcc5
* Merge stack(9) implementations for i386 and amd64 under x86/.markj2015-09-111-39/+3
| | | | | | Reviewed by: jhb, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3255
* Remove some more vestiges of the Xen PV domu support. Specifically,jhb2015-08-062-132/+0
| | | | | | | | | use vtophys() directly instead of vtomach() and retire the no-longer-used headers <machine/xenfunc.h> and <machine/xenvar.h>. Reported by: bde (stale bits in <machine/xenfunc.h>) Reviewed by: royger (earlier version) Differential Revision: https://reviews.freebsd.org/D3266
* Rationalize BSD license on sys/*/include/in_cksum.hemaste2015-08-051-1/+1
| | | | | | | Remove the advertising clause from the Regents of the University of California's license, per the letter dated July 22, 1999. Update clause numbering.
* Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUIDkib2015-08-031-0/+1
| | | | | | | | | | | | | | reported, on APs. We already did this on BSP. Otherwise, the userspace software which depends on the features reported by the high CPUID levels is misbehaving. In particular, AVX detection is non-functional, depending on which CPU thread happens to execute when doing CPUID. Another victim is the libthr signal handlers interposer, which needs to save full FPU extended state. Reported and tested by: Andre Meiser <ortadur@web.de> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Improve comments.kib2015-07-301-4/+4
| | | | | Submitted by: bde MFC after: 2 weeks
* Remove full barrier from the amd64 atomic_load_acq_*(). Strongkib2015-07-281-17/+7
| | | | | | | | | | | | | | | | | ordering semantic of x86 CPUs makes only the compiler barrier neccessary to give the acquire behaviour. Existing implementation ensured sequentially consistent semantic for load_acq, making much stronger guarantee than required by standard's definition of the load acquire. Consumers which depend on the barrier are believed to be identified and already fixed to use proper operations. Noted by: alc (long time ago) Reviewed by: alc, bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Add a comment discussing the appropriate use of the atomic_*() functionsalc2015-07-241-0/+19
| | | | | | | | with acquire and release semantics versus the *mb() functions on amd64 processors. Reviewed by: bde (an earlier version), kib Sponsored by: EMC / Isilon Storage Division
* Add the atomic_thread_fence() family of functions with intent tokib2015-07-081-0/+32
| | | | | | | | | | | | | | | | | | | | | | | provide a semantic defined by the C11 fences with corresponding memory_order. atomic_thread_fence_acq() gives r | r, w, where r and w are read and write accesses, and | denotes the fence itself. atomic_thread_fence_rel() is r, w | w. atomic_thread_fence_acq_rel() is the combination of the acquire and release in single operation. Note that reads after the acq+rel fence could be made visible before writes preceeding the fence. atomic_thread_fence_seq_cst() orders all accesses before/after the fence, and the fence itself is globally ordered against other sequentially consistent atomic operations. Reviewed by: alc Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
* Use single instance of the identical INKERNEL() and PMC_IN_KERNEL()kib2015-07-023-5/+4
| | | | | | | | | | | | | macros on amd64 and i386. Move the definition to machine/param.h. kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb version to PINKERNEL(). On i386, correct the lowest kernel address. After the shared page was introduced, USRSTACK no longer points to the last user address + 1 [*] Submitted by: Oliver Pinter [*] Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Add a comment about too strong semantic of atomic_load_acq() on x86.kib2015-06-291-0/+9
| | | | | Submitted by: bde MFC after: 2 weeks
* pcb_gs32sd is unused for long time, remove it. Keep the padding in pcb.kib2015-06-291-2/+1
| | | | | Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Remove unneeded data dependency, currently imposed bykib2015-06-281-50/+75
| | | | | | | | | | | | | | | | | | | | | | | atomic_load_acq(9), on it source, for x86. Right now, atomic_load_acq() on x86 is sequentially consistent with other atomics, code ensures this by doing store/load barrier by performing locked nop on the source. Provide separate primitive __storeload_barrier(), which is implemented as the locked nop done on a cpu-private variable, and put __storeload_barrier() before load, to keep seq_cst semantic but avoid introducing false dependency on the no-modification of the source for its later use. Note that seq_cst property of x86 atomic_load_acq() is not documented and not carried by atomics implementations on other architectures, although some kernel code relies on the behaviour. This commit does not intend to change this. Reviewed by: alc Discussed with: bde Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Restructure memory allocation in bhyve to support "devmem".neel2015-06-182-20/+53
| | | | | | | | | | | | | | | | | | | | | devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer where doing a trap-and-emulate for every access is impractical. devmem is a hybrid of system memory (sysmem) and emulated device models. devmem is mapped in the guest address space via nested page tables similar to sysmem. However the address range where devmem is mapped may be changed by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is usually mapped RO or RW as compared to RWX mappings for sysmem. Each devmem segment is named (e.g. "bootrom") and this name is used to create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom). The device node supports mmap(2) and this decouples the host mapping of devmem from its mapping in the guest address space (which can change). Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D2762 MFC after: 4 weeks
* Retire VM_FREEPOOL_CACHE as the next step in eliminating PG_CACHE pages.alc2015-06-081-3/+2
| | | | | | Differential Revision: https://reviews.freebsd.org/D2712 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division
* Update print_INTEL_TLB() by the tag values from the Intel SDMkib2015-06-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | rev. 55. The modern CPUs cache and TLB descriptions looked quite questionable without the update, e.g. Haswell i7 4770S reported: Data TLB: 4 KB pages, 4-way set associative, 64 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line After the update, the report is: Data TLB: 1 GByte pages, 4-way set associative, 4 entries Data TLB: 4 KB pages, 4-way set associative, 64 entries Instruction TLB: 2M/4M pages, fully associative, 8 entries Instruction TLB: 4KByte pages, 8-way set associative, 64 entries 64-Byte prefetching Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line Some tags were apparently removed from the table 3-21, Vol. 2A. Keep them around, but add a comment stating the removal. Update the format line for cpu_stdext_feature according to the bits from the SDM rev.55. It appears that Haswells do not store %cs and %ds values in the FPU save area. Store content of the %ecx register from the CPUID leaf 0x7 subleaf 0 as cpu_stdext_feature2 and print defined bits from it, again acording to SDM rev. 55. Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Fix non-deterministic delays when accessing a vcpu that was in "running" orneel2015-05-281-6/+20
| | | | | | | "sleeping" state. This is done by forcing the vcpu to transition to "idle" by returning to userspace with an exit code of VM_EXITCODE_REQIDLE. MFC after: 2 weeks
* Rewrite amd64 PCID implementation to follow an algorithm described inkib2015-05-094-19/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the Vahalia' "Unix Internals" section 15.12 "Other TLB Consistency Algorithms". The same algorithm is already utilized by the MIPS pmap to handle ASIDs. The PCID for the address space is now allocated per-cpu during context switch to the thread using pmap, when no PCID on the cpu was ever allocated, or the current PCID is invalidated. If the PCID is reused, bit 63 of %cr3 can be set to avoid TLB flush. Each cpu has PCID' algorithm generation count, which is saved in the pmap pcpu block when pcpu PCID is allocated. On invalidation, the pmap generation count is zeroed, which signals the context switch code that already allocated PCID is no longer valid. The implication is the TLB shootdown for the given cpu/address space, due to the allocation of new PCID. The pm_save mask is no longer has to be tracked, which (significantly) reduces the targets of the TLB shootdown IPIs. Previously, pm_save was reset only on pmap_invalidate_all(), which made it accumulate the cpuids of all processors on which the thread was scheduled between full TLB shootdowns. Besides reducing the amount of TLB shootdowns and removing atomics to update pm_saves in the context switch code, the algorithm is much simpler than the maintanence of pm_save and selection of the right address space in the shootdown IPI handler. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
* If x86 CPU implementation of the MWAIT instruction reasonablykib2015-05-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | interacts with interrupts, query ACPI and use MWAIT for entrance into Cx sleep states. Support C1 "I/O then halt" mode. See Intel' document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface Specification" for description. Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use it instead of inlining "sti; hlt" sequence in several places. In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported} sysctls. Both jkim and avg have some other patches implementing the mwait functionality; this work is unrelated. Linux does not rely on the ACPI to provide correct tables describing Cx modes. Instead, the driver has pre-defined knowledge of the CPU models, it was supplied by Intel. Tested by: pho (previous versions) Sponsored by: The FreeBSD Foundation
* Check 'td_owepreempt' and yield the vcpu thread if it is set.neel2015-05-061-1/+7
| | | | | | | | | | This is done explicitly because a vcpu thread can be in a critical section for the entire time slice alloted to it. This in turn can delay the handling of the 'td_owepreempt'. Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D2430
* Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup().neel2015-05-062-9/+12
| | | | | | | | | | | | | | | | | | Prior to this change both functions returned 0 for success, -1 for failure and +1 to indicate that an exception was injected into the guest. The numerical value of ERESTART also happens to be -1 so when these functions returned -1 it had to be translated to a positive errno value to prevent the VM_RUN ioctl from being inadvertently restarted. This made it easy to introduce bugs when writing emulation code. Fix this by adding an 'int *guest_fault' parameter and setting it to '1' if an exception was delivered to the guest. The return value is 0 or EFAULT so no additional translation is needed. Reviewed by: tychon MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D2428
* Don't require <sys/cpuset.h> to be always included before <machine/vmm.h>.neel2015-04-301-2/+4
| | | | | | | Only a subset of source files that include <machine/vmm.h> need to use the APIs that require the inclusion of <sys/cpuset.h>. MFC after: 1 week
* Remove support for Xen PV domU kernels. Support for HVM domU kernelsjhb2015-04-303-297/+0
| | | | | | | | | | | | | | | | | | | | | remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes
* Move common code from sys/i386/i386/mp_machdep.c andkib2015-04-241-0/+39
| | | | | | | | | sys/amd64/amd64/mp_machdep.c, to the new common x86 source sys/x86/x86/mp_x86.c. Proposed and reviewed by: jhb Review: https://reviews.freebsd.org/D2347 Sponsored by: The FreeBSD Foundation
* Reassign copyright statements on several files from Advancedjhb2015-04-231-1/+1
| | | | | | | Computing Technologies LLC to Hudson River Trading LLC. Approved by: Hudson River Trading LLC (who owns ACT LLC) MFC after: 1 week
* Move some common code from sys/amd64/amd64/machdep.c andkib2015-04-221-0/+1
| | | | | | | | sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c. Most of the code is related to the idle handling. Discussed with: pluknet Sponsored by: The FreeBSD Foundation
* Use explicitly sized types in EFI module metadataemaste2015-04-101-5/+5
| | | | | | | This will allow the same metadata struct to be used on all platforms. Differential Revision: https://reviews.freebsd.org/D2275 Reviewed by: jhb
* Fix "MOVS" instruction memory to MMIO emulation. Currently updates totychon2015-04-011-1/+1
| | | | | | | | | %rdi, %rsi, etc are inadvertently bypassed along with the check to see if the instruction needs to be repeated per the 'rep' prefix. Add "MOVS" instruction support for the 'MMIO to MMIO' case. Reviewed by: neel
* When fetching an instruction in non-64bit mode, consider the value of thetychon2015-03-241-0/+1
| | | | | | | | | code segment base address. Also if an instruction doesn't support a mod R/M (modRM) byte, don't be concerned if the CPU is in real mode. Reviewed by: neel
* Use VT-d interrupt remapping block (IR) to perform FSB messageskib2015-03-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | translation. In particular, despite IO-APICs only take 8bit apic id, IR translation structures accept 32bit APIC Id, which allows x2APIC mode to function properly. Extend msi_cpu of struct msi_intrsrc and io_cpu of ioapic_intsrc to full int from one byte. KPI of IR is isolated into the x86/iommu/iommu_intrmap.h, to avoid bringing all dmar headers into interrupt code. The non-PCI(e) devices which generate message interrupts on FSB require special handling. The HPET FSB interrupts are remapped, while DMAR interrupts are not. For each msi and ioapic interrupt source, the iommu cookie is added, which is in fact index of the IRE (interrupt remap entry) in the IR table. Cookie is made at the source allocation time, and then used at the map time to fill both IRE and device registers. The MSI address/data registers and IO-APIC redirection registers are programmed with the special values which are recognized by IR and used to restore the IRE index, to find proper delivery mode and target. Map all MSI interrupts in the block when msi_map() is called. Since an interrupt source setup and dismantle code are done in the non-sleepable context, flushing interrupt entries cache in the IR hardware, which is done async and ideally waits for the interrupt, requires busy-wait for queue to drain. The dmar_qi_wait_for_seq() is modified to take a boolean argument requesting busy-wait for the written sequence number instead of waiting for interrupt. Some interrupts are configured before IR is initialized, e.g. ACPI SCI. Add intr_reprogram() function to reprogram all already configured interrupts, and call it immediately before an IR unit is enabled. There is still a small window after the IO-APIC redirection entry is reprogrammed with cookie but before the unit is enabled, but to fix this properly, IR must be started much earlier. Add workarounds for 5500 and X58 northbridges, some revisions of which have severe flaws in handling IR. Use the same identification methods as employed by Linux. Review: https://reviews.freebsd.org/D1892 Reviewed by: neel Discussed with: jhb Tested by: glebius, pho (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
* Use lapic_ipi_alloc() to dynamically allocate IPI slots needed by bhyve whenneel2015-03-141-0/+1
| | | | | | | | vmm.ko is loaded. Also relocate the 'justreturn' IPI handler to be alongside all other handlers. Requested by: kib
* Add x2APIC support. Enable it by default if CPU is capable. Thekib2015-02-091-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | hw.x2apic_enable tunable allows disabling it from the loader prompt. To closely repeat effects of the uncached memory ops when accessing registers in the xAPIC mode, the x2APIC writes to MSRs are preceeded by mfence, except for the EOI notifications. This is probably too strict, only ICR writes to send IPI require serialization to ensure that other CPUs see the previous actions when IPI is delivered. This may be changed later. In vmm justreturn IPI handler, call doreti_iret instead of doing iretd inline, to handle corner conditions. Note that the patch only switches LAPICs into x2APIC mode. It does not enables FreeBSD to support > 255 CPUs, which requires parsing x2APIC MADT entries and doing interrupts remapping, but is the required step on the way. Reviewed by: neel Tested by: pho (real hardware), neel (on bhyve) Discussed with: jhb, grehan Sponsored by: The FreeBSD Foundation MFC after: 2 months
* Generalized parts of the XEN timer code into a generic pvclockbryanv2015-02-041-0/+6
| | | | | | | | | KVM clock shares the same data structures between the guest and the host as Xen so it makes sense to just have a single copy of this code. Differential Revision: https://reviews.freebsd.org/D1429 Reviewed by: royger (eariler version) MFC after: 1 month
* MOVS instruction emulation.neel2015-01-191-1/+5
| | | | | | | | | | | | These instructions are emitted by 'bus_space_read_region()' when accessing MMIO regions. Since MOVS can be used with a repeat prefix start decoding the REPZ and REPNZ prefixes. Also start decoding the segment override prefix since MOVS allows overriding the source operand segment register. Tested by: tychon MFC after: 1 week
* Simplify instruction restart logic in bhyve.neel2015-01-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | Keep track of the next instruction to be executed by the vcpu as 'nextrip'. As a result the VM_RUN ioctl no longer takes the %rip where a vcpu should start execution. Also, instruction restart happens implicitly via 'vm_inject_exception()' or explicitly via 'vm_restart_instruction()'. The APIs behave identically in both kernel and userspace contexts. The main beneficiary is the instruction emulation code that executes in both contexts. bhyve(8) VM exit handlers now treat 'vmexit->rip' and 'vmexit->inst_length' as readonly: - Restarting an instruction is now done by calling 'vm_restart_instruction()' as opposed to setting 'vmexit->inst_length' to 0 (e.g. emulate_inout()) - Resuming vcpu at an arbitrary %rip is now done by setting VM_REG_GUEST_RIP as opposed to changing 'vmexit->rip' (e.g. vmexit_task_switch()) Differential Revision: https://reviews.freebsd.org/D1526 Reviewed by: grehan MFC after: 2 weeks
* loader: implement multiboot support for Xen Dom0royger2015-01-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement a subset of the multiboot specification in order to boot Xen and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot implementation is tailored to boot Xen and FreeBSD Dom0, and it will most surely fail to boot any other multiboot compilant kernel. In order to detect and boot the Xen microkernel, two new file formats are added to the bootloader, multiboot and multiboot_obj. Multiboot support must be tested before regular ELF support, since Xen is a multiboot kernel that also uses ELF. After a multiboot kernel is detected, all the other loaded kernels/modules are parsed by the multiboot_obj format. The layout of the loaded objects in memory is the following; first the Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to long mode by itself), after that the FreeBSD kernel is loaded as a RAW file (Xen will parse and load it using it's internal ELF loader), and finally the metadata and the modules are loaded using the native FreeBSD way. After everything is loaded we jump into Xen's entry point using a small trampoline. The order of the multiboot modules passed to Xen is the following, the first module is the RAW FreeBSD kernel, and the second module is the metadata and the FreeBSD modules. Since Xen will relocate the memory position of the second multiboot module (the one that contains the metadata and native FreeBSD modules), we need to stash the original modulep address inside of the metadata itself in order to recalculate its position once booted. This also means the metadata must come before the loaded modules, so after loading the FreeBSD kernel a portion of memory is reserved in order to place the metadata before booting. In order to tell the loader to boot Xen and then the FreeBSD kernel the following has to be added to the /boot/loader.conf file: xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga" xen_kernel="/boot/xen" The first argument contains the command line that will be passed to the Xen kernel, while the second argument is the path to the Xen kernel itself. This can also be done manually from the loader command line, by for example typing the following set of commands: OK unload OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga OK load kernel OK load zfs OK load if_tap OK load ... OK boot Sponsored by: Citrix Systems R&D Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D517 For the Forth bits: Submitted by: Julien Grall <julien.grall AT citrix.com>
* 'struct vm_exception' was intended to be used only as the collateral for theneel2015-01-132-2/+6
| | | | | | | | | | | | | | | | VM_INJECT_EXCEPTION ioctl. However it morphed into other uses like keeping track pending exceptions for a vcpu. This in turn causes confusion because some fields in 'struct vm_exception' like 'vcpuid' make sense only in the ioctl context. It also makes it harder to add or remove structure fields. Fix this by using 'struct vm_exception' only to communicate information from userspace to vmm.ko when injecting an exception. Also, add a field 'restart_instruction' to 'struct vm_exception'. This field is set to '1' for exceptions where the faulting instruction is restarted after the exception is handled. MFC after: 1 week
* Revert r276600: PHYS_TO_DMAP_RAW() and DMAP_TO_PHYS_RAW() are nokib2015-01-121-4/+2
| | | | | | | | longer used, remove them. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week
* For x86, read MAXPHYADDR, defined in SDM vol 3 4.1.4 Enumeration of Pagingkib2015-01-121-0/+1
| | | | | | | | | Features by CPUID as CPUID.80000008H:EAX[7:0], into variable cpu_maxphyaddr. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Factor out duplicated code from dumpsys() on each architecture into genericmarkj2015-01-071-0/+6
| | | | | | | | | | | | | | code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly identical and simply redefine a number of constants and helper subroutines; a generic implementation will make it easier to implement features around kernel core dumps. This change does not alter any minidump code and should have no functional impact. PR: 193873 Differential Revision: https://reviews.freebsd.org/D904 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Reviewed by: jhibbits (earlier version) Sponsored by: EMC / Isilon Storage Division
* For /dev/mem and /dev/kmem accesses, avoid asserting that addresseskib2015-01-031-2/+4
| | | | | | | are within direct map. We want to return error instead of panicing. PR: 194995 Sponsored by: The FreeBSD Foundation
* The physical memory allocator supports the use of distinct free lists foralc2014-12-311-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | managing pages from different address ranges. Generally speaking, this feature is used to increase the likelihood that physical pages are available that can meet special DMA requirements or can be accessed through a limited-coverage direct mapping (e.g., MIPS). However, prior to this change, the configuration of the free lists was static, i.e., it was determined at compile time. Consequentally, free lists could be created for address ranges that held no actual pages, for example, on 32-bit MIPS- based systems with 512 MB or less of physical memory. This change makes the creation of the free lists dynamic, i.e., it is based on the available physical memory at boot time. On 64-bit x86-based systems with 64 GB or more of physical memory, create free lists for managing pages with physical addresses below 4 GB. This change is to address reported problems with initializing devices that require the allocation of physical pages below 4 GB on some systems with 128 GB or more of physical memory. PR: 185727 Differential Revision: https://reviews.freebsd.org/D1274 Reviewed by: jhb, kib MFC after: 3 weeks Sponsored by: EMC / Isilon Storage Division
* Replace bhyve's minimal RTC emulation with a fully featured one in vmm.ko.neel2014-12-302-0/+24
| | | | | | | | | | | | | | | | | | | | | The new RTC emulation supports all interrupt modes: periodic, update ended and alarm. It is also capable of maintaining the date/time and NVRAM contents across virtual machine reset. Also, the date/time fields can now be modified by the guest. Since bhyve now emulates both the PIT and the RTC there is no need for "Legacy Replacement Routing" in the HPET so get rid of it. The RTC device state can be inspected via bhyvectl as follows: bhyvectl --vm=vm --get-rtc-time bhyvectl --vm=vm --set-rtc-time=<unix_time_secs> bhyvectl --vm=vm --rtc-nvram-offset=<offset> --get-rtc-nvram bhyvectl --vm=vm --rtc-nvram-offset=<offset> --set-rtc-nvram=<value> Reviewed by: tychon Discussed with: grehan Differential Revision: https://reviews.freebsd.org/D1385 MFC after: 2 weeks
* Allow ktr(4) tracing of all guest exceptions via the tunableneel2014-12-231-0/+2
| | | | | | | | | | | | | | | | "hw.vmm.trace_guest_exceptions". To enable this feature set the tunable to "1" before loading vmm.ko. Tracing the guest exceptions can be useful when debugging guest triple faults. Note that there is a performance impact when exception tracing is enabled since every exception will now trigger a VM-exit. Also, handle machine check exceptions that happen during guest execution by vectoring to the host's machine check handler via "int $18". Discussed with: grehan MFC after: 2 weeks
* Revert r274772: it is not valid on MIPSemaste2014-11-251-1/+1
| | | | Reported by: sbruno
OpenPOWER on IntegriCloud