summaryrefslogtreecommitdiffstats
path: root/sys/amd64/amd64/machdep.c
Commit message (Collapse)AuthorAgeFilesLines
* Fix local privilege escalation in IRET handler. [SA-15:21]delphij2015-08-251-0/+1
| | | | | | | | | | | Fix OpenSSH multiple vulnerabilities. [SA-15:22] Disabled ixgbe(4) flow-director support. [EN-15:14] Fix insufficient check of unsupported pkg(7) signature methods. [EN-15:15] Approved by: so
* MFC r271149:pfg2014-09-131-0/+2
| | | | | | | | | | | | Apply known workarounds for less modern MacBooks. The legacy USB circuit tends to give trouble on older MacBooks. While the original report covered MacBook4, extend the fix preemptively for the newer MacBookPro4 too. PR: 191693 Reviewed by: emaste Approved by: re
* MFC r265014: Report boot method (BIOS/UEFI) via sysctl machdep.bootmethodemaste2014-09-081-3/+10
| | | | | Approved by: re Sponsored by: The FreeBSD Foundation
* MFC automatic vt(4) selection for UEFI bootemaste2014-09-021-0/+8
| | | | | | | | | | | | | | | | | | | r268158: Prefer vt(4) for UEFI boot The UEFI framebuffer driver vt_efifb requires vt(4), so add a mechanism for the startup routine to set the preferred console. This change is ugly because console init happens very early in the boot, making a cleaner interface difficult. This change is intended only to facilitate the sc(4) / vt(4) transition, and can be reverted once vt(4) is the default. r268160: Fix typos in VTY constant names from r268158 r268982: Don't pass null kmdp to preload_search_info On Xen PVH guests kmdp == NULL. Sponsored by: The FreeBSD Foundation
* MFC r263822: amd64: Parse the EFI memory map if presentemaste2014-08-221-3/+104
| | | | | | | | With this change (and loader.efi from [HEAD]) we can now boot under qemu using the OVMF UEFI firmware image with the limitation that a serial console is required. Sponsored by: The FreeBSD Foundation
* MFC r258436: Refactor amd64 startup SMAP parsingemaste2014-08-011-33/+44
| | | | | | Extracted from the projects/uefi branch, this change is a reasonable cleanup and will reduce the diffs to review when bringing in the UEFI work.
* MFC r258471: Don't abort SMAP processing after an entry of length 0emaste2014-07-241-1/+1
| | | | | | | Length 0 is not special and should just be skipped. This is the same behaviour as i386. Sponsored by: The FreeBSD Foundation
* MFC r268471:kib2014-07-161-1/+3
| | | | | For safety, ensure that any consumer of the set_regs() and ptrace_set_pc() use the correct return to userspace using iret.
* MFC 259782:jhb2014-01-291-0/+2
| | | | | | Add a resume hook for bhyve that runs a function on all CPUs during resume. For Intel CPUs, invoke vmxon for CPUs that were in VMX mode at the time of suspend.
* MFC 258176:royger2013-12-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix accounting for hw.realmem on the i386 and amd64 platforms. sys/i386/i386/machdep.c: sys/amd64/amd64/machdep.c: The value reported by FreeBSD as "real memory" when booting doesn't match what is later reported by sysctl as hw.realmem. This is due to the fact that the value printed during the boot process is fetched from smbios data (when possible), and accounts for holes in physical memory. On the other hand, the value of hw.realmem is unconditionally set to be one larger than the highest page of the physical address space. Fix this by setting hw.realmem to the same value printed during boot, this makes hw.realmem honour it's name and account properly for physical memory present in the system. Submitted by: Roger Pau Monné Reviewed by: gibbs Approved by: gibbs (mentor) Approved by: re (gjb)
* MFC r258135: x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)emaste2013-11-251-11/+1
| | | | | | | | | | | | | | | Debuggers may need to change PSL_RF. Note that tf_eflags is already stored in the signal context during signal handling and PSL_RF previously could be modified via sigreturn, so this change should not provide any new ability to userspace. For background see the thread at: http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html Reviewed by: jhb, kib Sponsored by: DARPA, AFRL Approved by: re (gjb)
* Merge projects/bhyve_npt_pmap into head.neel2013-10-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the amd64/pmap code aware of nested page table mappings used by bhyve guests. This allows bhyve to associate each guest with its own vmspace and deal with nested page faults in the context of that vmspace. This also enables features like accessed/dirty bit tracking, swapping to disk and transparent superpage promotions of guest memory. Guest vmspace: Each bhyve guest has a unique vmspace to represent the physical memory allocated to the guest. Each memory segment allocated by the guest is mapped into the guest's address space via the 'vmspace->vm_map' and is backed by an object of type OBJT_DEFAULT. pmap types: The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT. The PT_X86 pmap type is used by the vmspace associated with the host kernel as well as user processes executing on the host. The PT_EPT pmap is used by the vmspace associated with a bhyve guest. Page Table Entries: The EPT page table entries as mostly similar in functionality to regular page table entries although there are some differences in terms of what bits are used to express that functionality. For e.g. the dirty bit is represented by bit 9 in the nested PTE as opposed to bit 6 in the regular x86 PTE. Therefore the bitmask representing the dirty bit is now computed at runtime based on the type of the pmap. Thus PG_M that was previously a macro now becomes a local variable that is initialized at runtime using 'pmap_modified_bit(pmap)'. An additional wrinkle associated with EPT mappings is that older Intel processors don't have hardware support for tracking accessed/dirty bits in the PTE. This means that the amd64/pmap code needs to emulate these bits to provide proper accounting to the VM subsystem. This is achieved by using the following mapping for EPT entries that need emulation of A/D bits: Bit Position Interpreted By PG_V 52 software (accessed bit emulation handler) PG_RW 53 software (dirty bit emulation handler) PG_A 0 hardware (aka EPT_PG_RD) PG_M 1 hardware (aka EPT_PG_WR) The idea to use the mapping listed above for A/D bit emulation came from Alan Cox (alc@). The final difference with respect to x86 PTEs is that some EPT implementations do not support superpage mappings. This is recorded in the 'pm_flags' field of the pmap. TLB invalidation: The amd64/pmap code has a number of ways to do invalidation of mappings that may be cached in the TLB: single page, multiple pages in a range or the entire TLB. All of these funnel into a single EPT invalidation routine called 'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and sends an IPI to the host cpus that are executing the guest's vcpus. On a subsequent entry into the guest it will detect that the EPT has changed and invalidate the mappings from the TLB. Guest memory access: Since the guest memory is no longer wired we need to hold the host physical page that backs the guest physical page before we can access it. The helper functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose. PCI passthru: Guest's with PCI passthru devices will wire the entire guest physical address space. The MMIO BAR associated with the passthru device is backed by a vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that have one or more PCI passthru devices attached to them. Limitations: There isn't a way to map a guest physical page without execute permissions. This is because the amd64/pmap code interprets the guest physical mappings as user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U shares the same bit position as EPT_PG_EXECUTE all guest mappings become automatically executable. Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews as well as their support and encouragement. Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing object for pci passthru mmio regions. Special thanks to Peter Holm for testing the patch on short notice. Approved by: re Discussed with: grehan Reviewed by: alc, kib Tested by: pho
* Implement support for the process-context identifiers ('PCID') onkib2013-08-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Intel CPUs. The feature tags TLB entries with the Id of the address space and allows to avoid TLB invalidation on the context switch, it is available only in the long mode. In the microbenchmarks, using the PCID decreased latency of the context switches by ~30% on SandyBridge class desktop CPUs, measured with the lat_ctx program from lmbench. If available, use INVPCID instruction when a TLB entry in non-current address space needs to be invalidated. The instruction is typically available on the Haswell. If needed, the use of PCID can be turned off with the vm.pmap.pcid_enabled loader tunable set to 0. The state of the feature is reported by the vm.pmap.pcid_enabled sysctl. The sysctl vm.pmap.pcid_save_cnt reports the number of context switches which avoided invalidating the TLB; compare with the total number of context switches, available as sysctl vm.stats.sys.v_swtch. Sponsored by: The FreeBSD Foundation Reviewed by: alc Tested by: pho, bf
* Implement vector callback for PVHVM and unify event channel implementationsgibbs2013-08-291-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-structure Xen HVM support so that: - Xen is detected and hypercalls can be performed very early in system startup. - Xen interrupt services are implemented using FreeBSD's native interrupt delivery infrastructure. - the Xen interrupt service implementation is shared between PV and HVM guests. - Xen interrupt handlers can optionally use a filter handler in order to avoid the overhead of dispatch to an interrupt thread. - interrupt load can be distributed among all available CPUs. - the overhead of accessing the emulated local and I/O apics on HVM is removed for event channel port events. - a similar optimization can eventually, and fairly easily, be used to optimize MSI. Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure, and misc Xen cleanups: Sponsored by: Spectra Logic Corporation Unification of PV & HVM interrupt infrastructure, bug fixes, and misc Xen cleanups: Submitted by: Roger Pau Monné Sponsored by: Citrix Systems R&D sys/x86/x86/local_apic.c: sys/amd64/include/apicvar.h: sys/i386/include/apicvar.h: sys/amd64/amd64/apic_vector.S: sys/i386/i386/apic_vector.s: sys/amd64/amd64/machdep.c: sys/i386/i386/machdep.c: sys/i386/xen/exception.s: sys/x86/include/segments.h: Reserve IDT vector 0x93 for the Xen event channel upcall interrupt handler. On Hypervisors that support the direct vector callback feature, we can request that this vector be called directly by an injected HVM interrupt event, instead of a simulated PCI interrupt on the Xen platform PCI device. This avoids all of the overhead of dealing with the emulated I/O APIC and local APIC. It also means that the Hypervisor can inject these events on any CPU, allowing upcalls for different ports to be handled in parallel. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: Map Xen per-vcpu area during AP startup. sys/amd64/include/intr_machdep.h: sys/i386/include/intr_machdep.h: Increase the FreeBSD IRQ vector table to include space for event channel interrupt sources. sys/amd64/include/pcpu.h: sys/i386/include/pcpu.h: Remove Xen HVM per-cpu variable data. These fields are now allocated via the dynamic per-cpu scheme. See xen_intr.c for details. sys/amd64/include/xen/hypercall.h: sys/dev/xen/blkback/blkback.c: sys/i386/include/xen/xenvar.h: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/xen/gnttab.c: Prefer FreeBSD primatives to Linux ones in Xen support code. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: sys/dev/xen/balloon/balloon.c: sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/console/xencons_ring.c: sys/dev/xen/control/control.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/dev/xen/xenpci/xenpci.c: sys/i386/i386/machdep.c: sys/i386/include/pmap.h: sys/i386/include/xen/xenfunc.h: sys/i386/isa/npx.c: sys/i386/xen/clock.c: sys/i386/xen/mp_machdep.c: sys/i386/xen/mptable.c: sys/i386/xen/xen_clock_util.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/xen_rtc.c: sys/xen/evtchn/evtchn_dev.c: sys/xen/features.c: sys/xen/gnttab.c: sys/xen/gnttab.h: sys/xen/hvm.h: sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbus_if.m: sys/xen/xenbus/xenbusb_front.c: sys/xen/xenbus/xenbusvar.h: sys/xen/xenstore/xenstore.c: sys/xen/xenstore/xenstore_dev.c: sys/xen/xenstore/xenstorevar.h: Pull common Xen OS support functions/settings into xen/xen-os.h. sys/amd64/include/xen/xen-os.h: sys/i386/include/xen/xen-os.h: sys/xen/xen-os.h: Remove constants, macros, and functions unused in FreeBSD's Xen support. sys/xen/xen-os.h: sys/i386/xen/xen_machdep.c: sys/x86/xen/hvm.c: Introduce new functions xen_domain(), xen_pv_domain(), and xen_hvm_domain(). These are used in favor of #ifdefs so that FreeBSD can dynamically detect and adapt to the presence of a hypervisor. The goal is to have an HVM optimized GENERIC, but more is necessary before this is possible. sys/amd64/amd64/machdep.c: sys/dev/xen/xenpci/xenpcivar.h: sys/dev/xen/xenpci/xenpci.c: sys/x86/xen/hvm.c: sys/sys/kernel.h: Refactor magic ioport, Hypercall table and Hypervisor shared information page setup, and move it to a dedicated HVM support module. HVM mode initialization is now triggered during the SI_SUB_HYPERVISOR phase of system startup. This currently occurs just after the kernel VM is fully setup which is just enough infrastructure to allow the hypercall table and shared info page to be properly mapped. sys/xen/hvm.h: sys/x86/xen/hvm.c: Add definitions and a method for configuring Hypervisor event delievery via a direct vector callback. sys/amd64/include/xen/xen-os.h: sys/x86/xen/hvm.c: sys/conf/files: sys/conf/files.amd64: sys/conf/files.i386: Adjust kernel build to reflect the refactoring of early Xen startup code and Xen interrupt services. sys/dev/xen/blkback/blkback.c: sys/dev/xen/blkfront/blkfront.c: sys/dev/xen/blkfront/block.h: sys/dev/xen/control/control.c: sys/dev/xen/evtchn/evtchn_dev.c: sys/dev/xen/netback/netback.c: sys/dev/xen/netfront/netfront.c: sys/xen/xenstore/xenstore.c: sys/xen/evtchn/evtchn_dev.c: sys/dev/xen/console/console.c: sys/dev/xen/console/xencons_ring.c Adjust drivers to use new xen_intr_*() API. sys/dev/xen/blkback/blkback.c: Since blkback defers all event handling to a taskqueue, convert this task queue to a "fast" taskqueue, and schedule it via an interrupt filter. This avoids an unnecessary ithread context switch. sys/xen/xenstore/xenstore.c: The xenstore driver is MPSAFE. Indicate as much when registering its interrupt handler. sys/xen/xenbus/xenbus.c: sys/xen/xenbus/xenbusvar.h: Remove unused event channel APIs. sys/xen/evtchn.h: Remove all kernel Xen interrupt service API definitions from this file. It is now only used for structure and ioctl definitions related to the event channel userland device driver. Update the definitions in this file to match those from NetBSD. Implementing this interface will be necessary for Dom0 support. sys/xen/evtchn/evtchnvar.h: Add a header file for implemenation internal APIs related to managing event channels event delivery. This is used to allow, for example, the event channel userland device driver to access low-level routines that typical kernel consumers of event channel services should never access. sys/xen/interface/event_channel.h: sys/xen/xen_intr.h: Standardize on the evtchn_port_t type for referring to an event channel port id. In order to prevent low-level event channel APIs from leaking to kernel consumers who should not have access to this data, the type is defined twice: Once in the Xen provided event_channel.h, and again in xen/xen_intr.h. The double declaration is protected by __XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared twice within a given compilation unit. sys/xen/xen_intr.h: sys/xen/evtchn/evtchn.c: sys/x86/xen/xen_intr.c: sys/dev/xen/xenpci/evtchn.c: sys/dev/xen/xenpci/xenpcivar.h: New implementation of Xen interrupt services. This is similar in many respects to the i386 PV implementation with the exception that events for bound to event channel ports (i.e. not IPI, virtual IRQ, or physical IRQ) are further optimized to avoid mask/unmask operations that aren't necessary for these edge triggered events. Stubs exist for supporting physical IRQ binding, but will need additional work before this implementation can be fully shared between PV and HVM. sys/amd64/amd64/mp_machdep.c: sys/i386/i386/mp_machdep.c: sys/i386/xen/mp_machdep.c sys/x86/xen/hvm.c: Add support for placing vcpu_info into an arbritary memory page instead of using HYPERVISOR_shared_info->vcpu_info. This allows the creation of domains with more than 32 vcpus. sys/i386/i386/machdep.c: sys/i386/xen/clock.c: sys/i386/xen/xen_machdep.c: sys/i386/xen/exception.s: Add support for new event channle implementation.
* MFi386: add ddb "show sysregs" command.kib2013-07-151-0/+30
| | | | | Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Use slightly more idiomatic expression to get the address of array.kib2013-05-271-1/+1
| | | | | | Tested by: dim, pgj Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Add basic support for FDT to i386 & amd64. This change includes:marcel2013-05-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | 1. Common headers for fdt.h and ofw_machdep.h under x86/include with indirections under i386/include and amd64/include. 2. New modinfo for loader provided FDT blob. 3. Common x86_init_fdt() called from hammer_time() on amd64 and init386() on i386. 4. Split-off FDT specific low-level console functions from FDT bus methods for the uart(4) driver. The low-level console logic has been moved to uart_cpu_fdt.c and is used for arm, mips & powerpc only. The FDT bus methods are shared across all architectures. 5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the fdt_pic_table[] arrays. Both are empty right now. FDT addresses are I/O ports on x86. Since the core FDT code does not handle different address spaces, adding support for both I/O ports and memory addresses requires some thought and discussion. It may be better to use a compile-time option that controls this. Obtained from: Juniper Networks, Inc.
* Retire write-only PCB_GS32BIT pcb flag on amd64.dchagin2013-05-091-1/+1
|
* MFCattilio2013-03-021-9/+10
|\
| * MFcalloutng:davide2013-02-281-9/+10
| | | | | | | | | | | | | | | | | | | | | | When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days. The commmit is not targeted for MFC.
* | Switch vm_object lock to be a rwlock.attilio2013-02-201-0/+1
|/ | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* The 'testing memory' patch gets printed too many timeseadler2012-10-221-2/+0
| | | | Approved by: cperciva (implicit)
* Explain the upcoming delay by printing a message when the kerneleadler2012-10-221-0/+2
| | | | | | | | is about to begin testing memory. Reviewed by: dteske, adri Approved by: cperciva MFC after: 1 week
* Reverts r234074,234105,234564,234723,234989,235231-235232 and part ofattilio2012-10-091-5/+0
| | | | | | | | r234247. Use, instead, the static intializer introduced in r239923 for x86 and sparc64 intr_cpus, unwinding the code to the initial version. Reviewed by: marius
* Introduce curpcb magic variable, similar to curthread, which is MDkib2012-07-191-1/+1
| | | | | | | | | | | | | | | | | | | | | amd64. It is implemented as __pure2 inline with non-volatile asm read from pcpu, which allows a compiler to cache its results. Convert most PCPU_GET(pcb) and curthread->td_pcb accesses into curpcb. Note that __curthread() uses magic value 0 as an offsetof(struct pcpu, pc_curthread). It seems to be done this way due to machine/pcpu.h needs to be processed before sys/pcpu.h, because machine/pcpu.h contributes machine-depended fields to the struct pcpu definition. As result, machine/pcpu.h cannot use struct pcpu yet. The __curpcb() also uses a magic constant instead of offsetof(struct pcpu, pc_curpcb) for the same reason. The constants are now defined as symbols and CTASSERTs are added to ensure that future KBI changes do not break the code. Requested and reviewed by: bde MFC after: 3 weeks
* Partially revert r217515 so that the mem_range_softc variable is alwaysjhb2012-07-091-0/+3
| | | | | | | present on x86 kernels. This fixes the build of kernels that include 'device acpi' but do not include 'device mem'. MFC after: 1 month
* Clean up the intr* MD KPI from the SMP dependency, removing a cause ofattilio2012-04-261-2/+0
| | | | | | | | | | | discrepancy between modules and kernel, but deal with SMP differences within the functions themselves. As an added bonus this also helps in terms of code readability. Requested by: gibbs Reviewed by: jhb, marius MFC after: 1 week
* Fix !SMP build after r234074.marius2012-04-101-0/+2
| | | | Reviewed by: attilio, jhb
* BSP is not added to the mask of valid target CPUs for interruptsattilio2012-04-091-0/+5
| | | | | | | | | | | | | | | in set_apic_interrupt_ids(). Besides, set_apic_interrupts_ids() is not called in the !SMP case too. Fix this by: - Adding the BSP as an interrupt target directly in cpu_startup(). - Remove an obsolete optimization where the BSP are skipped in set_apic_interrupt_ids(). Reported by: jh Reviewed by: jhb MFC after: 3 days X-MFC: r233961 Pointy hat to: me
* Some BIOSes are known for corrupting low 64KB between suspend and resume.jkim2012-02-151-2/+13
| | | | | | | Mask off the first 16 pages unless we appear to be running in a VM. This address may be overridden by 'hw.physmem.start' tunable from loader. Note Linux used to have a BIOS quirk table for this issue but it seems they made it default recently.
* Add support for the extended FPU states on amd64, both for nativekib2012-01-211-28/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 64bit and 32bit ABIs. As a side-effect, it enables AVX on capable CPUs. In particular: - Query the CPU support for XSAVE, list of the supported extensions and the required size of FPU save area. The hw.use_xsave tunable is provided for disabling XSAVE, and hw.xsave_mask may be used to select the enabled extensions. - Remove the FPU save area from PCB and dynamically allocate the (run-time sized) user save area on the top of the kernel stack, right above the PCB. Reorganize the thread0 PCB initialization to postpone it after BSP is queried for save area size. - The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as well. FPU state is only useful for suspend, where it is saved in dynamically allocated suspfpusave area. - Use XSAVE and XRSTOR to save/restore FPU state, if supported and enabled. - Define new mcontext_t flag _MC_HASFPXSTATE, indicating that mcontext_t has a valid pointer to out-of-struct extended FPU state. Signal handlers are supplied with stack-allocated fpu state. The sigreturn(2) and setcontext(2) syscall honour the flag, allowing the signal handlers to inspect and manipilate extended state in the interrupted context. - The getcontext(2) never returns extended state, since there is no place in the fixed-sized mcontext_t to place variable-sized save area. And, since mcontext_t is embedded into ucontext_t, makes it impossible to fix in a reasonable way. Instead of extending getcontext(2) syscall, provide a sysarch(2) facility to query extended FPU state. - Add ptrace(2) support for getting and setting extended state; while there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries. - Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to consumers, making it opaque. Internally, struct fpu_kern_ctx now contains a space for the extended state. Convert in-kernel consumers of fpu_kern KPI both on i386 and amd64. First version of the support for AVX was submitted by Tim Bird <tim.bird am sony com> on behalf of Sony. This version was written from scratch. Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org> MFC after: 1 month
* Default to not performing the early-boot memory tests when we detect wegavin2011-12-311-2/+5
| | | | | | | | | | | | | | | | | | are booting inside a VM. There are three reasons to disable this: o It causes the VM host to believe that all the tested pages or RAM are in use. This in turn may force the host to page out pages of RAM belonging to other VMs, or otherwise cause problems with fair resource sharing on the VM cluster. o It adds significant time to the boot process (around 1 second/Gig in testing) o It is unnecessary - the host should have already verified that the memory is functional etc. Note that this simply changes the default when in a VM - it can still be overridden using the hw.memtest.tests tunable. MFC after: 4 weeks
* Weaken the part of assertions added in the r227394. Only check that thekib2011-11-111-1/+1
| | | | | | process state is stopped. MFC after: 1 week
* Stopped process may legitimately have some threads sleeping and notkib2011-11-091-1/+2
| | | | | | | suspended, if the sleep is uninterruptible. Reported and tested by: pho MFC after: 1 week
* Add some improvements in the idle table callbacks:attilio2011-10-031-7/+36
| | | | | | | | | | | | | - Replace instances of manual assembly instruction "hlt" call with halt() function calling. - In cpu_idle_mwait() avoid races in check to sched_runnable() using the same pattern used in cpu_idle_hlt() with the 'hlt' instruction. - Add comments explaining the logic behind the pattern used in cpu_idle_hlt() and other idle callbacks. In collabouration with: jhb, mav Reviewed by: adri, kib MFC after: 3 weeks
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-161-2/+2
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* In HEAD when doing no further checkes there is no reason use thebz2011-08-201-3/+2
| | | | | | | | | | temporary variable and check with if as TUNABLE_*_FETCH do not alter values unless successfully found the tunable. Reported by: jhb, bde MFC after: 3 days X-MFC with: r224516 Approved by: re (kib)
* Introduce a tunable to disable the time consuming parts of bootupbz2011-07-301-1/+12
| | | | | | | | | | | | | memtesting, which can easily save seconds to minutes of boot time. The tunable name is kept general to allow reusing the code in alternate frameworks. Requested by: many Discussed on: arch (a while a go) Obtained from: Sandvine Incorporated Reviewed by: sbruno Approved by: re (kib) MFC after: 2 weeks
* remove code for dynamic offlining/onlining of CPUs on x86avg2011-06-081-3/+4
| | | | | | | | | | | | | | | | | | | | | | | The code has definitely been broken for SCHED_ULE, which is a default scheduler. It may have been broken for SCHED_4BSD in more subtle ways, e.g. with manually configured CPU affinities and for interrupt devilery purposes. We still provide a way to disable individual CPUs or all hyperthreading "twin" CPUs before SMP startup. See the UPDATING entry for details. Interaction between building CPU topology and disabling CPUs still remains fuzzy: topology is first built using all availble CPUs and then the disabled CPUs should be "subtracted" from it. That doesn't work well if the resulting topology becomes non-uniform. This work is done in cooperation with Attilio Rao who in addition to reviewing also provided parts of code. PR: kern/145385 Discussed with: gcooper, ambrisko, mdf, sbruno Reviewed by: attilio Tested by: pho, pluknet X-MFC after: never
* Remove wrong comment.dchagin2011-05-111-3/+0
| | | | MFC after: 1 week.
* Reduce errors in effective frequency calculation.jkim2011-04-121-2/+3
|
* Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF andjkim2011-04-121-25/+24
| | | | | | MPERF MSRs are available. It was disabled in r216443. Remove the earlier hack to subtract 0.5% from the calibrated frequency as DELAY(9) is little bit more reliable now.
* Use atomic load & store for TSC frequency. It may be overkill for amd64 butjkim2011-04-071-5/+6
| | | | | | | | | safer for i386 because it can be easily over 4 GHz now. More worse, it can be easily changed by user with 'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly). Note it is intentionally not used in performance critical paths to avoid performance regression (but we should, in theory). Alternatively, we may add "virtual TSC" with lower frequency if maximum frequency overflows 32 bits (and ignore possible incoherency as we do now).
* The new binutils has correctly redefined MAXPAGESIZE on amd64 as 0x200000alc2011-03-281-1/+6
| | | | | | | | | | | | | instead of 0x100000. As a side effect, an amd64 kernel now loads at physical address 0x200000 instead of 0x100000. This is probably for the best because it avoids the use of a 2MB page mapping for the first 1MB of the kernel that also spans the fixed MTRRs. However, getmemsize() still thinks that the kernel loads at 0x100000, and so the physical memory between 0x100000 and 0x200000 is lost. Fix this problem by replacing the hard-wired constant in getmemsize() by a symbol "kernphys" that is defined by the linker script. In collaboration with: kib
* Mostly revert r219468, as I had misremembered the C standard regardingmdf2011-03-111-1/+1
| | | | | | the size of an extern array. Keep one change from strncpy to strlcpy.
* Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turnsjkim2011-03-111-10/+27
| | | | | off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as a CPU ticker. Note tsc_present does not change by this tunable.
* Use MAXPATHLEN rather than the size of an extern array when copying themdf2011-03-101-1/+1
| | | | | | kernel name. Also consistenly use strlcpy(). Suggested by: Warner Losh
* To avoid excessive code duplication create wrapper for fill regsdchagin2011-02-161-0/+6
| | | | | from stack frame. Change the trap() code to use newly created function instead of explicit regs assignment.
* Clear the padding when returning context to the usermode, forkib2011-02-051-0/+5
| | | | | | | | | MI ucontext_t and x86 MD parts. Kernel allocates the structures on the stack, and not clearing reserved fields and paddings causes leakage. Noted and discussed with: bde MFC after: 2 weeks
* Set td_kstack_pages for thread0. This was already being done for mostmdf2011-01-261-6/+8
| | | | | | architectures, but i386 and amd64 were missing it. Submitted by: Mohd Fahadullah <mfahadullah AT isilon DOT com>
OpenPOWER on IntegriCloud