summaryrefslogtreecommitdiffstats
path: root/sys/i386
Commit message (Collapse)AuthorAgeFilesLines
* Reduce the number of global TLB shootdowns generated by pmap_qenter().alc2010-07-101-6/+9
| | | | | | | | | | | | | | | | | | Specifically, teach pmap_qenter() to recognize the case when it is being asked to replace a mapping with the very same mapping and not generate a shootdown. Unfortunately, the buffer cache commonly passes an entire buffer to pmap_qenter() when only a subset of the mappings are changing. For the extension of buffers in allocbuf() this was resulting in unnecessary shootdowns. The addition of new pages to the end of the buffer need not and did not trigger a shootdown, but overwriting the initial mappings with the very same mappings was seen as a change that necessitated a shootdown. With this change, that is no longer so. For a "buildworld" on amd64, this change eliminates 14-15% of the pmap_invalidate_range() shootdowns, and about 4% of the overall shootdowns. MFC after: 3 weeks
* Fix spacing.kib2010-07-091-1/+1
| | | | | Noted by: pgollucci MFC after: 3 weeks
* For both i386 and amd64 pmap,kib2010-07-092-2/+2
| | | | | | | | | | | | | - change the type of pm_active to cpumask_t, which it is; - in pmap_remove_pages(), compare with PCPU(curpmap), instead of dereferencing the long chain of pointers [1]. For amd64 pmap, remove the unneeded checks for validity of curpmap in pmap_activate(), since curpmap should be always valid after r209789. Submitted by: alc [1] Reviewed by: alc MFC after: 3 weeks
* Revert r209638. After commit, there appeared to be more people who likedmav2010-07-021-2/+2
| | | | previous name of stray interrupt counters, then responded to the list.
* Make stray irq counters have format alike to other counters. Unified formatmav2010-07-011-2/+2
| | | | makes string processing (for example by `systat -vm`) easier.
* Move prototypes for kern_sigtimedwait() and kern_sigprocmask() tojhb2010-06-301-0/+1
| | | | <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
* Regeneratekib2010-06-284-448/+448
|
* Import the acpi_aibs(4) driver written by Constantine A. Murenin.rpaulo2010-06-251-0/+3
| | | | | | | | | It has more features than acpi_aiboost(4) and it will eventually replace acpi_aiboost(4). Submitted by: Constantine A. Murenin <cnst at FreeBSD.org> Reviewed by: freebsd-acpi, imp MFC after: 1 month
* Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64kib2010-06-233-0/+12
| | | | | | | | | ABI specifies the DF should be zero, and newer compilers do not clear DF before using DF-sensitive instructions. The DF clearing for signal handlers was done some time ago. MFC after: 1 week
* Fix bugs on pc98, use npxgetuserregs() instead of npxgetregs() forkib2010-06-231-8/+4
| | | | | | | | | | get_fpcontext(), and npxsetuserregs() for set_fpcontext). Also, note that usercontext is not initialized anymore in fpstate_drop(). Systematically replace references to npxgetregs() and npxsetregs() by npxgetuserregs() and npxsetuserregs() in comments. Noted by: bde
* After the FPU use requires #MF working due to INT13 FPU exception handlingkib2010-06-233-43/+33
| | | | | | | | | removal, MFi386 r209198: Use critical sections instead of disabling local interrupts to ensure the consistency between PCPU fpcurthread and the state of FPU. Reviewed by: bde Tested by: pho
* Remove the support for int13 FPU exception reporting on i386. It iskib2010-06-234-142/+32
| | | | | | | | | believed that all 486-class CPUs FreeBSD is capable to run on, either have no FPU and cannot use external coprocessor, or have FPU on the package and can use #MF. Reviewed by: bde Tested by: pho (previous version)
* Remove unused i586 optimized bcopy/bzero/etc implementations that utilizekib2010-06-234-832/+7
| | | | | | | | | FPU registers for copying. Remove the switch table and jumps from bcopy/bzero/... to the actual implementation. As a side-effect, i486-optimized bzero is removed. Reviewed by: bde Tested by: pho (previous version)
* Some style fixes for r209371.mav2010-06-221-2/+4
| | | | Submitted by: jhb@
* Implement new event timers infrastructure. It provides unified APIs formav2010-06-202-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | writing event timer drivers, for choosing best possible drivers by machine independent code and for operating them to supply kernel with hardclock(), statclock() and profclock() events in unified fashion on various hardware. Infrastructure provides support for both per-CPU (independent for every CPU core) and global timers in periodic and one-shot modes. MI management code at this moment uses only periodic mode, but one-shot mode use planned for later, as part of tickless kernel project. For this moment infrastructure used on i386 and amd64 architectures. Other archs are welcome to follow, while their current operation should not be affected. This patch updates existing drivers (i8254, RTC and LAPIC) for the new order, and adds event timers support into the HPET driver. These drivers have different capabilities: LAPIC - per-CPU timer, supports periodic and one-shot operation, may freeze in C3 state, calibrated on first use, so may be not exactly precise. HPET - depending on hardware can work as per-CPU or global, supports periodic and one-shot operation, usually provides several event timers. i8254 - global, limited to periodic mode, because same hardware used also as time counter. RTC - global, supports only periodic mode, set of frequencies in Hz limited by powers of 2. Depending on hardware capabilities, drivers preferred in following orders, either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC. User may explicitly specify wanted timers via loader tunables or sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2. If requested driver is unavailable or unoperational, system will try to replace it. If no more timers available or "NONE" specified for second, system will operate using only one timer, multiplying it's frequency by few times and uing respective dividers to honor hz, stathz and profhz values, set during initial setup.
* Only enable kdtrace hook in the LINT on the architectures that implement it.kib2010-06-181-0/+6
|
* Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64.mav2010-06-171-12/+23
| | | | | | | | | This information can be very valuable for CPU sleep-time (and respectively idle power consumption) optimization. Add counters for timer-related IPIs. Reviewed by: jhb@ (previous version)
* Restore the machine check register banks on resume. For banks beingjhb2010-06-152-0/+3
| | | | | | | monitored via CMCI, reset the interrupt threshold to 1 on resume. Reviewed by: jkim MFC after: 2 weeks
* Fix bug introduced in SVN rev 194985. When calling pic_assign_cpu()mav2010-06-141-1/+1
| | | | | for pre-bound IRQs during boot, submit there LAPIC ID, same as in other places, not CPU ID.
* Check general TSC presence before doing more specific checks and printfs.mav2010-06-121-1/+5
|
* Update several places that iterate over CPUs to use CPU_FOREACH().jhb2010-06-113-12/+7
|
* Relax one of the new assertions in pmap_enter() a little. Specifically,alc2010-06-112-2/+4
| | | | | | allow pmap_enter() to be performed on an unmanaged page that doesn't have VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages. (See, for example, pmap_remove_write().)
* Do not require pos parameter to be zero in MAP_ANONYMOUS mmap requestskan2010-06-101-2/+6
| | | | | | | | | | in Linux emulation layer. Linux seems to only require that pos is page-aligned, but otherwise ignores it. Default FreeBSD mmap parameter checking is too strict to allow some Linux binaries to run. tsMuxeR is one example of such a binary. Discussed with: jhb MFC after: 1 week
* Reduce the scope of the page queues lock and the number ofalc2010-06-102-22/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PG_REFERENCED changes in vm_pageout_object_deactivate_pages(). Simplify this function's inner loop using TAILQ_FOREACH(), and shorten some of its overly long lines. Update a stale comment. Assert that PG_REFERENCED may be cleared only if the object containing the page is locked. Add a comment documenting this. Assert that a caller to vm_page_requeue() holds the page queues lock, and assert that the page is on a page queue. Push down the page queues lock into pmap_ts_referenced() and pmap_page_exists_quick(). (As of now, there are no longer any pmap functions that expect to be called with the page queues lock held.) Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever be passed an unmanaged page. Assert this rather than returning "0" and "FALSE" respectively. ARM: Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH(). Push down the page queues lock inside of pmap_clearbit(), simplifying pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write(). Additionally, this allows for avoiding the acquisition of the page queues lock in some cases. PowerPC/AIM: moea*_page_exits_quick() and moea*_page_wired_mappings() will never be called before pmap initialization is complete. Therefore, the check for moea_initialized can be eliminated. Push down the page queues lock inside of moea*_clear_bit(), simplifying moea*_clear_modify() and moea*_clear_reference(). The last parameter to moea*_clear_bit() is never used. Eliminate it. PowerPC/BookE: Simplify mmu_booke_page_exists_quick()'s control flow. Reviewed by: kib@
* Move the MD support for PCI message signalled interrupts to the x86 treejhb2010-06-081-602/+0
| | | | as it is identical for i386 and amd64.
* Move the machine check support code to the x86 tree since it is identicaljhb2010-06-081-884/+0
| | | | | | on i386 and amd64. Requested by: alc
* Move the I/O APIC code to the x86 tree since it is identical on i386 andjhb2010-06-081-922/+0
| | | | amd64.
* - Use a bit more care when moving I/O APIC interrupts between CPUs. Maskjhb2010-06-081-4/+22
| | | | | | | | | | | the interrupt followed by a brief delay if it is not currently masked before moving the interrupt. - Move the icu_lock out of ioapic_program_intpin() and into callers. This closes a race where ioapic_program_intpin() could use a stale value of the masked state to compute the masked bit in the register. Reviewed by: mav MFC after: 2 weeks
* Introduce the x86 kernel interfaces to allow kernel code to usekib2010-06-059-47/+230
| | | | | | | | | | | | | | | | FPU/SSE hardware. Caller should provide a save area that is chained into the stack of the areas; pcb save_area for usermode FPU state is on top. The pcb now contains a pointer to the current FPU saved area, used during FPUDNA handling and context switches. There is also a facility to allow the kernel thread to use pcb save_area. Change the dreaded warnings "npxdna in kernel mode!" into the panics when FPU usage is not registered. KPI discussed with: fabient Tested by: pho, fabient Hardware provided by: Sentex Communications MFC after: 1 month
* In the unlikely event that pmap_ts_referenced() demoted five superpagealc2010-06-031-1/+2
| | | | | | | mappings to the same underlying physical page, the calling thread would be left forever pinned to the same processor. MFC after: 3 days
* MFamd64: Add a new macro PCPU_XEN_FIELDS to hold XEN-specific per-CPUjhb2010-06-021-28/+11
| | | | | | | | | | fields that is always included in PCPU_MD_FIELDS. The macro is empty for non-XEN kernels. This avoids duplicating non-XEN per-CPU fields in two places. While here, remove several unused fields from the XEN-specific structure. Reviewed by: kmacy, gibbs MFC after: 1 month
* Eliminate a stale comment.alc2010-05-312-8/+0
|
* Simplify the inner loop of pmap_collect(): While iterating over the page'salc2010-05-302-8/+5
| | | | | pv list, there is no point in checking whether or not the pv list is empty. Instead, wait until the loop completes.
* Merge various changes from i386/i386/pmap.c:alc2010-05-301-72/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The remaining, unmerged portions of r175404 Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel option DIAGNOSTIC still disables inlining of certain pmap functions.) Eliminate dead code from pmap_enter(). This code implemented an assertion. On i386, an equivalent check is already implemented. However, on amd64, a small change is required to implement an equivalent check. Eliminate \n from a nearby panic string. Use KASSERT() to reimplement pmap_copy()'s two assertions. Merge portions of r177659 To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set. The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set. r208609 Defer freeing any page table pages in pmap_remove_all() until after the page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all(). r208645 When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call. When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called. Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator. There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this. Tidy up the variable declarations at the start of pmap_enter().
* When I pushed down the page queues lock into pmap_is_modified(), I createdalc2010-05-291-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call. When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called. Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator. There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this. Tidy up the variable declarations at the start of pmap_enter().
* Defer initializing machine checks for the boot CPU until the local APIC isjhb2010-05-282-1/+13
| | | | | | fully configured. MFC after: 1 month
* Defer freeing any page table pages in pmap_remove_all() until after thealc2010-05-281-2/+2
| | | | | page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all().
* Clarify a potential issue in get_fpcontext() use.kib2010-05-271-0/+14
| | | | MFC after: 1 week
* Push down page queues lock acquisition in pmap_enter_object() andalc2010-05-262-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea*_is_modified() and moea*_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib
* Only enable CMCI on i386 if 'device apic' is enabled in the kernel sincejhb2010-05-251-0/+25
| | | | it requires the local APIC to work.
* Add support for corrected machine check interrupts. CMCI is a new localjhb2010-05-247-29/+261
| | | | | | | | | | | | | | | | APIC interrupt that fires when a threshold of corrected machine check events is reached. CMCI also includes a count of events when reporting corrected errors in the bank's status register. Note that individual banks may or may not support CMCI. If they do, each bank includes its own threshold register that determines when the interrupt fires. Currently the code uses a very simple strategy where it doubles the threshold on each interrupt until it succeeds in throttling the interrupt to occur only once a minute (this interval can be tuned via sysctl). The threshold is also adjusted on each hourly poll which will lower the threshold once events stop occurring. Tested by: Sailaja Bangaru sbappana at yahoo com MFC after: 1 month
* Roughly half of a typical pmap_mincore() implementation is machine-alc2010-05-242-105/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore(). Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page. Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information. Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock. Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed(). Reviewed by: kib (an earlier version)
* - Implement MI helper functions, dividing one or two timer interrupts withmav2010-05-243-8/+2
| | | | | | | | arbitrary frequencies into hardclock(), statclock() and profclock() calls. Same code with minor variations duplicated several times over the tree for different timer drivers and architectures. - Switch all x86 archs to new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.
* Reorganize syscall entry and leave handling.kib2010-05-235-159/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend struct sysvec with three new elements: sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-depended (this might be reconsidered after all architectures are converted). sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value. sv_syscallnames - the table of syscall names. Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval(). The new functions syscallenter(9) and syscallret(9) are provided that use sv_*syscall* pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers. Syscallenter() fetches arguments, calls syscall implementation from ABI sysent table, and set up return frame. The end of syscall bookkeeping is done by syscallret(). Take advantage of single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at syscall entry or return point respectively. The EXEC flag augments SCX and notifies debugger that the process address space was changed by one of exec(2)-family syscalls. The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation. Reviewed by: jhb, marcel, marius, nwhitehorn, stas Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips) MFC after: 1 month
* Unify local_apic.c for x86 archs,mav2010-05-231-1427/+0
|
* Rename an argument from "exp" to "expect" since the former makes FlexeLintphk2010-05-201-8/+8
| | | | | | uneasy, in case anybody think it might be exp(3) in libm. This also makes it consistent with other archs.
* Add constants for the optional EOI suppression support in local APICs andjhb2010-05-191-0/+3
| | | | EOI registers in I/O APICs.
* On entry to pmap_enter(), assert that the page is busy. While I'malc2010-05-162-9/+30
| | | | | | | | | | | | | | | | | | | | here, make the style of assertion used by pmap_enter() consistent across all architectures. On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page. With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running. Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.
* Apply a patch that has been lingering in my inbox for far too long:phk2010-05-151-4/+13
| | | | | | | | | | | | | | | | | | | | On a soekris Net5501, if you do a watchdog -t 16, followed by a watchdog -t 0 to disable the watchdog, and then after some time (16s) re-enable the watchdog the box reboots immediatly. This prevents also to stop and restart watchdogd(8). This is because when you stop the watchdog, the timer is not stoped, only the hard reset is disabled. So when the timer has elapsed, the C2 event of the timer is set. But when the hard reset is re-enabled, the event is not cleared and the box reboots. The attached patch stops and resets the counter when the watchdog is disabled and do not disable the hard reset of the timer (if the timer has elapsed it's too late). Submitted by: Patrick Lamaizière
* Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), andalc2010-05-082-24/+28
| | | | | | | | | | | vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write(). Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.) Switch to a per-processor counter for the total number of pages cached.
OpenPOWER on IntegriCloud