path: root/sys/i386
Commit message [Author, Age, Files, Lines]
* Eliminate a stale comment. [alc, 2010-05-31, 2 files, -8/+0]
* Simplify the inner loop of pmap_collect(): While iterating over the page's [alc, 2010-05-30, 2 files, -8/+5]
| pv list, there is no point in checking whether or not the pv list is empty. Instead, wait until the loop completes.
* Merge various changes from i386/i386/pmap.c: [alc, 2010-05-30, 1 file, -72/+57]
| The remaining, unmerged portions of r175404:
| Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel option DIAGNOSTIC still disables inlining of certain pmap functions.)
| Eliminate dead code from pmap_enter(). This code implemented an assertion. On i386, an equivalent check is already implemented. However, on amd64, a small change is required to implement an equivalent check.
| Eliminate \n from a nearby panic string.
| Use KASSERT() to reimplement pmap_copy()'s two assertions.
| Merge portions of r177659:
| To date, we have assumed that the TLB will only set the PG_M bit in a PTE if that PTE has the PG_RW bit set. However, this assumption does not hold on recent processors from Intel. For example, consider a PTE that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE is cached in the TLB and later the PG_RW bit is cleared in the PTE, but the corresponding TLB entry is not (yet) invalidated. Historically, upon a write access using this (stale) TLB entry, the TLB would observe that the PG_RW bit had been cleared and initiate a page fault, aborting the setting of the PG_M bit in the PTE. Now, however, P4- and Core2-family processors will set the PG_M bit before observing that the PG_RW bit is clear and initiating a page fault. In other words, the write does not occur but the PG_M bit is still set.
| The real impact of this difference is not that great. Specifically, we should no longer assert that any PTE with the PG_M bit set must also have the PG_RW bit set, and we should ignore the state of the PG_M bit unless the PG_RW bit is set.
| r208609:
| Defer freeing any page table pages in pmap_remove_all() until after the page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all().
| r208645:
| When I pushed down the page queues lock into pmap_is_modified(), I created an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call.
| When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called.
| Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator.
| There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this.
| Tidy up the variable declarations at the start of pmap_enter().
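The r177659 note above (ignore PG_M unless PG_RW is also set) can be illustrated with a minimal sketch. The bit values are the architectural x86 PTE bits; the helper name `pte_is_dirty` is hypothetical and is not taken from the FreeBSD pmap code.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry bits (architectural values). */
#define PG_RW 0x002u /* read/write */
#define PG_M  0x040u /* modified (dirty) */

/*
 * Hypothetical helper, not the committed code: treat a PTE as dirty
 * only when both PG_M and PG_RW are set.  A PG_M bit that the
 * processor set through a stale TLB entry after PG_RW was cleared is
 * therefore ignored.
 */
static bool
pte_is_dirty(uint32_t pte)
{
	return ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW));
}
```

With this check, a PTE carrying PG_M alone is reported as clean, which matches the rule stated in the message.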
* When I pushed down the page queues lock into pmap_is_modified(), I created [alc, 2010-05-29, 1 file, -7/+19]
| an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified() could return FALSE without acquiring the page queues lock because the page is not (currently) writeable, and the caller to pmap_is_modified() might believe that the page's dirty field is clear because it has not seen the effect of the vm_page_dirty() call.
| When I pushed down the page queues lock into pmap_is_modified(), I overlooked one place where this ordering dependence is violated: pmap_enter(). In a rare situation pmap_enter() can be called to replace a dirty mapping to one page with a mapping to another page. (I say rare because replacements generally occur as a result of a copy-on-write fault, and so the old page is not dirty.) This change delays clearing PG_WRITEABLE until after vm_page_dirty() has been called.
| Fixing the ordering dependency also makes it easy to introduce a small optimization: When pmap_enter() used to replace a mapping to one page with a mapping to another page, it freed the pv entry for the first mapping and later called the pv entry allocator for the new mapping. Now, pmap_enter() attempts to recycle the old pv entry, saving two calls to the pv entry allocator.
| There is no point in setting PG_WRITEABLE on unmanaged pages, so don't. Update a comment to reflect this.
| Tidy up the variable declarations at the start of pmap_enter().
* Defer initializing machine checks for the boot CPU until the local APIC is [jhb, 2010-05-28, 2 files, -1/+13]
| fully configured.
| MFC after: 1 month
* Defer freeing any page table pages in pmap_remove_all() until after the [alc, 2010-05-28, 1 file, -2/+2]
| page queues lock is released. This may reduce the amount of time that the page queues lock is held by pmap_remove_all().
* Clarify a potential issue in get_fpcontext() use. [kib, 2010-05-27, 1 file, -0/+14]
| MFC after: 1 week
* Push down page queues lock acquisition in pmap_enter_object() and [alc, 2010-05-26, 2 files, -9/+16]
| pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock.
| Assert that the page is managed in pmap_is_referenced().
| On powerpc/aim, push down the page queues lock acquisition from moea*_is_modified() and moea*_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock.
| Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment.
| Correct a spelling error in vm_page_dontneed().
| Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable.
| Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked.
| Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty().
| Change vnode_pager_generic_putpages() to the modern style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions.
| Reviewed by: kib
* Only enable CMCI on i386 if 'device apic' is enabled in the kernel since [jhb, 2010-05-25, 1 file, -0/+25]
| it requires the local APIC to work.
* Add support for corrected machine check interrupts. CMCI is a new local [jhb, 2010-05-24, 7 files, -29/+261]
| APIC interrupt that fires when a threshold of corrected machine check events is reached. CMCI also includes a count of events when reporting corrected errors in the bank's status register. Note that individual banks may or may not support CMCI. If they do, each bank includes its own threshold register that determines when the interrupt fires. Currently the code uses a very simple strategy where it doubles the threshold on each interrupt until it succeeds in throttling the interrupt to occur only once a minute (this interval can be tuned via sysctl). The threshold is also adjusted on each hourly poll which will lower the threshold once events stop occurring.
| Tested by: Sailaja Bangaru sbappana at yahoo com
| MFC after: 1 month
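The doubling strategy described above can be sketched roughly as follows. This is an illustration only: the structure, field names, and the `max_threshold` cap are assumptions made for the example and do not come from the committed machine-check code. The hourly lowering step is not shown.

```c
#include <stdint.h>

/* Hypothetical per-bank throttle state; all names are illustrative. */
struct cmci_bank {
	uint32_t threshold;     /* corrected events before the interrupt fires (>= 1) */
	uint32_t max_threshold; /* assumed upper bound for the threshold register */
	int64_t  last_intr;     /* time of the previous interrupt, in seconds */
};

/*
 * Called on each CMCI for a bank.  If interrupts arrive more often
 * than min_interval seconds apart, double the threshold (capped at
 * max_threshold) so that the next interrupt takes longer to fire.
 */
static void
cmci_on_interrupt(struct cmci_bank *b, int64_t now, int64_t min_interval)
{
	if (now - b->last_intr < min_interval) {
		if (b->threshold > b->max_threshold / 2)
			b->threshold = b->max_threshold;
		else
			b->threshold *= 2;
	}
	b->last_intr = now;
}
```

Calling `cmci_on_interrupt()` twice within the interval raises a threshold of 1 to 4; a call that arrives after the interval has elapsed leaves the threshold alone.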
* Roughly half of a typical pmap_mincore() implementation is machine- [alc, 2010-05-24, 2 files, -105/+109]
| independent code. Move this code into mincore(), and eliminate the page queues lock from pmap_mincore().
| Push down the page queues lock into pmap_clear_modify(), pmap_clear_reference(), and pmap_is_modified(). Assert that these functions are never passed an unmanaged page.
| Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m: Contrary to what the comment says, pmap_mincore() is not simply an optimization. Without a complete pmap_mincore() implementation, mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED because only the pmap can provide this information.
| Eliminate the page queues lock from vfs_setdirty_locked_object(), vm_pageout_clean(), vm_object_page_collect_flush(), and vm_object_page_clean(). Generally speaking, these are all accesses to the page's dirty field, which are synchronized by the containing vm object's lock.
| Reduce the scope of the page queues lock in vm_object_madvise() and vm_page_dontneed().
| Reviewed by: kib (an earlier version)
* - Implement MI helper functions, dividing one or two timer interrupts with [mav, 2010-05-24, 3 files, -8/+2]
| arbitrary frequencies into hardclock(), statclock() and profclock() calls. The same code, with minor variations, was duplicated several times over the tree for different timer drivers and architectures.
| - Switch all x86 archs to the new functions, simplifying the code and removing extra logic from timer drivers. Other archs are also welcome.
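One common way to derive a slower periodic call from a timer running at an arbitrary frequency is an accumulator, sketched below. This is an assumption about the general technique, not the helper functions the commit adds; `clock_divider`, `divider_tick`, and `divider_run` are hypothetical names.

```c
/* Hypothetical accumulator-based divider; not the committed MI helpers. */
struct clock_divider {
	unsigned timer_hz;  /* frequency of the hardware timer interrupt */
	unsigned target_hz; /* desired rate, e.g. hz or stathz (<= timer_hz) */
	unsigned acc;       /* running accumulator, always < timer_hz */
};

/*
 * Account for one timer interrupt.  Returns how many target-rate
 * calls (e.g. hardclock()) are due for this interrupt: 0 or 1 as
 * long as target_hz <= timer_hz.
 */
static unsigned
divider_tick(struct clock_divider *d)
{
	unsigned calls = 0;

	d->acc += d->target_hz;
	while (d->acc >= d->timer_hz) {
		d->acc -= d->timer_hz;
		calls++;
	}
	return (calls);
}

/* Feed nticks timer interrupts and return the total number of calls due. */
static unsigned
divider_run(struct clock_divider *d, unsigned nticks)
{
	unsigned total = 0;

	while (nticks-- > 0)
		total += divider_tick(d);
	return (total);
}
```

The accumulator keeps the long-run rate exact even when the two frequencies do not divide evenly, at the cost of slightly uneven spacing between calls.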
* Reorganize syscall entry and leave handling. [kib, 2010-05-23, 5 files, -159/+78]
| Extend struct sysvec with three new elements:
| sv_fetch_syscall_args - the method to fetch syscall arguments from usermode into struct syscall_args. The structure is machine-dependent (this might be reconsidered after all architectures are converted).
| sv_set_syscall_retval - the method to set a return value for usermode from the syscall. It is a generalization of cpu_set_syscall_retval(9) to allow ABIs to override the way to set a return value.
| sv_syscallnames - the table of syscall names.
| Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding the call to cpu_set_syscall_retval().
| The new functions syscallenter(9) and syscallret(9) are provided that use sv_*syscall* pointers and contain the common repeated code from the syscall() implementations for the architecture-specific syscall trap handlers.
| Syscallenter() fetches arguments, calls the syscall implementation from the ABI sysent table, and sets up the return frame. The end-of-syscall bookkeeping is done by syscallret().
| Take advantage of the single place for MI syscall handling code and implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the thread is stopped at the syscall entry or return point, respectively. The EXEC flag augments SCX and notifies the debugger that the process address space was changed by one of the exec(2)-family syscalls.
| The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are changed to use syscallenter()/syscallret(). MIPS and arm are not converted and use the mostly unchanged syscall() implementation.
| Reviewed by: jhb, marcel, marius, nwhitehorn, stas
| Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc), stas (mips)
| MFC after: 1 month
* Unify local_apic.c for x86 archs, [mav, 2010-05-23, 1 file, -1427/+0]
* Rename an argument from "exp" to "expect" since the former makes FlexeLint [phk, 2010-05-20, 1 file, -8/+8]
| uneasy, in case anybody thinks it might be exp(3) in libm.
| This also makes it consistent with other archs.
* Add constants for the optional EOI suppression support in local APICs and [jhb, 2010-05-19, 1 file, -0/+3]
| EOI registers in I/O APICs.
* On entry to pmap_enter(), assert that the page is busy. While I'm [alc, 2010-05-16, 2 files, -9/+30]
| here, make the style of assertion used by pmap_enter() consistent across all architectures.
| On entry to pmap_remove_write(), assert that the page is neither unmanaged nor fictitious, since we cannot remove write access to either kind of page.
| With the push down of the page queues lock, pmap_remove_write() cannot condition its behavior on the state of the PG_WRITEABLE flag if the page is busy. Assert that the object containing the page is locked. This allows us to know that the page will neither become busy nor will PG_WRITEABLE be set on it while pmap_remove_write() is running.
| Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly do copy-on-write-based zero-copy transmit on unmanaged or fictitious pages, so don't even try. Previously, the call to pmap_remove_write() would have failed silently.
* Apply a patch that has been lingering in my inbox for far too long: [phk, 2010-05-15, 1 file, -4/+13]
| On a Soekris Net5501, if you do a watchdog -t 16, followed by a watchdog -t 0 to disable the watchdog, and then after some time (16s) re-enable the watchdog, the box reboots immediately. This also prevents stopping and restarting watchdogd(8).
| This is because when you stop the watchdog, the timer is not stopped; only the hard reset is disabled. So when the timer has elapsed, the C2 event of the timer is set. But when the hard reset is re-enabled, the event is not cleared and the box reboots.
| The attached patch stops and resets the counter when the watchdog is disabled and does not disable the hard reset of the timer (if the timer has elapsed it's too late).
| Submitted by: Patrick Lamaizière
* Push down the page queues lock into vm_page_cache(), vm_page_try_to_cache(), and [alc, 2010-05-08, 2 files, -24/+28]
| vm_page_try_to_free(). Consequently, push down the page queues lock into pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and pmap_remove_write().
| Push down the page queues lock into Xen's pmap_page_is_mapped(). (I overlooked the Xen pmap in r207702.)
| Switch to a per-processor counter for the total number of pages cached.
* Push down the page queues lock inside of vm_page_free_toq() and [alc, 2010-05-06, 1 file, -7/+6]
| pmap_page_is_mapped() in preparation for removing page queues locking around calls to vm_page_free(). Setting aside the assertion that calls pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page queues lock just long enough to actually add or remove the page from the paging queues.
| Update vm_page_unhold() to reflect the above change.
* Add definitions for Intel AESNI CPUID bits and print the capabilities [kib, 2010-05-05, 2 files, -2/+4]
| on boot.
| Hardware provided by: Sentex Communications
| MFC after: 1 week
* Switch to our preferred 2-clause BSD license. [joel, 2010-05-05, 2 files, -30/+26]
| Approved by: kmacy
* merge 194209 in to the i386/xen pmap [kmacy, 2010-04-30, 1 file, -46/+47]
| requested by: alc@
* On Alan's advice, rather than do a wholesale conversion on a single [kmacy, 2010-04-30, 3 files, -4/+21]
| architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps.
| Supported by: Bitgravity Inc.
| Discussed with: alc, jeffr, and kib
* - Extract the IODEV_PIO interface from ia64 and make it MI. [attilio, 2010-04-28, 2 files, -37/+24]
| In the end, it does help fixing /dev/io usage from multithreaded processes.
| - On i386 and amd64 the old behaviour is kept but multithreaded processes must use the new interface in order to work well.
| - Support for the other architectures is greatly improved, where necessary, by the necessity to define very small things now.
| Manpage update will happen shortly.
| Sponsored by: Sandvine Incorporated
| PR: threads/116181
| Reviewed by: emaste, marcel
| MFC after: 3 weeks
* Style: use #define<TAB> instead of #define<SPACE>. [kib, 2010-04-27, 1 file, -1/+1]
| Noted by: bde, pluknet gmail com
| MFC after: 11 days
* MFi386 r207205 [alc, 2010-04-27, 1 file, -13/+7]
| Clearing a page table entry's accessed bit (PG_A) and setting the page's PG_REFERENCED flag in pmap_protect() can't really be justified, so don't do it.
* Clearing a page table entry's accessed bit (PG_A) and setting the [alc, 2010-04-25, 1 file, -22/+6]
| page's PG_REFERENCED flag in pmap_protect() can't really be justified. In contrast to pmap_remove() or pmap_remove_all(), the mapping is not being destroyed, so the notion that the page was accessed is not lost. Moreover, clearing the page table entry's accessed bit and setting the page's PG_REFERENCED flag can throw off the page daemon's activity count calculation. Finally, in my tests, I found that 15% of the atomic memory operations being performed by pmap_protect() were only to clear PG_A, and not change protection. This could, by itself, be fixed, but I don't see the point given the above argument.
| Remove a comment from pmap_protect_pde() that is no longer meaningful after the above change.
* - fix style issues on i386 as well [kmacy, 2010-04-24, 1 file, -16/+16]
| requested by: alc@
* Resurrect pmap_is_referenced() and use it in mincore(). Essentially, [alc, 2010-04-24, 2 files, -6/+76]
| pmap_ts_referenced() is not always appropriate for checking whether or not pages have been referenced because it clears any reference bits that it encounters. For example, in mincore(), clearing the reference bits has two negative consequences. First, it throws off the activity count calculations performed by the page daemon. Specifically, a page on which mincore() has called pmap_ts_referenced() looks less active to the page daemon than it should. Consequently, the page could be deactivated prematurely by the page daemon. Arguably, this problem could be fixed by having mincore() duplicate the activity count calculation on the page. However, there is a second problem for which that is not a solution. In order to clear a reference on a 4KB page, it may be necessary to demote a 2/4MB page mapping. Thus, a mincore() by one process can have the side effect of demoting a superpage mapping within another process!
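The distinction drawn above, a pure query versus a test-and-clear, can be shown on a single PTE word. This is a simplified sketch: the real pmap_is_referenced() and pmap_ts_referenced() operate on a page and its mappings rather than on one entry, and the helper names below are hypothetical. PG_A is the architectural x86 accessed bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_A 0x020u /* x86 PTE accessed bit */

/* Non-destructive query, in the spirit of pmap_is_referenced(). */
static bool
pte_is_referenced(const uint32_t *pte)
{
	return ((*pte & PG_A) != 0);
}

/*
 * Test-and-clear, in the spirit of pmap_ts_referenced(): reports
 * whether the bit was set, then clears it, so the information is
 * gone for any later observer such as the page daemon.
 */
static bool
pte_ts_referenced(uint32_t *pte)
{
	bool was_set = (*pte & PG_A) != 0;

	*pte &= ~PG_A;
	return (was_set);
}
```

Repeated calls to `pte_is_referenced()` keep returning the same answer, while a single `pte_ts_referenced()` call changes what every subsequent caller sees.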
* Move the constants specifying the size of struct kinfo_proc into [kib, 2010-04-24, 1 file, -0/+2]
| machine-specific header files. Add KINFO_PROC32_SIZE for struct kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add CTASSERT for the size of struct kinfo_proc32.
| Submitted by: pluknet
| Reviewed by: imp, jhb, nwhitehorn
| MFC after: 2 weeks
* If a conditional jump instruction has the same jt and jf, do not perform [jkim, 2010-04-22, 2 files, -10/+31]
| the test and jump unconditionally.
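A minimal sketch of the idea, assuming a simplified instruction record with the same jt/jf/k fields as a BPF instruction and an equality-style comparison. The names `insn`, `jump_is_unconditional`, and `next_pc` are hypothetical; the commit itself changes generated machine code, which is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified filter instruction with BPF-style jump offsets. */
struct insn {
	uint16_t code; /* opcode (unused in this sketch) */
	uint8_t  jt;   /* offset taken when the test is true */
	uint8_t  jf;   /* offset taken when the test is false */
	uint32_t k;    /* constant to compare against */
};

/* When both offsets match, the outcome of the test cannot matter. */
static bool
jump_is_unconditional(const struct insn *ins)
{
	return (ins->jt == ins->jf);
}

/* Compute the next program counter for an equality-style jump. */
static unsigned
next_pc(const struct insn *ins, unsigned pc, uint32_t acc)
{
	if (jump_is_unconditional(ins))
		return (pc + 1 + ins->jt); /* skip the comparison entirely */
	return (pc + 1 + (acc == ins->k ? ins->jt : ins->jf));
}
```

For an instruction with jt == jf == 3 at pc 10, `next_pc()` yields 14 regardless of the accumulator value.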
* Change USB_DEBUG to #ifdef and allow it to be turned off. Previously this had [thompsa, 2010-04-22, 2 files, -0/+2]
| the illusion of a tunable setting but was always turned on regardless.
| MFC after: 1 week
* Rename the cyclic global variable lapic_cyclic_clock_func to just [rpaulo, 2010-04-20, 1 file, -3/+3]
| cyclic_clock_func. This will make more sense when we start developing a non-x86 cyclic version.
* Add driver for Silicon Integrated Systems SiS190/191 Fast/Gigabit Ethernet. [yongari, 2010-04-14, 1 file, -0/+1]
| This driver was written by Alexander Pohoyda and greatly enhanced by Nikolay Denev. I don't have this hardware but this driver was tested by Nikolay Denev and xclin. Because SiS didn't release a data sheet for this controller, programming information came from the Linux driver and OpenSolaris. Unlike other open source drivers for SiS190/191, sge(4) takes full advantage of TX/RX checksum offloading and does not require an additional copy operation in the RX handler. The controller seems to have advanced offloading features like VLAN hardware tag insertion/stripping and TCP segmentation offload (TSO), as well as jumbo frame support, but these features are not available yet.
| Special thanks to xclin <xclin<> cs dot nctu dot edu dot tw> who sent a fix for receiving VLAN oversized frames.
* Change printf() calls to uprintf() for sigreturn() and trap() complaints [kib, 2010-04-13, 2 files, -6/+10]
| about inaccessible or wrong mcontext, and for the dreaded "kernel trap with interrupts disabled" situation. The latter is changed when the trap is generated from user mode (shall never be ?).
| Normalize the messages to include both pid and thread name.
| MFC after: 1 week
* Add EFI boot info fields. [rpaulo, 2010-04-07, 1 file, -0/+7]
* Switch to our preferred 2-clause BSD license. [joel, 2010-04-07, 1 file, -6/+1]
| Approved by: jfv
* - Support for uncore counting events: one fixed PMC with the uncore [fabient, 2010-04-02, 1 file, -1/+10]
| domain clock, 8 programmable PMC.
| - Westmere based CPU (Xeon 5600, Corei7 980X) support.
| - New man pages with events list for core and uncore.
| - Updated Corei7 events with Intel 253669-033US December 2009 doc. There are some removed events in the documentation; they have been kept in the code but documented in the man page as obsolete.
| - Offcore response events can be set up with the rsp token.
| Sponsored by: NETASQ
* Add a handler for the local APIC error interrupt. For now it just prints [jhb, 2010-03-29, 3 files, -20/+46]
| out the current value of the local APIC error register when the interrupt fires.
| MFC after: 1 week
* Rename st_*timespec fields to st_*tim for POSIX 2008 compliance. [ed, 2010-03-28, 2 files, -12/+12]
| A nice thing about POSIX 2008 is that it finally standardizes a way to obtain file access/modification/change times in sub-second precision, namely using struct timespec, which we have already had for a very long time. Unfortunately POSIX uses different names.
| This commit adds compatibility macros, so existing code should still build properly. Also change all source code in the kernel to work without any of the compatibility macros. This makes it all less ambiguous.
| I am also renaming st_birthtime to st_birthtim, even though it was a local extension anyway. It seems Cygwin also has a st_birthtim.
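The compatibility-macro approach mentioned above can be sketched as follows. The structure `demo_stat` is a hypothetical stand-in for struct stat, used so the example does not depend on any particular system header; only the macro pattern (old name expanding to the new member name) reflects the commit message.

```c
#include <time.h>

/* Hypothetical stand-in for struct stat using the POSIX 2008 names. */
struct demo_stat {
	struct timespec st_atim; /* last access */
	struct timespec st_mtim; /* last modification */
	struct timespec st_ctim; /* last status change */
};

/* Compatibility macros: the old member names expand to the new ones. */
#define st_atimespec st_atim
#define st_mtimespec st_mtim
#define st_ctimespec st_ctim

/* Older code written against the previous names still compiles. */
static long
demo_mtime_nsec(const struct demo_stat *sb)
{
	return (sb->st_mtimespec.tv_nsec);
}
```

Code that writes `sb->st_mtim.tv_nsec` and code that reads `sb->st_mtimespec.tv_nsec` then refer to the same member.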
* Correctly handle preemption of pmap_update_pde_invalidate(). [alc, 2010-03-27, 1 file, -2/+5]
| X-MFC after: r205573
* Simplify pmap_growkernel(), making the i386 version more like the amd64 [alc, 2010-03-27, 1 file, -18/+5]
| version.
| MFC after: 3 weeks
* A ptrace(2) by one process may trigger a promotion in the address space [alc, 2010-03-25, 1 file, -1/+1]
| of another process. Modify pmap_promote_pde() to handle this. (This is not a problem on amd64 due to implementation differences.)
| Reported by: jh@
| MFC after: 1 week
* Change the arguments of exec_setregs() so that it receives a pointer [nwhitehorn, 2010-03-25, 2 files, -12/+7]
| to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms.
| Reviewed by: jhb
* Adapt r204907 and r205402, the amd64 implementation of the workaround for [alc, 2010-03-24, 4 files, -53/+224]
| AMD Family 10h Erratum 383, to i386.
| Enable machine check exceptions by default, just like r204913 for amd64.
| Enable superpage promotion only if the processor actually supports large pages, i.e., PG_PS.
| MFC after: 2 weeks
* Remove unneeded type specifiers from 64-bit constants. The compiler [jhb, 2010-03-22, 1 file, -30/+30]
| infers their natural type from the constants' values.
| Submitted by: bde
| MFC after: 3 days
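As a small illustration of the point above: in C99 and later, an unsuffixed hexadecimal constant takes the first standard integer type that can represent its value, so a constant too large for 32 bits already has a 64-bit type. `DEMO_MASK` and `demo_frame` are hypothetical names chosen for this sketch, and the size check assumes a platform where int is 32 bits.

```c
#include <stdint.h>

/*
 * Hypothetical mask with no ULL suffix.  The value does not fit in
 * 32 bits, so the compiler gives it a 64-bit type on its own.
 */
#define DEMO_MASK 0x000ffffffffff000

/* Holds on platforms with 32-bit int (C11 static assertion). */
_Static_assert(sizeof(DEMO_MASK) == 8, "constant is not 64 bits wide");

/* Extract the masked bits of a 64-bit value. */
static uint64_t
demo_frame(uint64_t pte)
{
	return (pte & DEMO_MASK);
}
```

A constant whose value does fit in 32 bits is not widened this way, so the inference only helps for values that are already out of 32-bit range.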
* Merge r197455 from amd64: [emaste, 2010-03-22, 1 file, -0/+4]
| Add a backtrace to the "fpudna in kernel mode!" case, to help track down where this comes from.
| Reviewed by: bde
* Back out revision 205307. [delphij, 2010-03-19, 1 file, -0/+2]
| For the record: CPU_ENABLE_SSE enables some code that dynamically enables SSE support but does not necessarily enforce execution of SSE instructions.
* pmap amd64/i386: fix a typo in a comment [avg, 2010-03-19, 1 file, -1/+1]
| MFC after: 3 days